I frequently use sys.sql_modules
to search for substrings in T-SQL code modules (views, stored procedures, functions, triggers, etc.). While I know I can use "View Dependencies" to identify references to tables, views, and stored procedures, I generally prefer to use sys.sql_modules
because it lets me search for arbitrary strings.
Here's an example of querying for the T-SQL behind ufnGetCustomerInformation
in AdventureWorksLT (the sample database available for Azure SQL Database):
SELECTO.[name] AS ObjName, SM.[definition] FROM sys.sql_modules AS SM JOIN sys.objects AS O ON SM.[object_id] = O.[object_id] WHERE O.[name] = 'ufnGetCustomerInformation';
Before running this query, I recommend configuring the Results|Grid page of the Query Options for SSMS as follows. This can be done either on a per-query basis or in the SSMS Preferences. If you don't see all of these options, consider upgrading to SSMS 2017.
- Check "Retain CR/LF on copy or save"
- Under "Maximum Characters Retrieved", set "Non XML data" to "65535"
- Under "Maximum Characters Retrieved", set "XML data" to "Unlimited"
The output from this query includes the definition
for ufnGetCustomerInformation
. Copy the definition
into a new query window and observe that it uses the CustomerID
column. If we wanted to identify all T-SQL objects in the database that use CustomerID
, we might use the following query:
SELECTS.[name] + N'.' + O.[name] AS ObjName, SM.[definition] FROM sys.sql_modules AS SM JOIN sys.objects AS O ON SM.[object_id] = O.[object_id] JOIN sys.schemas AS S ON O.[schema_id] = S.[schema_id] WHERE SM.[definition] LIKE '%CustomerID%';
This lists two additional T-SQL objects, dbo.ufnGetAllCategories
and SalesLT.vGetAllCategories
. Copy the definition
for dbo.ufnGetAllCategories into a new query window and observe something interesting:
CREATE FUNCTION [dbo].[ufnGetAllCategories]() RETURNS @retCategoryInformation TABLE ( -- Columns returned by the function [ParentProductCategoryName] nvarchar(50) NULL, [ProductCategoryName] nvarchar(50) NOT NULL, [ProductCategoryID] int NOT NULL ) AS -- Returns the CustomerID, first name, and last name for the specified customer. BEGIN WITH CategoryCTE([ParentProductCategoryID], [ProductCategoryID], [Name]) AS ( SELECT [ParentProductCategoryID], [ProductCategoryID], [Name] FROM SalesLT.ProductCategory WHERE ParentProductCategoryID IS NULL ....
The actual T-SQL code contains no references to CustomerID
- the only occurrence is in a incorrect comment that likely resulted from poor copy-and-paste coding! The same copy-and-paste comment mistake shows up in SalesLT.vGetAllCategories. Unfortunately, there is no easy way to match against sys.sql_modules.definition
that will avoid matching within the comments, so it would be nice to remove the comments from the T-SQL before matching the pattern. Web searches identified a number of attempts by several DBAs and developers to programatically remove comments, but none of them seemed to meet all my requirements. My requirements, starting with the most important, were:
- Correctness. The code must properly ignore comment-like sequences in identifiers and string literals, handle corner cases such as
--
embedded in/* */
and vice-versa, etc. - No procedural code, no functions, no dependence on .NET, etc. I wanted a single T-SQL query that could run on its own.
- Understandable code. The above requirements (especially the correctness) meant this wouldn't be a simple query, but it should be cleanly laid out.
- Reasonable performance. Because this is used for investigating databases and not production execution, performance is a benefit, but not required.
I knew that, given the limitations of T-SQL, avoiding procedural code meant using recursive Common Table Expressions to iterate through the T-SQL code and match elements. I also knew that in order to address the requirement for correctness, I would need to build a comprehensive test suite that covered a whole host of T-SQL syntax edge and corner cases. In additional articles, I will go into more detail about the T-SQL syntax requirements as well as the implementation of the query and the test suite. This first article simply covers using the query.
To use the query, download SQL_Search_Decommenter.sql
and run it in the desired database. The query will return all objects with T-SQL code (views, stored procedures, functions, triggers, etc.) in the database. The results can be filtered by specifying patterns in the @NamePat
and @SQLPat
variables. I specify these with variables at the top to simplify locating and modifying them. The @DecommentedOnly
variable can be changed to 1
to return only objects whose decommented SQL matches @SQLPat
.(by default, the query returns all objects that match, along with the column DecommentedHasMatch
indicating whether the decommented code matches) Finally, the @MaxIter
variable can be used to control how many iterations of the recursive Common Table Expression will be permitted as part of the decommenting process.
The output columns are as follows:
-
ObjName
: 2-part name for the object -
DecommentedHasMatch
:1
indicates that@SQLPat
matched theDecommented
code,0
indicates no match -
Definition
: Original code fromsys.sql_modules
.NULL
if length of original code is greater than 65,535 characters (SSMS won't retrieve more than 65,535 characters from non-XML character datatypes). -
Decommented
: Code fromsys.sql_modules
with comments stripped./*...*/
is replaced with a space.--...\n
is replaced is with\r\n
.NULL
if length of original code is greater than 65,35 characters. -
Definition_in_XML
: If length of original code is greater than 65,535 characters, this contains the code embedded in an XML comment. Click on the XML, then copy the contents out of the XML tag and paste into an Query window. While I agree that a code module over 64 KB is a bit extreme, I have seen vendors ship triggers larger than 64K, as well as stored procedures that are multiple megabytes! -
Decommented_in_XML
: If length of original code is greater than 65,535 characters, this contains the decommented code embedded in an XML comment. Click on the XML, then copy the contents out of the XML tag and paste into an Query window. -
Iter
: Number of iterations required to decomment the code.
Using this query, we could proceed to search AdventureWorksLT
for references to CustomerID
that aren't in the comments. Update the @SQLPat
variable at the top of the query as follows and then run the query:
@SQLPat nvarchar(1000) = '%CustomerID%',
You should get three rows back, but only one of them will have 1
in the DecommentedHasMatch
column:
You should be able to inspect the Definition
and Decommented
columns to observe that the query effectively strips comments from the original Definition
and returns the decommented T-SQL in the Decommented
column. If you didn't want to see the objects that had CustomerID
only in the comments, you could set the @DecommentedOnly
variable at the top to 1
and the query would filter out those rows.
In the next article, I'll discuss the T-SQL comment behaviors that the code must support, how those were encoded to support a data-driven query design, and then demonstrate how the recursive query iterates through a sample piece of T-SQL to strip comments.