Building a complicated query like this and ensuring that it has the correct behavior may not be easy, but it is achievable when guided by a comprehensive test suite. When I set out to write this query, I started by developing a test suite and only then started writing code. As I explored the T-SQL syntax, I continued to add test cases to validate edge cases. In general, I prefer to err on the side of having too many tests rather than too few, especially for code where there are wide variety of permutations possible for the input. The full test suite along with the decommenter rigged to run against it is in the attached file Decommenter_Test.sql.
The tests in the suite are defined in the temp table #tests
, which receives two populations of tests. The columns in #tests
are:
-
ID
, a unique identifier for each test -
OrigSQL
, the SQL to be decommented -
ValidSQL
, the correctly decommented SQL against which the query is validated -
MaxIter
, the iteration limit for the test.
The first INSERT
into #tests
contains hundreds of tests for a wide variety of scenarios. Here is a drastically redacted sample of this test population:
INSERT INTO #tests (ID, OrigSQL, ValidSQL, MaxIter) SELECT ID, CAST(OrigSQL AS nvarchar(max)) AS OrigSQL, CAST(ValidSQL AS nvarchar(max)) AS ValidSQL, 100 AS MaxIter FROM ( VALUES (0, NULL, NULL), (1, '', ''), (2, ' ', ' '), (3, 'S', 'S'), (108, ' /*Foo*/', ' '), (134, '/* Foo */'+CHAR(13)+CHAR(10)+ 'SELECT *'+CHAR(13)+CHAR(10)+ '/* Bar */'+CHAR(13)+CHAR(10)+ 'FROM Bar', ' '+CHAR(13)+CHAR(10)+ 'SELECT *'+CHAR(13)+CHAR(10)+ ' '+CHAR(13)+CHAR(10)+ 'FROM Bar' ), (203, '--'+CHAR(13)+CHAR(10)+'-Foo-', CHAR(13)+CHAR(10)+'-Foo-'), (310, '[]/**/', '[] '), (311, '[/**/]', '[/**/]'), (312, '/**/[]', ' []'), (416, '''/*Foo*/', '''/*Foo*/'), (530, '"/**/"""', '"/**/"""'), (950, REPLICATE('/* */ ', 50), REPLICATE(' ', 49) + '/* */ '), (NULL, NULL, NULL) ) AS V(ID, OrigSQL, ValidSQL) WHERE ID IS NOT NULL;
This batch of tests focuses on correctly and completely decommenting the T-SQL, so it uses a MaxIter
of 100
for all of the tests. The one exception to completely decommenting the T-SQL is the final test, 950
, which is used to validate that MaxIte
r is indeed set to 100
and that MAXRECURSION
at the end of the query is not set below 100
. Also note that I'm lazy and don't like having to remember to remove the comma on the last test, so the final (NULL, NULL, NULL)
is the record without a trailing comma and is excluded by the WHERE
clause.
The second batch of tests focuses on how the decommenter degrades when the iteration limit is reached. All of the tests in this batch share the same OrigSQL
, but as MaxIter
drops the ValidSQL
begins to include comments. At MaxIter
19
and above, all of the comments are removed. As MaxIter
drops to 18
and lower, the decommenter begins to leave comments at the end of the output as it runs out of iterations to remove the comments.
The test harness consists of the decommenter logic hooked up with #tests
as the source for the anchor rows. A fairly comprehensive list of columns is included in the Decomment
recursive CTE to support debugging. At the end of the whole query, all output rows from Decomment
are checked to see if the test passes.
FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Iter DESC) AS RowN FROM Decomment ) AS D CROSS APPLY ( SELECT CASE WHEN D.RowN = 1 AND D.IsFinished = 1 AND D.Iter <= D.MaxIter AND ( CAST(D.NewSQL AS varbinary(max)) = CAST(D.ValidSQL AS varbinary(max)) OR ( D.NewSQL IS NULL AND D.ValidSQL IS NULL ) ) THEN 1 WHEN D.RowN > 1 AND D.IsFinished = 0 AND D.Iter <= D.MaxIter THEN 1 ELSE 0 END AS TestPassed ) AS C1
The first thing is to determine whether a given row is the final iteration. This is handled by using ROW_NUMBER()
on the Decomment
output, partitioned by ID
and ordered on Iter
in descending order. When RowN
is 1
, it is the final iteration for that test.
The logic for determining TestPassed
has two main forks, one for the final iteration for a given test (RowN = 1)
and one for all of the prior iterations.
- For the final iteration, all of the following must be true for the test to pass:
IsFinished
must be1
,Iter
must not exceedMaxIter
, andD.NewSQL
andD.ValidSQL
must be identical (including trailing spaces, case, and so forth, checked using a cast tovarbinary(max)
), or both must beNULL
.
- For the prior iterations, the test is easier:
IsFinished
must be0
andIter
must not exceedMaxIter
.
The WHERE
clause isolates the rows where the test did not pass (C1.TestPassed = 0
). There is a commented out WHERE
clause illustrating debugging. If a test fails, substitute the failing test ID
into the alternative WHERE
clause and toggle which clause is commented out. All of the iterations for the failing test will be displayed, making it easier to understand the issue. This can also be used to investigate how the decommenter iteratively processes various tests.
The Decommenter_Test.sql
file is the definitive version of the decommenter - it's what I work from whenever I'm working on the decommenter code, because I need to know immediately if a change breaks any tests. However, to be useful, I want a decommenter that runs against sys.sql_modules
. That version is is SQL_Search_Decommenter.sql
, which was developed from Decommenter_Test.sql
. Whenever I update Decommenter_Test.sql
, I compare the two files and then make similar changes in SQL_Search_Decommenter.sql
to bring it into line. There are a number of differences between the two, both to adapt the code to sys.sql_modules
and to improve execution performance.
First, SQL_Search_Decommenter.sql
starts with a DECLARE
block. I use the DECLARE
block to keep the commonly adjusted query parameters in one place at the start of the query, making them easy to set.
DECLARE @NamePat nvarchar(1000) = '%', @SQLPat nvarchar(1000) = '%', @DecommentedOnly bit = 0, @MaxIter int = 32767;
-
@NamePat
: Matches the 2-part name for the object (e.g. to matchv_Foo*
, either use'%v_Foo%'
or use'dbo.v_Foo%'
, or to be precise'dbo.v[_]Foo%'
). Leave as'%'
to skip filtering by object name. -
@SQLPat
: Matches the SQL. For instance, to search fortbl_Foo
, use'%tbl_Foo%'
(or, to be precise,'%tbl[_]Foo%'
). By default, this will return objects that match in the commented SQL, whether or not the decommented SQL matches, with theDecommentedHasMatch
column indicating whether a match was found in the decommented SQL. -
@DecommentedOnly
: Switch from0
to1
to return only objects that match in the decommented SQL. -
@MaxIter
: Controls how many iterations to perform in the recursive query when decommenting. TheIter
column in the output indicates how many iterations were required to decomment the code. Setting this to a lower value will improve performance for lengthy objects that have lots of comments, string literals, and delimited (both"
and[
) identifiers, but at the expense of leaving comments in the decommented code once the limit is reached.
Next is the SourceSQL
Common Table Element (CTE), which handles retrieving module definitions and their two-part object names. It specifies name filtering using @NamePat
and filtering of the commented SQL using @SQLPat
. Note that there is a subtle bug here caused by matching the commented SQL prior to decommenting. Under some circumstances, @SQLPat
can match a string that shows up only in the decommented SQL. For instance, the pattern '%Long_Column_Name AS ShtColNm%'
wouldn't match the commented SQL 'SELECT Long_Column_Name/**/AS ShtColNm'
, but would match it after the decommenter replaces /**/
with a single space. The query could defer all @SQLPat
filtering until after all of sys.sql_modules
has been decommented, but that would have a substantial negative impact on performance. The query could be further complicated to detect patterns that might suffer from this issue (hint - they all have one or more of the Replacement
strings in them) and only defer matching in that scenario, coupled of course with OPTION (RECOMPILE)
. I think I'll live with the subtle bug and just avoid triggering it!
SourceSQL AS ( SELECT O.[object_id] AS ID, C1.ObjName, SM.[definition] AS OrigSQL FROM sys.sql_modules AS SM JOIN sys.objects AS O ON SM.[object_id] = O.[object_id] JOIN sys.schemas AS S ON O.[schema_id] = S.[schema_id] CROSS APPLY ( SELECT S.[name] + N'.' + O.[name] AS ObjName ) AS C1 WHERE C1.ObjName LIKE @NamePat AND SM.[definition] LIKE @SQLPat ),
This is straight forward, except for the use of CROSS APPLY
to avoid redundant calculation of ObjName
. It has negligible performance impact, but I dislike repeating myself in code.
The TokTypes
and Matchers
CTEs are identical to those in Decommenter_Test.sql
. There are some columns left out of the SELECT
clauses of the Decomment
CTE. As explained in the previous article, only the columns up through and including RemSQL
are required for the Decomment
CTE to function. All of the subsequent columns are only included in Decomment_Test.sql
to facilitate debugging, and so they are omitted in SQL_Search_Decommenter.sql
in order to reduce the size of the Worktable and improve performance. Finally, the one reference in the Decomment
CTE to Old.MaxIter
is replaced with @MaxIter
(since there is no need to control the iteration limit on a per-test basis).
An additional CTE, Decommented
, is added near the end to select only the final iteration for each anchor row, to use ID
to look up the ObjName
and Definition
columns from the SourceSQL
CTE, and to provide a clean interface for accessing the output from Decomment
.
Decommented AS ( SELECT D.ID, S.ObjName, CASE WHEN D.NewSQL LIKE @SQLPat THEN 1 ELSE 0 END AS DecommentedHasMatch, S.OrigSQL AS [Definition], NewSQL AS Decommented, Iter FROM Decomment AS D JOIN SourceSQL AS S ON D.ID = S.ID WHERE D.IsFinished = 1 )
Finally, there is the actual SELECT
clause that generates the result set:
SELECT ObjName, DecommentedHasMatch, CASE WHEN DATALENGTH([Definition]) <= 65535 * 2 THEN [Definition] ELSE NULL END AS [Definition], CASE WHEN DATALENGTH([Definition]) <= 65535 * 2 THEN Decommented ELSE NULL END AS Decommented, CASE WHEN DATALENGTH([Definition]) > 65535 * 2 THEN CAST(N'<?def --' + NCHAR(13)+NCHAR(10) + [Definition] + NCHAR(13)+NCHAR(10) + N'--?>' AS xml) ELSE NULL END AS Definition_in_XML, CASE WHEN DATALENGTH([Definition]) > 65535 * 2 THEN CAST(N'<?def --' + NCHAR(13)+NCHAR(10) + Decommented + NCHAR(13)+NCHAR(10) + N'--?>' AS xml) ELSE NULL END AS Decommented_in_XML, Iter FROM Decommented WHERE (@DecommentedOnly = 0 OR DecommentedHasMatch = 1) ORDER BY ObjName OPTION (MAXRECURSION 32767);
- Because SSMS won't retrieve more than 65,535 characters from non-XML text blocks, if the
DATALENGTH
(which counts trailing spaces) for[Definition]
is greater than twice 65,535 (nvarchar
values have two bytes per character), then[Definition]
andDecommented
are both output asNULL
and the SQL is embedded in XML columns (since SSMS can retrieve XML of unlimited size). - The filter on
DecommentedHasMatch
is controlled by@DecommentedOnly
. -
MAXRECURSION
is set to32767
since@MaxIter
will limit the recursion.
This completes the discussion of the T-SQL Decommenter. I hope that you find the code useful and that you learned something along the way!