In Level 1 of this series, we learned about what ScriptDOM is and what it can do. In Level 2, we learned a few terms around ScriptDOM, such as Abstract Syntax Tree, Tokens and how the parser works. In Level 3, we explored the abstract syntax tree in detail and how to find patterns using the visitor class. In this article, we will look at using more than one visitor class in same script so that we can find patterns where there is a need to find more than one way of anti-pattern usage.
More than One Way of Expressing Anti-patterns
The two most common anti-patterns we have been exploring are NOLOCK hint and SELECT *. Both of these can be more nuanced than what we commonly find. NOLOCK is not just expressed in a query as a hint, SET TRANSACTION LEVEL READ UNCOMMITTED does the same thing and it is important to find usage of that as well. SELECT * is not always bad, there are cases where it is allowable – such as selecting from a temp table or in an IF…EXISTS clause. If linting scripts do not find these cases, it would be a half-baked script that does not satisfy the needs of an automated linter.
Using these examples, we can understand how to create scripts with more than one visitor pattern. Depending on the linting rule involved, these scripts can be very complex or relatively simple.
Example 1: Expanding on SELECT *
What are scenarios under which SELECT * usage is permissible? Some of them are in IF..EXISTS, IF..NOT EXISTS, and selecting from a table variable. There are more, but these are some we are going to consider. Let us look at the script shown below, which has all these occurrences and more.
USE [WideWorldImporters] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE OR ALTER PROCEDURE [Website].[InsertCustomerOrders] @Orders Website.OrderList READONLY, @OrderLines Website.OrderLineList READONLY, @OrdersCreatedByPersonID INT, @SalespersonPersonID INT WITH EXECUTE AS OWNER AS BEGIN SET NOCOUNT ON; SET XACT_ABORT ON; DECLARE @OrdersToGenerate AS TABLE ( OrderReference INT PRIMARY KEY, -- reference from the application OrderID INT ); -- allocate the new order numbers IF NOT EXISTS (SELECT * FROM Sales.Orders) BEGIN RETURN; END INSERT @OrdersToGenerate (OrderReference, OrderID) SELECT * FROM dbo.Orders (NOLOCK); SELECT * FROM @OrdersToGenerate; SELECT * FROM dbo.Orders; SET TRANSACTION ISOLATION LEVEL READ uncommitted; BEGIN TRY --Comment 1 IF EXISTS (SELECT * FROM Sales.Orders) BEGIN BEGIN TRAN; INSERT Sales.Orders (OrderID, CustomerID, SalespersonPersonID, PickedByPersonID, ContactPersonID, BackorderOrderID, OrderDate, ExpectedDeliveryDate, CustomerPurchaseOrderNumber, IsUndersupplyBackordered, Comments, DeliveryInstructions, InternalComments, PickingCompletedWhen, LastEditedBy, LastEditedWhen) SELECT otg.OrderID, o.CustomerID, @SalespersonPersonID, NULL, o.ContactPersonID, NULL, SYSDATETIME(), o.ExpectedDeliveryDate, o.CustomerPurchaseOrderNumber, o.IsUndersupplyBackordered, o.Comments, o.DeliveryInstructions, NULL, NULL, @OrdersCreatedByPersonID, SYSDATETIME() from dbo.OrdersToGenerate AS otg INNER JOIN dbo.Orders AS o ON otg.OrderReference = o.OrderReference; --Comment 2 insert Sales.OrderLines (OrderID, StockItemID, [Description], PackageTypeID, Quantity, UnitPrice, TaxRate, PickedQuantity, PickingCompletedWhen, LastEditedBy, LastEditedWhen) SELECT * FROM dbo.OrdersToGenerate AS otg INNER JOIN OrderLines (nolock) AS ol ON otg.OrderReference = ol.OrderReference INNER JOIN dbo.Orders AS o ON ol.OrderReference = o.OrderReference INNER JOIN Warehouse.StockItems (nolock) AS si ON ol.StockItemID = si.StockItemID; COMMIT; END END TRY BEGIN CATCH THROW; RETURN -1; END CATCH; RETURN 0; END; GO
In this script, we have
- SELECT * in IF..NOT EXISTS on line 32
- SELECT * in an Insert into a table variable as source from a regular table on line 38
- SELECT * from table variable on line 41
- SELECT * from a table on line 43
- SELECT * in IF.. EXISTS on line 50
- SELECT * from a table as source to another table on line 67
The linting script only needs to throw an alert on conditions in 2,4 and 6, and ignore the rest.
If we use the same script we used in earlier post, we will run this code:
# Find references to select * in views/stored procs/functions function Find-SelectStarWithPattern { [CmdletBinding()] param( $TSQLFragmentForRule ) Try { class VisitorSelectStar: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlConcreteFragmentVisitor { [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.SelectStarExpression] $fragment) { $errorline = "WARNING: 'SELECT * found at " write-host $errorline $fragment.StartLine ":" $fragment.StartColumn ":" $fragment.FragmentLength -BackgroundColor red } } $visitorselectstar = [VisitorSelectStar]::new() $tSqlFragmentforrule.Accept($visitorselectstar) } #end try catch { throw } }
The results are shown below:
It is clear that the script needs to be refined to leave out patterns that are not undesirable, those from lines other than 38, 43 and 67. There are multiple ways we can go about this. The way I chose is to find additional info on the select * - where and how it is used, and put the results into an object, called lintererror. I then filter out the acceptable occurrences before showing the final errors.
I declare the lintererror object as below. I have made the assumption that there is only one SELECT * on each line of this procedure, which is reasonable.
[CmdletBinding()] param( [Parameter(Mandatory = $True)] [string]$CheckName, [Parameter(Mandatory = $True)] [string]$ErrorMessage, [int]$StartLine ) return [pscustomobject]@{ CheckName = $CheckName ErrorMessage = $ErrorMessage StartLine = $StartLine } }
For finding additional info on the select *, I’ve used a visitor pattern, called ‘queryexpression’. This is shown below. This gets called on every query in the stored procedure and looks to see if there is a variable used in the FROM clause. This helps me find out if the select * is being run from a table variable, which is an acceptable condition, and bypass checking for select * if this is true.
#Look to see if select is from temp table or table variable and not search for any select star there class VisitorQueryExpression: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragmentVisitor { $lintererror = @() $selectstarok = 0 [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.QueryExpression] $fragment) { $tablecount = $fragment.fromclause.tablereferences.count $this.selectstarok = 0 if ($tablecount -eq 1 ) { if ($fragment.fromclause.tablereferences[0].gettype().name -eq 'VariableTableReference') { $this.selectstarok = 1 } } if ($this.selectstarok -eq 0) { $visitorselectstar = [VisitorSelectStar]::new() $Fragment.Accept($visitorselectstar) $this.lintererror += $visitorselectstar.lintererror } } }
I also use another visitor class, called IfExpression, which finds all the ‘if’ statements in the code and checks if there is a select * used. If yes , put that into an array.
Why not leave it out? We can’t leave it out because the select * would have been found in the earlier queryexpression visitor class already. We have to put it somewhere and then exclude it from final results.
#Get instances of select * in if statements and put it into a hash table class VisitorIfStatement: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlConcreteFragmentVisitor { [boolean] $ifstatement $iflintererror = @() [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.IfStatement] $fragment ) { $stmttype = $fragment.predicate.GetType().Name #Look for if exists if ($stmttype -eq 'ExistsPredicate') { $selectelementcount = $fragment.Predicate.subquery.queryExpression.selectelements.count $selectelement = 0 while ($selectelement -lt $selectelementcount) { if ($fragment.predicate.subquery.queryexpression.selectelements[$selectelement].gettype().name -eq 'SelectStarExpression') { $this.iflintererror += New-LinterError -checkname "Find-SelectStarName" -errormessage "SELECT * found at " -startline $fragment.StartLine } $selectelement++ } } #Look for if not exists if ($stmttype -eq 'BooleanNotExpression') { $selectelementcount = $fragment.Predicate.Expression.subquery.queryExpression.selectelements.count $selectelement = 0 while ($selectelement -lt $selectelementcount) { if ($fragment.predicate.Expression.subquery.queryexpression.selectelements[$selectelement].gettype().name -eq 'SelectStarExpression') { $this.iflintererror += New-LinterError -checkname "Find-SelectStarName" -errormessage "IF SELECT * found at " -startline $fragment.StartLine } $selectelement++ } } } }
In the end, the script compares what both these visitor classes found.
#Compare errors in all select star instances and those in if statement and only throw those that are not in if error array $diffobject = @() #$diffobject = Compare-Object -ReferenceObject $visitorif.iflintererror -DifferenceObject $visitorqueryexpression.lintererror $i = 0 for($i=0 ; $i -lt $visitorqueryexpression.lintererror.Length; $i++) { if ($visitorif.iflintererror.startline -contains $visitorqueryexpression.lintererror.startline[$i]) { #Exclude select * in if exists } else { $diffobject += $visitorqueryexpression.lintererror[$i] } } } $i = 0 for($i=0 ; $i -lt $diffobject.lintererror.Length; $i++) { write-host $diffobject.errormessage[$i] $diffobject.startline[$i] }
The complete script is here:
function Find-SelectStar { [CmdletBinding()] param( $TSQLFragment ) Try { #Get instances of select * in if statements and put it into a hash table class VisitorIfStatement: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlConcreteFragmentVisitor { [boolean] $ifstatement $iflintererror = @() [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.IfStatement] $fragment ) { $stmttype = $fragment.predicate.GetType().Name #Look for if exists if ($stmttype -eq 'ExistsPredicate') { $selectelementcount = $fragment.Predicate.subquery.queryExpression.selectelements.count $selectelement = 0 while ($selectelement -lt $selectelementcount) { if ($fragment.predicate.subquery.queryexpression.selectelements[$selectelement].gettype().name -eq 'SelectStarExpression') { $this.iflintererror += New-LinterError -checkname "Find-SelectStarName" -errormessage "SELECT * found at " -startline $fragment.StartLine } $selectelement++ } } #Look for if not exists if ($stmttype -eq 'BooleanNotExpression') { $selectelementcount = $fragment.Predicate.Expression.subquery.queryExpression.selectelements.count $selectelement = 0 while ($selectelement -lt $selectelementcount) { if ($fragment.predicate.Expression.subquery.queryexpression.selectelements[$selectelement].gettype().name -eq 'SelectStarExpression') { $this.iflintererror += New-LinterError -checkname "Find-SelectStarName" -errormessage "IF SELECT * found at " -startline $fragment.StartLine } $selectelement++ } } } } #Get all instances of select star class VisitorSelectStar: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlConcreteFragmentVisitor { $lintererror = @() [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.SelectStarExpression] $fragment ) { $this.lintererror = New-LinterError -checkname "Find-SelectStarName" -errormessage "SELECT * found at " -startline $fragment.StartLine } } #Look to see if select is from temp table or table variable and not search for any select star there class VisitorQueryExpression: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragmentVisitor { $lintererror = @() $selectstarok = 0 [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.QueryExpression] $fragment) { $tablecount = $fragment.fromclause.tablereferences.count $this.selectstarok = 0 if ($tablecount -eq 1 ) { if ($fragment.fromclause.tablereferences[0].gettype().name -eq 'VariableTableReference') { $this.selectstarok = 1 } } if ($this.selectstarok -eq 0) { $visitorselectstar = [VisitorSelectStar]::new() $Fragment.Accept($visitorselectstar) $this.lintererror += $visitorselectstar.lintererror } } } $visitorqueryexpression = [VisitorQueryExpression]::new() $TSQLFragment.Accept($visitorqueryexpression) $visitorIf = [VisitorIfStatement]::new() $TSQLFragment.Accept($visitorIf) $lintererror = @() If ($visitorif.iflintererror -eq $null) { RETURN $visitorqueryexpression.lintererror } else { #Compare errors in all select star instances and those in if statement and only throw those that are not in if error array $diffobject = @() #$diffobject = Compare-Object -ReferenceObject $visitorif.iflintererror -DifferenceObject $visitorqueryexpression.lintererror $i = 0 for($i=0 ; $i -lt $visitorqueryexpression.lintererror.Length; $i++) { if ($visitorif.iflintererror.startline -contains $visitorqueryexpression.lintererror.startline[$i]) { #Exclude select * in if exists } else { $diffobject += $visitorqueryexpression.lintererror[$i] } } } $i = 0 for($i=0 ; $i -lt $diffobject.lintererror.Length; $i++) { write-host $diffobject.errormessage[$i] $diffobject.startline[$i] } } #end try catch { throw } } function Find-LintingErrors { #selectstatement contains the statement to be examined for existence of select * [CmdletBinding()] param( [string] $SelectStatement ) try { Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll" $DDLParser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true) $DDLparserErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError] # create a StringReader for the script for parsing $stringReader = New-Object System.IO.StringReader($selectstatement) # parse the script $tSqlFragment = $DDLParser.Parse($stringReader, [ref]$DDLParsererrors) # raise an exception if any parsing errors occur if($DDLParsererrors.Count -gt 0) { throw "$($DDLParsererrors.Count) parsing error(s): $(($DDLParsererrors | ConvertTo-Json))" } Find-SelectStar $tSqlFragment } catch { throw } }
I can now run this against my code with this script:
$ScriptData = @" USE [WideWorldImporters] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE OR ALTER PROCEDURE [Website].[InsertCustomerOrders] @Orders Website.OrderList READONLY, @OrderLines Website.OrderLineList READONLY, @OrdersCreatedByPersonID INT, @SalespersonPersonID INT WITH EXECUTE AS OWNER AS BEGIN SET NOCOUNT ON; SET XACT_ABORT ON; DECLARE @OrdersToGenerate AS TABLE ( OrderReference INT PRIMARY KEY, -- reference from the application OrderID INT ); -- allocate the new order numbers IF NOT EXISTS (SELECT * FROM Sales.Orders) BEGIN RETURN; END INSERT @OrdersToGenerate (OrderReference, OrderID) SELECT * FROM dbo.Orders (NOLOCK); SELECT * FROM @OrdersToGenerate; SELECT * FROM dbo.Orders; SET TRANSACTION ISOLATION LEVEL READ uncommitted; BEGIN TRY --Comment 1 IF EXISTS (SELECT * FROM Sales.Orders) BEGIN BEGIN TRAN; INSERT Sales.Orders (OrderID, CustomerID, SalespersonPersonID, PickedByPersonID, ContactPersonID, BackorderOrderID, OrderDate, ExpectedDeliveryDate, CustomerPurchaseOrderNumber, IsUndersupplyBackordered, Comments, DeliveryInstructions, InternalComments, PickingCompletedWhen, LastEditedBy, LastEditedWhen) SELECT otg.OrderID, o.CustomerID, @SalespersonPersonID, NULL, o.ContactPersonID, NULL, SYSDATETIME(), o.ExpectedDeliveryDate, o.CustomerPurchaseOrderNumber, o.IsUndersupplyBackordered, o.Comments, o.DeliveryInstructions, NULL, NULL, @OrdersCreatedByPersonID, SYSDATETIME() from dbo.OrdersToGenerate AS otg INNER JOIN dbo.Orders AS o ON otg.OrderReference = o.OrderReference; --Comment 2 insert Sales.OrderLines (OrderID, StockItemID, [Description], PackageTypeID, Quantity, UnitPrice, TaxRate, PickedQuantity, PickingCompletedWhen, LastEditedBy, LastEditedWhen) SELECT * FROM dbo.OrdersToGenerate AS otg INNER JOIN OrderLines (nolock) AS ol ON otg.OrderReference = ol.OrderReference INNER JOIN dbo.Orders AS o ON ol.OrderReference = o.OrderReference INNER JOIN Warehouse.StockItems (nolock) AS si ON ol.StockItemID = si.StockItemID; COMMIT; END END TRY BEGIN CATCH THROW; RETURN -1; END CATCH; RETURN 0; END; GO "@ Find-LintingErrors $ScriptData
I get these results:
The patterns that are not errors are left out and only valid occurrences of bad select * usage are found.
Example 2: Nolock at different levels
In this example, we are required to find occurrences of nolock that include finding it at query level and the isolation level declaration. This is easier than the earlier example, since there are no exceptions to nolock usage. We only need to use two visitor classes and both will help us find what we need to find.
In code below there are two visitor class declarations. One for finding hints at query level, called class VisitorTableHintRef. The other for finding isolation level declaration, called VisitorSetOnOffRule. By invoking both as below, it is possible to find nolock declaration at statement level and isolation level.
$visitortablehintref = [VisitorTableHintRef]::new() $tSqlFragmentforrule.Accept($visitortablehintref) $visitorsetonoffrule = [VisitorSetOnOffRule]::new() $tSqlFragmentforrule.Accept( $visitorsetonoffrule) #Find no lock hint in views/stored procs/functions function Find-NoLockHintWithPattern { [CmdletBinding()] param( $tsqlfragmentforrule ) Try { class VisitorTableHintRef: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragmentVisitor { [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.TableHint] $fragment) { $tablehint = $fragment if ($tablehint.HintKind -eq 'Nolock') { $errorline = "WARNING: NOLOCK hint found at " write-host $errorline $fragment.StartLine ":" $fragment.StartColumn ":" $fragment.FragmentLength -BackgroundColor Red } } } class VisitorSetOnOffRule: Microsoft.SqlServer.TransactSql.ScriptDom.TSQLFragmentVisitor { #Initializing variables to param values as they can't be accessed inside visit [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.SetTransactionIsolationLevelStatement ] $fragment) { #Making sure use statement is the first if ($fragment.level -eq 'READUNCOMMITTED') { $errorline = "WARNING: NOLOCK isolation level found at " write-host $errorline $fragment.StartLine ":" $fragment.StartColumn ":" $fragment.FragmentLength -BackgroundColor Red } } } $visitortablehintref = [VisitorTableHintRef]::new() $tSqlFragmentforrule.Accept($visitortablehintref) $visitorsetonoffrule = [VisitorSetOnOffRule]::new() $tSqlFragmentforrule.Accept( $visitorsetonoffrule) } catch { throw } }
The stored procedure provided as input is shown below. It has nolock at the statement level on lines 39, 69 and 73. It also has an isolation level declaration on line 46. (Typically declaring nolock at isolation level does not need further statement level usage, but in this example, it is used to show that code can catch both).
$ScriptData = @" USE [WideWorldImporters] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE OR ALTER PROCEDURE [Website].[InsertCustomerOrders] @Orders Website.OrderList READONLY, @OrderLines Website.OrderLineList READONLY, @OrdersCreatedByPersonID INT, @SalespersonPersonID INT WITH EXECUTE AS OWNER AS BEGIN SET NOCOUNT ON; SET XACT_ABORT ON; DECLARE @OrdersToGenerate AS TABLE ( OrderReference INT PRIMARY KEY, -- reference from the application OrderID INT ); -- allocate the new order numbers IF NOT EXISTS (SELECT * FROM Sales.Orders) BEGIN RETURN; END INSERT @OrdersToGenerate (OrderReference, OrderID) SELECT * FROM dbo.Orders (NOLOCK); SELECT * FROM @OrdersToGenerate; SELECT * FROM dbo.Orders; SET TRANSACTION ISOLATION LEVEL READ uncommitted; BEGIN TRY --Comment 1 IF EXISTS (SELECT * FROM Sales.Orders) BEGIN BEGIN TRAN; INSERT Sales.Orders (OrderID, CustomerID, SalespersonPersonID, PickedByPersonID, ContactPersonID, BackorderOrderID, OrderDate, ExpectedDeliveryDate, CustomerPurchaseOrderNumber, IsUndersupplyBackordered, Comments, DeliveryInstructions, InternalComments, PickingCompletedWhen, LastEditedBy, LastEditedWhen) SELECT otg.OrderID, o.CustomerID, @SalespersonPersonID, NULL, o.ContactPersonID, NULL, SYSDATETIME(), o.ExpectedDeliveryDate, o.CustomerPurchaseOrderNumber, o.IsUndersupplyBackordered, o.Comments, o.DeliveryInstructions, NULL, NULL, @OrdersCreatedByPersonID, SYSDATETIME() from dbo.OrdersToGenerate AS otg INNER JOIN dbo.Orders AS o ON otg.OrderReference = o.OrderReference; --Comment 2 insert Sales.OrderLines (OrderID, StockItemID, [Description], PackageTypeID, Quantity, UnitPrice, TaxRate, PickedQuantity, PickingCompletedWhen, LastEditedBy, LastEditedWhen) SELECT * FROM dbo.OrdersToGenerate AS otg INNER JOIN OrderLines (nolock) AS ol ON otg.OrderReference = ol.OrderReference INNER JOIN dbo.Orders AS o ON ol.OrderReference = o.OrderReference INNER JOIN Warehouse.StockItems (nolock) AS si ON ol.StockItemID = si.StockItemID; COMMIT; END END TRY BEGIN CATCH THROW; RETURN -1; END CATCH; RETURN 0; END; GO "@ Find-LintingErrors $ScriptData function Find-LintingErrors { #selectstatement contains the statement to be examined for existence of select * [CmdletBinding()] param( [string] $SelectStatement ) try { Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll" $DDLParser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true) $DDLparserErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError] # create a StringReader for the script for parsing $stringReader = New-Object System.IO.StringReader($selectstatement) # parse the script $tSqlFragment = $DDLParser.Parse($stringReader, [ref]$DDLParsererrors) # raise an exception if any parsing errors occur if($DDLParsererrors.Count -gt 0) { throw "$($DDLParsererrors.Count) parsing error(s): $(($DDLParsererrors | ConvertTo-Json))" } Find-NoLockHintWithPattern $tSqlFragment } catch { throw } }
The results are shown here.
Summary
In this post went into some ways of finding nuanced anti patterns in code using scriptdom. There are no set ways to finding these patterns. When one gains an understanding of how visitor patterns work, it is possible to come up with a programmatic solution to specific needs. In the last post we will go into fixing some anti-patterns and the pros and cons of ScriptDOM usage.