Stairway to ScriptDOM

Stairway to ScriptDOM Level 2 - Parsing, Tokens, and the Abstract Syntax Tree

,

Introduction

In the last article, I explained what is ScriptDOM and what are the different things this multifaceted tool can do. In this level, I will show how to install the library, explain how to parse scripts with this tool and understand the internals of how that works. I will also explain tokenizing with two examples of how useful that can be.

Installing ScriptDOM

The DacFX package that ScriptDOM is part of can be downloaded and installed from here. By default, the library is put in this directory: C:\Program Files\Microsoft SQL Server\150\DAC\bin\. The file name is Microsoft.SqlServer.TransactSql.ScriptDom.dll. It can be copied from here into any directory desired. There is nothing further needed to work with the tool.

Parsing with ScriptDOM

In this section I will be walking through some code line by line that teaches how to use the dll downloaded above to parse a script with PowerShell.

Line 1: Add a reference to the ScriptDOM DLL. Use the path as appropriate to the library.

Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll"

Next, we create the parser object from parser class. The version of the parser will depend on the compatibility level of SQL Server needed. The level goes from TSQL150parser(SQL Server 2019) to TSQL80parser(SQL Server 2008). We set the compatibility level we want to true.

$parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)

Let’s create an object to hold parsing errors, if any occur.

$parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]

Now we call the textreader to read the text stream from script to be parsed.

$stringReader = New-Object System.IO.StringReader($ScriptData)

The next line will invoke the parse method of parser object created in Step 1, and pass to it the text stream from Step 4. Use the parseerrors object created in Step 3 to hold errors.

$parsedObjects = $parser.Parse($stringReader, [ref]$parseErrors)

The complete script is shown below.

Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll"
Write-Host "Attempting to parse..." 
$parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
#object to handle parsing errors if any
$parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
#object that handles strings script to parse
$stringReader = New-Object System.IO.StringReader($ScriptData)
#parser creates parsed objects from strings 
$parsedObjects = $parser.Parse($stringReader, [ref]$parseErrors)
if($parseErrors.Count -eq 0) {
        Write-Host "$($parser.GetType().Name): No parsing errors" -ForegroundColor Green
    }
    else {
        Write-Host "$($parser.GetType().Name): $($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))" -ForegroundColor Yellow
    }

To test the script, I pass a stored procedure from WideWorldImporters as a variable. I have introduced two syntax errors: line 25, where I removed the comma after primary key, and line 31, where I misspelt INSERT. Here is the script to pass the stored procedure to the script.

$ScriptData = @"
USE [WideWorldImporters]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE OR ALTER PROCEDURE [Website].[InsertCustomerOrders]
@Orders Website.OrderList READONLY,
@OrderLines Website.OrderLineList READONLY,
@OrdersCreatedByPersonID INT,
@SalespersonPersonID INT
WITH EXECUTE AS OWNER
AS
BEGIN
    SeT NOCOUNT ON;
    SET XACT_ABORT ON;
    DECLARE @OrdersToGenerate AS TABLE
    (
        OrderReference INT PRIMARY KEY  -- reference from the application
        OrderID INT
    );
    -- allocate the new order numbers
    INSET @OrdersToGenerate (OrderReference, OrderID)
    SELECT OrderReference, NEXT VALUE FOR Sequences.OrderID
    FROM
    @Orders;
    BeGIN TRY
        BEGIN TRAN;
        INSERT Sales.Orders
            (OrderID, CustomerID, SalespersonPersonID, PickedByPersonID, ContactPersonID, BackorderOrderID, OrderDate,
             ExpectedDeliveryDate, CustomerPurchaseOrderNumber, IsUndersupplyBackordered, Comments, DeliveryInstructions, InternalComments,
             PickingCompletedWhen, LastEditedBy, LastEditedWhen)
        SELECT otg.OrderID, o.CustomerID, @SalespersonPersonID, NULL, o.ContactPersonID, NULL, SYSDATETIME(),
               o.ExpectedDeliveryDate, o.CustomerPurchaseOrderNumber, o.IsUndersupplyBackordered, o.Comments, o.DeliveryInstructions, NULL,
               NULL, @OrdersCreatedByPersonID, SYSDATETIME()
        FROM @OrdersToGenerate AS otg
        InNER JOIN @Orders AS o
        ON otg.OrderReference = o.OrderReference;
        INSERT Sales.OrderLines
            (OrderID, StockItemID, [Description], PackageTypeID, Quantity, UnitPrice,
             TaxRate, PickedQuantity, PickingCompletedWhen, LastEditedBy, LastEditedWhen)
        SELECT otg.OrderID, ol.StockItemID, ol.[Description], si.UnitPackageID, ol.Quantity,
               Website.CalculateCustomerPrice(o.CustomerID, ol.StockItemID, SYSDATETIME()),
               si.TaxRate, 0, NULL, @OrdersCreatedByPersonID, SYSDATETIME()
        FROM @OrdersToGenerate AS otg
        INNER JOIN @OrderLines AS ol
        ON otg.OrderReference = ol.OrderReference
        INNER JOIN @Orders AS o
        ON ol.OrderReference = o.OrderReference
        INNER JOIN Warehouse.StockItems AS si
        ON ol.StockItemID = si.StockItemID;
        COMMIT;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK;
        PRINT N'Unable to create the customer orders.';
        THROW;
        RETURN -1;
    END CATCH;
    RETURN 0;
END;
GO
"@
Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll"
Write-Host "Attempting to parse..." 
$parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
#object to handle parsing errors if any
$parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
#object that handles strings script to parse
$stringReader = New-Object System.IO.StringReader($ScriptData)
#parser creates parsed objects from strings 
$parsedObjects = $parser.Parse($stringReader, [ref]$parseErrors)
if($parseErrors.Count -eq 0) {
        Write-Host "$($parser.GetType().Name): No parsing errors" -ForegroundColor Green
    }
    else {
        Write-Host "$($parser.GetType().Name): $($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))" -ForegroundColor Yellow
    }

On running the script, I get the output shown below. The parser shows errors on lines 25 and 31 and alerts that there are parsing errors on the script.

If I fix those errors and re-run the script, I get a message that the script is syntax error free, as below.

The Anatomy of the Parser

The parser is the backbone of ScriptDOM functionality. Therefore, understanding how it works is fundamental to anyone who wants to put the tool to good use. The components of the parser and how it works are shown here.

There are three basic steps involved in the parsing process.

  1. The text (script) passed is processed by ‘Lexer’  and converted into a stream of ‘tokens’.
  2. The ‘tokens’ are used by ‘parser’ to construct an ‘Abstract Syntax Tree’ – a tree-like representation of the syntax.
  3. The ‘syntax evaluator’ then matches the syntax rules against the tree and throws out any errors or approves the code as free of syntax errors.

Let us explore this with the script above. If we look at the output of the parser, $parsedobjects, at the PS Command prompt, we see this output:

The ‘scripttokenstream’ at the bottom is the token stream that is output by the lexer.

Looking deeper at what the scripttokenstream contains, we see this output:

For a simple line, like ‘USE WideWorldImporters’, we can see that each word in the script, including the whitespace between ‘USE’ and ‘WideWorldImporters’ has a token type. The token type indicates the  ‘type’ of the word, the offset in code at which it occurs, the line and column at which it occurs, and the text. There is also a function available for scripttokenstream, named ‘iskeyword’, that tells us if the token is a SQL Server keyword or not.

We can look at what this contains by using the simple script below:

$ScriptData = @"
--Testing tokens
USE WideWorldImporters
"@
Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll"
Write-Host "Attempting to parse..." 
try {
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
    #object to handle parsing errors if any
    $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
    #object that handles strings script to parse
    $stringReader = New-Object System.IO.StringReader($ScriptData)
    #parser creates parsed objects from strings 
    $parsedObjects = $parser.Parse($stringReader, [ref]$parseErrors)
    $ctr = 0
     foreach ($token in $parsedobjects.ScriptTokenStream)
    {
        write-host "Text:          " $token.text  
        write-host "Type:          " $token.tokentype 
        write-host "Line:          " $parsedobjects.ScriptTokenStream[$ctr].Line  
        write-host "Column:        " $parsedobjects.ScriptTokenStream[$ctr].column 
        write-host "Reserved Word: " $token.IsKeyword()
        write-host  
        $ctr++
    } #token
    if($parseErrors.Count -gt 0) {
        throw "$($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))"
    }
    Write-Host "Complete!" -ForegroundColor Green
}
catch {
    throw
}

The output from this script is shown here:

The $parsedobjects object, the output of the parser method, contains the abstract syntax tree, or the tree representation of the syntax. The tokens are available at every level of this tree. Tokens can be useful in many ways. They can be used to find:

  1. A specific keyword in the text
  2. If a keyword is upper case or lower case.
  3. Finding text in Comment lines – single line or multi line comments.

Uses of Tokens

Below are two examples of token usage. The first one is about finding reserved words, and throw a warning if there is a reserved word that is not upper cased. The second example is about finding comments in code.

Example 1: Finding reserved words not in upper case.

Here is an example of a script that finds reserved words that are not upper cased. We first make sure the script is parsed and free of syntax errors as below.

  #parser creates parsed objects from strings
  $parsedObjects = $parser.Parse($stringReader, [ref]$parseErrors)

Then we take the output of the parser, look at the scripttokenstream. Looping through the tokenstream to find  keywords using the iskeyword function, and if that returns true, if the word is upper cased or not.

    $ctr = 0
    foreach ($token in $parsedobjects.ScriptTokenStream)
    {
        if ($token.IsKeyword() -and !($token.text -ceq $token.text.toupper()))

 

I run this code with an example, a stored procedure from WideWorldImporters in which there are two keywords not in upper case. These are ‘declare’ on line 22 and ‘inNer’ on line 49. Here is the script:

$ScriptData = @"
USE [WideWorldImporters]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
--John Doe added for demo reasons 3/20
CREATE OR ALTER PROCEDURE [Website].[InsertCustomerOrders]
@Orders Website.OrderList READONLY,
@OrderLines Website.OrderLineList READONLY,
@OrdersCreatedByPersonID INT,
@SalespersonPersonID INT
WITH EXECUTE AS OWNER
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    declare @OrdersToGenerate AS TABLE
    (
        OrderReference INT PRIMARY KEY,   -- reference from the application
        OrderID INT
    );
    /*allocate the new order numbers
    added insert into temp table 
    John Doe 12th April 2022
    */    INSERT @OrdersToGenerate (OrderReference, OrderID)
    SELECT OrderReference, NEXT VALUE FOR Sequences.OrderID
    FROM
    @Orders;
    BEGIN TRY
        BEGIN TRAN;
        INSERT Sales.Orders
            (OrderID, CustomerID, SalespersonPersonID, PickedByPersonID, ContactPersonID, BackorderOrderID, OrderDate,
             ExpectedDeliveryDate, CustomerPurchaseOrderNumber, IsUndersupplyBackordered, Comments, DeliveryInstructions, InternalComments,
             PickingCompletedWhen, LastEditedBy, LastEditedWhen)
        SELECT otg.OrderID, o.CustomerID, @SalespersonPersonID, NULL, o.ContactPersonID, NULL, SYSDATETIME(),
               o.ExpectedDeliveryDate, o.CustomerPurchaseOrderNumber, o.IsUndersupplyBackordered, o.Comments, o.DeliveryInstructions, NULL,
               NULL, @OrdersCreatedByPersonID, SYSDATETIME()
        FROM @OrdersToGenerate AS otg
        InNER JOIN @Orders AS o
        ON otg.OrderReference = o.OrderReference;
        INSERT Sales.OrderLines
            (OrderID, StockItemID, [Description], PackageTypeID, Quantity, UnitPrice,
             TaxRate, PickedQuantity, PickingCompletedWhen, LastEditedBy, LastEditedWhen)
        SELECT otg.OrderID, ol.StockItemID, ol.[Description], si.UnitPackageID, ol.Quantity,
               Website.CalculateCustomerPrice(o.CustomerID, ol.StockItemID, SYSDATETIME()),
               si.TaxRate, 0, NULL, @OrdersCreatedByPersonID, SYSDATETIME()
        FROM @OrdersToGenerate AS otg
        INNER JOIN @OrderLines AS ol
        ON otg.OrderReference = ol.OrderReference
INNER JOIN @Orders AS o
ON ol.OrderReference = o.OrderReference
        INNER JOIN Warehouse.StockItems AS si
        ON ol.StockItemID = si.StockItemID;
        COMMIT;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK;
        PRINT N'Unable to create the customer orders.';
        THROW;
        RETURN -1;
    END CATCH;
    RETURN 0;
END;
GO
"@

Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll"
Write-Host "Attempting to parse..." 
try {
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
    #object to handle parsing errors if any
    $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
    #object that handles strings script to parse
    $stringReader = New-Object System.IO.StringReader($ScriptData)
    #parser creates parsed objects from strings 
    $parsedObjects = $parser.Parse($stringReader, [ref]$parseErrors)
    $ctr = 0
     foreach ($token in $parsedobjects.ScriptTokenStream)
    {
        if ($token.IsKeyword() -and !($token.text -ceq $token.text.toupper()))
        {
            write-host "Text $($token.text) is not upper case and found on line $($parsedobjects.ScriptTokenStream[$ctr].Line) Column $($parsedobjects.ScriptTokenStream[$ctr].column) " -BackgroundColor blue 
         }
        $ctr++
    } #token
    if($parseErrors.Count -gt 0) {
        throw "$($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))"
    }
    Write-Host "Complete!" -ForegroundColor Green
}
catch {
    throw
}

The output is shown below.

Example 2: Finding comments with a certain text in comments

In this example we want to find comments with name John Doe in them. There are two comments with his name, line 10 (a single line comment) and line 28, a multiline comment.

We first make sure the script is parsed and free of syntax errors as below.

#parser creates parsed objects from strings 
$parsedObjects = $parser.Parse($stringReader, [ref]$parseErrors)

Then we take the output of the parser, look at the scripttokenstream. Looping through the tokenstream to check if token type is a single line or multiline comment. If that returns true, if the word ‘John Doe’ is present in comments.  (to make it general we put ‘John Doe’ into a variable called $wordtofindincomments).

    foreach ($token in $parsedobjects.ScriptTokenStream)
    {
      if ($token.TokenType -eq 'SingleLinecomment' -or $token.TokenType -eq 'MultiLinecomment' -and $token.text -match $wordtofindincomments)

The complete script is as below.

$ScriptData = @"
USE [WideWorldImporters]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
--John Doe added for demo reasons 3/20
CREATE OR ALTER PROCEDURE [Website].[InsertCustomerOrders]
@Orders Website.OrderList READONLY,
@OrderLines Website.OrderLineList READONLY,
@OrdersCreatedByPersonID INT,
@SalespersonPersonID INT
WITH EXECUTE AS OWNER
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    DECLARE @OrdersToGenerate AS TABLE
    (
        OrderReference INT PRIMARY KEY,   -- reference from the application
        OrderID INT
    );
    /*allocate the new order numbers
    added insert into temp table 
    John Doe 12th April 2022
    */    INSERT @OrdersToGenerate (OrderReference, OrderID)
    SELECT OrderReference, NEXT VALUE FOR Sequences.OrderID
    FROM
    @Orders;
    BEGIN TRY
        BEGIN TRAN;
        INSERT Sales.Orders
            (OrderID, CustomerID, SalespersonPersonID, PickedByPersonID, ContactPersonID, BackorderOrderID, OrderDate,
             ExpectedDeliveryDate, CustomerPurchaseOrderNumber, IsUndersupplyBackordered, Comments, DeliveryInstructions, InternalComments,
             PickingCompletedWhen, LastEditedBy, LastEditedWhen)
        SELECT otg.OrderID, o.CustomerID, @SalespersonPersonID, NULL, o.ContactPersonID, NULL, SYSDATETIME(),
               o.ExpectedDeliveryDate, o.CustomerPurchaseOrderNumber, o.IsUndersupplyBackordered, o.Comments, o.DeliveryInstructions, NULL,
               NULL, @OrdersCreatedByPersonID, SYSDATETIME()
        FROM @OrdersToGenerate AS otg
        InNER JOIN @Orders AS o
        ON otg.OrderReference = o.OrderReference;
        INSERT Sales.OrderLines
            (OrderID, StockItemID, [Description], PackageTypeID, Quantity, UnitPrice,
             TaxRate, PickedQuantity, PickingCompletedWhen, LastEditedBy, LastEditedWhen)
        SELECT otg.OrderID, ol.StockItemID, ol.[Description], si.UnitPackageID, ol.Quantity,
               Website.CalculateCustomerPrice(o.CustomerID, ol.StockItemID, SYSDATETIME()),
               si.TaxRate, 0, NULL, @OrdersCreatedByPersonID, SYSDATETIME()
        FROM @OrdersToGenerate AS otg
        INNER JOIN @OrderLines AS ol
        ON otg.OrderReference = ol.OrderReference
INNER JOIN @Orders AS o
ON ol.OrderReference = o.OrderReference
        INNER JOIN Warehouse.StockItems AS si
        ON ol.StockItemID = si.StockItemID;
        COMMIT;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK;
        PRINT N'Unable to create the customer orders.';
        THROW;
        RETURN -1;
    END CATCH;
    RETURN 0;
END;
GO
"@
$wordtofindincomments = 'John Doe'
Add-Type -Path "C:\ugpresentation\ScriptDom\Microsoft.SqlServer.TransactSql.ScriptDom.dll"
Write-Host "Attempting to parse..." 
try {
    $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser($true)
    #object to handle parsing errors if any
    $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
    #object that handles strings script to parse
    $stringReader = New-Object System.IO.StringReader($ScriptData)
    #parser creates parsed objects from strings 
    $parsedObjects = $parser.Parse($stringReader, [ref]$parseErrors)
    $ctr = 0
     foreach ($token in $parsedobjects.ScriptTokenStream)
    {
        if ($token.TokenType -eq 'SingleLinecomment' -or $token.TokenType -eq 'MultiLinecomment' -and $token.text -match $wordtofindincomments)
        {
            write-host "Text $($wordtofindincomments) is found on line $($parsedobjects.ScriptTokenStream[$ctr].Line) Column $($parsedobjects.ScriptTokenStream[$ctr].column) " -BackgroundColor blue   
        }
        $ctr++
    } #token
    if($parseErrors.Count -gt 0) {
        throw "$($parseErrors.Count) parsing error(s): $(($parseErrors | ConvertTo-Json))"
    }
    Write-Host "Complete!" -ForegroundColor Green
}
catch {
    throw
}

Running this, the output is as below.

Summary

In this article, we saw:

  1. How to parse scripts using ScriptDOM
  2. The internal workings of the parser
  3. The token stream and its uses, with two examples.

In the next article we can look at finding anti-patterns with ScriptDOM.

This article is part of the parent stairway Stairway to ScriptDOM

Rate

5 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (2)

You rated this post out of 5. Change rating