UPDATE 2012-09-12: For my latest adventures with T-SQL parsers, please see this post.
Every once in a while, I hit an issue that requires me to interrogate or parse some T-SQL code. Normally, I would shy away from this and attempt to solve the problem in some other way. I have written parsers in the past using LEX and YACC, and as much fun and awesomeness as that path is, I couldn't justify the time it would take.
However, this week I was faced with just such an issue, and at the back of my mind I remembered reading through the SQL Server 2012 feature pack and seeing something called “Microsoft SQL Server 2012 Transact-SQL Language Service”. It is described there as:
“The SQL Server Transact-SQL Language Service is a component based on the .NET Framework which provides parsing validation and IntelliSense services for Transact-SQL for SQL Server 2012, SQL Server 2008 R2, and SQL Server 2008.”
Sounds like just what I was after. Documentation on this is very scant, so don't take what follows as best practice or best use, just a practice and a use.
Knowing roughly what I was looking for, I found the relevant assembly in the GAC. It is simply named 'Microsoft.SqlServer.Management.SqlParser'.
Even knowing the name, you won't find much in terms of documentation from a web search, but you will find the MSDN documentation that lists the classes, members, methods and so on.
The “Scanner” class sounded the most appropriate for my needs, as it is described as “Scans Transact-SQL searching for individual units of code or tokens.”
After a bit of poking around, the code I ended up with was something like this:
# Load the parser assembly from the GAC
[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Management.SqlParser") | Out-Null

$ParseOptions = New-Object Microsoft.SqlServer.Management.SqlParser.Parser.ParseOptions
$ParseOptions.BatchSeparator = 'GO'

$Parser = New-Object Microsoft.SqlServer.Management.SqlParser.Parser.Scanner($ParseOptions)
$Sql = "Create Procedure MyProc as Select top(10) * from dbo.Table"
$Parser.SetSource($Sql, 0)

$Token = [Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]::TOKEN_SET
$Start = 0
$End = 0
$State = 0
$IsEndOfBatch = $false
$IsMatched = $false
$IsExecAutoParamHelp = $false

# Read tokens until the scanner reports EOF
while (($Token = $Parser.GetNext([ref]$State, [ref]$Start, [ref]$End, [ref]$IsMatched, [ref]$IsExecAutoParamHelp)) -ne [Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]::EOF) {
    try {
        # The cast throws when the token value has no matching enum member
        ($TokenPrs = [Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]$Token) | Out-Null
        $TokenPrs
        # $End is treated as inclusive, hence the +1
        $Sql.Substring($Start, ($End - $Start) + 1)
    } catch {
        $TokenPrs = $null
    }
}
As you can see, the $Sql variable holds the SQL to be parsed. That is pushed into the $Parser object using SetSource, and GetNext is then called repeatedly until the EOF token is returned. On each call, GetNext also hands back, through the [ref] arguments $Start and $End, the start and end character positions of the token's text within the source string.
This script’s output is:
TOKEN_CREATE
Create
TOKEN_PROCEDURE
Procedure
TOKEN_ID
MyProc
TOKEN_AS
as
TOKEN_SELECT
Select
TOKEN_TOP
top
TOKEN_INTEGER
10
TOKEN_FROM
from
TOKEN_ID
dbo
TOKEN_TABLE
Table
Note that the '(', ')' and '*' characters (and, judging by the output, the '.' in dbo.Table) are missing from the output. For each of them the scanner returned a token value that is not present in the Microsoft.SqlServer.Management.SqlParser.Parser.Tokens enum, so the cast raised an error that was swallowed by the catch block.
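If you would rather see those tokens than silently drop them, one option is to test the value against the enum before converting it, instead of relying on try/catch. What follows is only a rough sketch and not something taken from the library's documentation: Get-TokenName is a hypothetical helper of my own naming, it assumes the enum's underlying type is Int32, and it assumes that a value missing from the enum is simply the character code of a single-character token. It is demonstrated with the built-in System.DayOfWeek enum so that it runs without the parser assembly installed.

```powershell
# Hypothetical helper (not part of the SqlParser assembly):
# turn a raw token value into something printable.
function Get-TokenName {
    param(
        [int]  $TokenId,   # raw value, e.g. what GetNext returned
        [type] $EnumType   # enum to look the value up in
    )
    # Assumption: the enum's underlying type is Int32.
    if ([Enum]::IsDefined($EnumType, $TokenId)) {
        # The value is a named member of the enum, so return its name.
        return [Enum]::GetName($EnumType, $TokenId)
    }
    # Assumption: an undefined value is a character code such as '(' or '*'.
    return "CHAR '" + [char]$TokenId + "'"
}

# Demonstrated with a built-in enum so no parser assembly is needed.
Get-TokenName -TokenId 1  -EnumType ([System.DayOfWeek])   # Monday
Get-TokenName -TokenId 40 -EnumType ([System.DayOfWeek])   # CHAR '('
```

In the loop above, the idea would be to pass $Token and the Tokens enum type to the helper in place of the cast. Whether the unnamed values really are plain character codes is something I have not confirmed, so treat that part as a guess.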
Fun, fun, fun: simple T-SQL parsing. Hope this helps someone in the same position. Let me know how you get on.