I’m working on a project where I need to convert Firebird SQL code into T-SQL code.
No schema, just the modules. There are more than 1000 objects between stored procedures, views, triggers, user-defined data types, etc.
First – the pain…
While checking the Firebird reference manuals I saw a lot of different concepts (Selectable Stored Procedures – Yes you can do SELECT FROM StoredProcedure) and different functions names with different syntax compared to T-SQL.
With this in mind, can you imagine doing a code migration between two different SQL flavours that contains more than 60000 lines of code by hand?
and…the false alarm – My approach
When I started to think about the possible approaches to this huge amount of work I will have, one of the ideas (and actually the one I ended up doing) was write a find/replace function in PowerShell that can receive an hashtable with “pattern to search” and the “pattern to replace”.
Because the -replace
method allows the use of regular expressions this become very powerful.
Don’t reinvent the wheel
I did a search and found a gist from Matthew Steeples that has a function to do a find/replace of a single string on a file using regular expressions.
I picked this script and I have adapted to my reality. This means, I have changed the code in order to:
- Accept an ordered hashtable and this way do multiple changes.
- Open the file and do all the changes before save it again.
You can find on my GitHub repository the Set-Expression PowerShell function I ended with.
Filling the hastable
To fill the hashtable I had to make a match between the two SQL flavours.
Just a couple of examples to ilustrate the differences:
Firebird has the similar
function which translate to LIKE '%%'
on T-SQL
Other example is the SUBSTRING
function which has the following structure SUBSTRING([field] from [start_position] for [number of chars])
which, in order to translate for T-SQL, we just need to replace the “from” and “for” words by a comma (“,”).
Everything was going well, work was progressing at a good pace until…
The pain strikes back
These databases make a heavy use of triggers. The main use of it is to generate a new ID to a column based on a Firebird DOMAIN object (SEQUENCES in T-SQL) on BEFORE INSERT
(INSTEAD OF in T-SQL) triggers.
Note: “But they can use IDENTITY
columns…” – Exactly what I have proposed, and for some tables they have implemented but for the majority it wasn’t possible.
Firebird make this somehow easy. This is a code example from an Firebird trigger:
CREATE OR ALTER TRIGGER Customers_BI FOR Customers ACTIVE BEFORE INSERT POSITION 0 as begin if (new.id is null) then new.id = gen_id(gen_Customers_id,1); end;
in T-SQL syntax, we need to set the whole INSERT
statement with all columns from the INSERTED
table. The equivalent code can be something like this:
CREATE TRIGGER dbo.Customers_BI ON dbo.Customers INSTEAD OF INSERT AS BEGIN IF EXISTS ( SELECT 1 FROM INSERTED WHERE Inserted.ID IS NULL ) BEGIN INSERT INTO dbo.Customers (ID, <other columns> SELECT NEXT VALUE FOR dbo.gen_Customers_id , <other columns> FROM INSERTED; END END;
“so you need to replace 1 line with an INSERT
statement with all columns from the table…”
Yes that much work..on almost 300 triggers! Feeling the pain, right?
Here is where the SQL Server metadata joins the party!
This is just an example, but with it you can imagine the others
I wrote a piece of T-SQL (not pretty but it works) that creates, for each table, the hashtable entry for my find/replace function.
SELECT A.TABLE_NAME , '''(FOR )(\b' + A.TABLE_NAME + '\b)((.|\r\n){1,})(\bACTIVE BEFORE INSERT POSITION 0\b)(\r\n)(AS)(\r\n)(BEGIN)(\r\n)((.|\n){1,})(gen_id\()(\w{1,})((.|\r\n)*?)(?=end;)(.*)'' = "ON `$2`$3INSTEAD OF INSERT`$3`$7`$8`$9`$10`$11`$12 `$3INSERT INTO `$2 ( ' + A.ColumnList + ' ) `$3SELECT NEXT VALUE FOR dbo.`$14, ' + A.ColumnList + ' FROM INSERTED `$3 --`$13`$14`$15`$16`$17"' , '''(FOR )(\b' + A.TABLE_NAME + '\b)((.|\r\n){1,})(\bACTIVE BEFORE UPDATE POSITION 0\b)(\r\n)(AS)(\r\n)(BEGIN)(\r\n)((.|\r\n){1,})((.|\r\n)*?)(?=end;)(.*)'' = "ON `$2`$3INSTEAD OF UPDATE`$3`$7`$8`$9 `$3UPDATE `$2 `$3SET ' + A.ColumnListUpdate + ', `$10`$11 FROM INSERTED `$3 `$15"' FROM ( SELECT DISTINCT ISC.TABLE_NAME , STUFF(( SELECT ', ' + ISC1.COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS AS ISC1 WHERE OBJECT_ID(ISC1.TABLE_NAME) = OBJECT_ID(ISC.TABLE_NAME) ORDER BY ISC1.ORDINAL_POSITION FOR XML PATH('') ) , 1 , 1 , '' ) AS ColumnList , STUFF(( SELECT ', ' + ISC1.COLUMN_NAME + ' = ' + ISC1.COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS AS ISC1 WHERE OBJECT_ID(ISC1.TABLE_NAME) = OBJECT_ID(ISC.TABLE_NAME) ORDER BY ISC1.ORDINAL_POSITION FOR XML PATH('') ) , 1 , 1 , '' ) AS ColumnListUpdate FROM INFORMATION_SCHEMA.COLUMNS AS ISC INNER JOIN sys.sysobjects AS so ON ISC.TABLE_NAME = so.name AND so.type = 'U' WHERE ISC.TABLE_NAME NOT IN ('sysdiagrams', 'Numbers') ) AS A ORDER BY A.TABLE_NAME;
Let’s say I have a Customers
table with 3 columns ID, NAME, COMPANY_NAME
the output will be:
Example of the full string for the INSTEAD OF INSERT trigger:
'(FOR )(\bCustomers\b)((.|\r\n){1,})(\bACTIVE BEFORE INSERT POSITION 0\b)(\r\n)(AS)(\r\n)(BEGIN)(\r\n)((.|\n){1,})(gen_id\()(\w{1,})((.|\r\n)*?)(?=end;)(.*)' = "ON `$2`$3INSTEAD OF INSERT`$3`$7`$8`$9`$10`$11`$12 `$3INSERT INTO `$2 ( ID, NAME, COMPANY_NAME ) `$3SELECT NEXT VALUE FOR dbo.`$14, ID, NAME, COMPANY_NAME FROM INSERTED `$3 --`$13`$14`$15`$16`$17"`
This way I can just copy/paste the result to my hashtable and run the PowerShell function on my triggers’ files.
All’s well that ends well
This made possible to automate, I would say, 95% of the work I have to do on each trigger object.
The 5% left are purposeful and intended to format the code using Redgate SQL Prompt (formatted the code using the great CTRL + K, Y
shortcut) once I open the script on SQL Server Management Studio and test the object compilation.
Other examples?
I applied the same recipe on scripts with user-defined data types (UDDT). I picked the system data type and created entries to the hashtable with the proper replace.
SELECT '''\b' + T2.name + '\b'' = "' + T1.name + CASE WHEN T2.system_type_id IN (231, 239) THEN '(' + CASE WHEN T2.max_length = -1 THEN 'MAX' ELSE CAST(T2.max_length / 2 AS VARCHAR(10)) END + ')' WHEN T2.system_type_id IN (175, 167) THEN '(' + CASE WHEN T2.max_length = -1 THEN 'MAX' ELSE CAST(T2.max_length AS VARCHAR(10)) END + ')' ELSE '' END + '"' FROM sys.types AS T1 INNER JOIN sys.types AS T2 ON T1.user_type_id = T2.system_type_id WHERE T2.user_type_id > 256;
Practical example:
UDDT named INT_VALUE
, that represents an INT, used in the following way CAST(column AS INT_VALUE)
needs to be replaced asCAST(column as INT)
.
Why we need to do this replace? I have written about it on my blog post Using CAST() function with User-Defined Data Types…Did you know… take a look.
Wrap up – Life saver and time saved
Leverage on SQL Server metadata to generate quick code and use it inside SQL Server or outside like I did it here!
I did some math and I have estimated that I would need several months to finish this task doing everything by hand.
Everything could go wrong doing that way:
- This is an heavely repetitive task
- highly prone to errors
- a witch hunt when problems arise
After all, with this approach I have made in less than one month! And, half of those days were to tweak the regular expressions/hashtable.
Automation? You gotta to love it!
Thanks for reading!