When we deploy code to SQL Server we do it through one or more T-SQL scripts. Whether these scripts are manually created or automatically generated using a tool like SSDT or SQL Compare, we should make sure that, where possible, the scripts are re-runnable or easily re-generated. Re-runnable scripts remove a lot of uncertainty from releases, let us recover quickly from a failure or a mistake, and make the job of actually performing the release easier on the person who does it.
Idempotent
The first thing we need to do is start writing scripts so that they are idempotent, which means that no matter how many times you run them you always end up with the same result.
This is an example of an idempotent script in T-SQL:
if object_id('abc.def') is not null
begin
    drop procedure abc.def
end
GO

create procedure abc.def
as
    select 1
GO
What happens is that we check whether the stored procedure exists and, if it does, we drop it, so when we get to the create statement we know that it will succeed (provided the code compiles).
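On SQL Server 2016 and later you can get the same effect more concisely. This is just a sketch of the equivalent pattern - drop procedure if exists arrived in SQL Server 2016 and create or alter procedure in 2016 SP1 - so check the versions you deploy to before relying on it:

drop procedure if exists abc.def
GO

create or alter procedure abc.def
as
    select 1
GO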
Types of objects
There are different types of objects in SQL Server and different types of changes we can make to them. Top-level objects such as tables or stored procedures all have an entry in sys.objects and return a valid object_id when you call object_id in the database that contains them. Other changes, such as adding a column to a table, cannot be verified using object_id because a column is not a top-level object.
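The same guard we used for the stored procedure works for any top-level object. Here is a sketch for a table; the dbo.person definition is purely illustrative:

if object_id('dbo.person') is null
begin
    create table dbo.person(
        person_id int not null,
        first_name varchar(200) not null,
        constraint [pk_person] primary key clustered (person_id)
    )
end
GO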
To handle objects that are not top level we need to query the database to see whether the object exists, and for that we should really be using the INFORMATION_SCHEMA views.
For example, if we want to check whether a column called last_name exists on the person table, and add it if it does not, we can do:
if not exists(
    select *
    from INFORMATION_SCHEMA.COLUMNS
    where TABLE_NAME = 'person'
        and COLUMN_NAME = 'last_name')
begin
    alter table person add last_name varchar(200) null
end
Further, if we want to change the definition of a column, we can check whether the change is actually needed before making it:
if not exists(
    select *
    from INFORMATION_SCHEMA.COLUMNS
    where TABLE_NAME = 'person'
        and COLUMN_NAME = 'last_name'
        and CHARACTER_MAXIMUM_LENGTH = 255)
begin
    alter table person alter column last_name varchar(255) null
end
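The same pattern extends to any column attribute that the INFORMATION_SCHEMA views expose. As a sketch, guarding a data type change looks like this (switching last_name to nvarchar is purely illustrative):

if not exists(
    select *
    from INFORMATION_SCHEMA.COLUMNS
    where TABLE_NAME = 'person'
        and COLUMN_NAME = 'last_name'
        and DATA_TYPE = 'nvarchar')
begin
    alter table person alter column last_name nvarchar(255) null
end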
If we use this sort of verification before making any modification then we can keep re-running the script and always get the same result, and knowing that you can re-run any script is a benefit you only really appreciate when you are under pressure during a release.
Manual Scripts / Automatic Scripts
When we use tools to generate scripts, anything the tool generates, such as the create / alter steps, can be regenerated quickly, so if you have deployed half of a script generated with SQL Compare then the best thing to do is to run another compare and run the updated script. Even if you use a tool to generate your deployment scripts, you may well still be using a hand-written T-SQL script to deploy your reference data.
What is Reference Data?
This is the data in your database that is not part of the business data, the valuable data. Reference data is typically used to ensure referential integrity: for example, an employee_type table with an employee_type_id of 1 and an employee_type of 'Full Time'. Having reference data means we can enforce referential integrity, and the values often need to match the enums that exist in the application code.
How can we ensure Reference Data is Idempotent?
The simplest way to make data deployment scripts idempotent is to use what is arguably the most complicated DML statement, the MERGE statement. If we look at our employee_type table:
create table hr.employee_type(
    employee_type_id int identity(1,1) not null,
    employee_type varchar(12) not null,
    constraint [pk_employee_type] primary key clustered (employee_type_id)
)
To make deployments to this table idempotent, I would suggest removing the identity from the employee_type_id column, especially when the ids need to match values that exist in the application code. If you do not remove it then you need to set identity insert on, and off again when you have finished, which means that unless you take some sort of lock on the table or on some other object you cannot run two deployment scripts at the same time without risking a classic race condition such as the one below (a sketch of one way to serialize the scripts follows the list):
- Script 1 - set identity insert on
- Script 1 - run data modification
- Script 2 - set identity insert on
- Script 1 - set identity insert off
- Script 2 - run data modification <-- this will fail
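One way to serialize the deployment scripts, rather than relying on identity insert alone, is to take an application lock for the duration of the data change. This is only a sketch, assuming you are happy to use sp_getapplock and a made-up resource name:

begin tran

-- take an exclusive, transaction-scoped application lock; a second script
-- asking for the same resource will wait here until this one commits or rolls back
exec sp_getapplock @Resource = 'deploy_employee_type', @LockMode = 'Exclusive';

set identity_insert hr.employee_type on;

-- run the data modification here

set identity_insert hr.employee_type off;

commit tran
-- the transaction-scoped lock is released automatically on commit or rollback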
Because we are inserting known values we do not really need the identity column at all, so if we remove it the table definition becomes:
create table hr.employee_type(
    employee_type_id int not null,
    employee_type varchar(12) not null,
    constraint [pk_employee_type] primary key clustered (employee_type_id)
)
Of course, on an existing system we can remove the identity in an idempotent way too:
begin tran

--does the column exist with an identity column
if exists (select * from sys.columns
           where object_id = object_id('hr.employee_type')
             and name = 'employee_type_id'
             and is_identity = 1)
begin
    --add temp column - in a separate batch so the 2nd batch compiles
    alter table hr.employee_type add type_id_temp int null;
end
go

--has the temp column been created?
if exists(select * from sys.columns
          where object_id = object_id('hr.employee_type')
            and name = 'type_id_temp')
begin
    exec sp_executesql N'update hr.employee_type set type_id_temp = employee_type_id;';

    --if the primary key is still there then remove it
    if exists(select * from INFORMATION_SCHEMA.TABLE_CONSTRAINTS
              where TABLE_NAME = 'employee_type'
                and CONSTRAINT_NAME = 'pk_employee_type')
    begin
        alter table hr.employee_type drop constraint [pk_employee_type];
    end

    --re-add the constraint on our new column, making it not null first
    if exists(select * from [INFORMATION_SCHEMA].[COLUMNS]
              where TABLE_NAME = 'employee_type'
                and COLUMN_NAME = 'type_id_temp'
                and IS_NULLABLE = 'YES')
    begin
        alter table hr.employee_type alter column type_id_temp int not null;
    end

    --drop the original identity column
    if 2 = (select count(*) from [INFORMATION_SCHEMA].[COLUMNS]
            where TABLE_NAME = 'employee_type'
              and COLUMN_NAME in ('type_id_temp', 'employee_type_id'))
    begin
        alter table hr.employee_type drop column employee_type_id;
    end

    --rename the temp column to the original name
    if exists (select * from [INFORMATION_SCHEMA].[COLUMNS]
               where TABLE_NAME = 'employee_type'
                 and COLUMN_NAME = 'type_id_temp')
    begin
        exec sp_rename 'hr.employee_type.type_id_temp', 'employee_type_id', 'COLUMN';
    end

    --put the primary key constraint back on
    if not exists(select * from INFORMATION_SCHEMA.TABLE_CONSTRAINTS
                  where TABLE_NAME = 'employee_type'
                    and CONSTRAINT_NAME = 'pk_employee_type')
    begin
        alter table hr.employee_type add constraint [pk_employee_type] primary key clustered (employee_type_id);
    end
end

commit tran
Now that we have the table without the identity property we can use a MERGE statement without having to enable identity insert:
SET NOCOUNT ON

MERGE INTO [hr].[employee_type] AS Target
USING (VALUES
     ('Full Time', 1)
    ,('Contractor', 2)
    ,('Part Time', 3)
) AS Source ([employee_type], [employee_type_id])
ON (Target.[employee_type_id] = Source.[employee_type_id])
WHEN MATCHED AND (
        NULLIF(Source.[employee_type], Target.[employee_type]) IS NOT NULL
     OR NULLIF(Target.[employee_type], Source.[employee_type]) IS NOT NULL)
    THEN UPDATE SET [employee_type] = Source.[employee_type]
WHEN NOT MATCHED BY TARGET
    THEN INSERT ([employee_type], [employee_type_id])
         VALUES (Source.[employee_type], Source.[employee_type_id])
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;
GO
The alternative to using a MERGE statement would be to have three separate statements: one to insert any rows that are missing, one to update existing rows so that the names are correct - perhaps you want to rename "Full Time" to "Permanent" - and a third to delete any rows that have been removed from your code. You would also need to make sure each statement was idempotent, either by using if exists checks or by joining the data in the script against the live table. That is quite a lot of work, so I would recommend the MERGE statement, but a sketch of the alternative follows.
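For completeness, here is a minimal sketch of that three-statement approach against the hr.employee_type table, using the same values as the MERGE above:

-- insert any rows that are missing
insert into hr.employee_type (employee_type_id, employee_type)
select s.employee_type_id, s.employee_type
from (values (1, 'Full Time'), (2, 'Contractor'), (3, 'Part Time'))
         as s (employee_type_id, employee_type)
where not exists (select * from hr.employee_type t
                  where t.employee_type_id = s.employee_type_id);

-- update any rows whose description has changed
update t
set employee_type = s.employee_type
from hr.employee_type t
join (values (1, 'Full Time'), (2, 'Contractor'), (3, 'Part Time'))
         as s (employee_type_id, employee_type)
    on t.employee_type_id = s.employee_type_id
where t.employee_type <> s.employee_type;

-- delete any rows that are no longer in the list
delete from hr.employee_type
where employee_type_id not in (1, 2, 3);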
To help create the MERGE statements you can use the sp_generate_merge stored procedure, which is on GitHub: https://github.com/readyroll/generate-sql-merge/blob/master/master.dbo.sp_generate_merge.sql
Writing deployment scripts or making changes to SQL Server databases is often considerably simpler for the person or tool doing the release if they can re-run the scripts, or at least re-generate a new script that can be re-run.
Get into the habit of making your scripts idempotent and you will set yourself on the path to more reliable deployments.