In computer science, a semaphore is a variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment. As long as we are working with a RDBMS ( Relational DataBase Management System), the simplest way to implement a semaphore is to create a specific table and to use the lock mechanism, provided by the RDBMS, on the whole semaphore table or on a row of the semaphore table.
This article, Implementing a T-SQL semaphore, shows how a semaphore is raised inside a T-SQL script. Now we can show how to use this principle in SSIS. We are going to show how to acquire a semaphore before running a Data Flow task and how to release it after the Data Flow is completed .
First, let us create the needed tables :
-- Create semaphore table drop_fi_semaforo use dwhsk_stage_prod go drop table drop_fi_semaforo create table drop_fi_semaforo ( uno char(1) , constraint pk_drop_fi_semaforo primary key clustered ( uno) on dwhsk_stage_data ) on dwhsk_stage_data /* The table will contain 3 semaphores, but for our example one will be enough */insert into drop_fi_semaforo values ( '1') insert into drop_fi_semaforo values ( '2') insert into drop_fi_semaforo values ( '3') -- Create table to perform a long running bulk insert on USE [DWHSK_TARGET_PROD] GO -- drop table [drop_fi_POP_PROVA] CREATE TABLE [dbo].[drop_fi_POP_PROVA]( [Event_ID] [bigint] NOT NULL, [Create_Time] [datetime] NOT NULL, [Current_Flag] [bit] NULL, [File_ID] [int] NULL, [Network_Start_Time] [datetime] NOT NULL, [Start_Time] [datetime] NOT NULL, [Contract_ID] [int] NOT NULL, [Customer_ID] [int] NOT NULL, [Target_Customer_ID] [int] NOT NULL, [Network_ID] [smallint] NOT NULL) ON [PS_PORAEV]([Network_Start_Time]) WITH ( DATA_COMPRESSION = PAGE )
Next we create a new SSIS package. The SSIS package and its components must have the property "Isolation Level" set to "Serializable".
We add to the package a Sequence Container, named for example SC, with the TransactionOption property set to "Required". We then add into the Sequence Container a new ExecuteSQLTask named GetSemaphore, with TransactionOption=Required, that tries to acquire the semaphore using this TSQL statement:
/* Acquiring the semaphore means simply putting a lock on a specific row of table drop_fi_semaforo : no data is modified. */begin tran select * from drop_fi_semaforo with ( rowlock, UPDLOCK, holdlock) where Uno = '1'
We then add a DataFlow Task that must have the property TransactionOption set to "NotSupported". This task performs a long running bulk insert, into table drop_fi_POP_PROVA, in a separate session and a separate transaction from the session that holds the lock.
When the DataFlow completes, an ExecuteSQLTask named Release Semaphore, with property TransactionOption set to Required, resumes the transaction opened by GetSemaphore and releases the semaphore simply issuing a commit.
The SSIS package will look like this way :
To display the locks during the running of the package, we can use the following T-SQL statement:
/* The semaphore table is named drop_fi_semaforo. The table, drop_fi_POP_PROVA, is the table on which runs the long running bulk insert performed in DataFlow */select o.name table_name , l.request_mode lock_type , l.request_type , l.request_session_id , coalesce(tt.transaction_id , tt2.transaction_id ) transaction_id , coalesce(tt.database_transaction_log_bytes_reserved , tt2.database_transaction_log_bytes_reserved ) transaction_log_use_bytes from sys.dm_tran_locks l join ( select * from ( select * from dwhsk_stage_prod.sys.objects union ALL select * from dwhsk_target_prod.sys.objects ) oo where name in ( 'drop_fi_semaforo' , 'drop_fi_POP_PROVA' ) ) o on l.resource_associated_entity_id = o.object_id left join sys.dm_tran_session_transactions t on l.request_session_id = t.session_id and l.request_session_id > 0 left join sys.dm_tran_database_transactions tt on t.transaction_id = tt.transaction_id and tt.database_id = l.resource_database_id and l.request_session_id > 0 left join sys.dm_tran_database_transactions tt2 on tt2.transaction_id = request_owner_id
The locks when Get Semaphore has completed and Data Flow has not begun will be :
table_name | lock_type | request_type | request_session_id | transaction_id | transaction_log_use_bytes |
drop_fi_semaforo | IX | LOCK | 80 | 202 | 0 |
The locks when Get Semaphore has completed and Data Flow is running will be :
table_name | lock_type | request_type | request_session_id | transaction_id | transaction_log_use_bytes |
drop_fi_POP_PROVA | IX | LOCK | 83 | 201 | 643050 |
drop_fi_semaforo | IX | LOCK | 80 | 202 | 0 |
We see that the session of the DataFlow runs in a different transaction from the transaction that is holding the semaphore, and that holding the semaphore does not use any log space.
When the package finishes to run, all locks will be released.
In conclusion, we have showed how to implement a semaphore that can be acquired by a generic DataFlow of a SSIS package, using a SequenceContainer appropriately designed.