Many a time we need copies of an object but with different names. I also got a requirement to process the files in parallel using SSIS. We had already implemented the parallelism, but at the entity level. For example, if there are 5 entities such as Customer, Product, Order, Transaction, Balance etc. then we had a different thread for each entity. It means we had 5 parallel threads.
We were getting the multiple files against each entity. Each files were of 1 GB size and each entity could have hundreds of the files. In totality the package was expected to process thousands of the files. We were getting the benefits with the parallelism we had implemented, but the overall execution time was not satisfactory.
We decided to implement the parallelism even at the individual entity level. But there was a catch. Each entity was being processed in a loop container with set of steps being run in a sequence. The steps were:
- Truncate the staging table
- Data flow to load the flat file data to staging table
- Validation on the staging table
- Loading the data from staging to final table
- File archival
There was no choice for us, other than having the multiple copies (up-to the the number of parallel threads at individual level) of the staging tables, validation procedures, and the procedures to move the data from staging to final table. We could have used the dynamic SQL, but there are few limitations and performance overhead to do so.
We decided to create the desired copies of the tables and procedures, instead of using the dynamic SQL. But there was huge challenge of maintaining the multiple copies of the same object. Situation could worsen further if there are multiple entities. Even a small change in data type would require changes at multiple places. It was not feasible both from development as well as support and maintenance perspective.
I came up with a solution to programmatically create the desired copies of the objects such as tables, and procedures. We just need to maintain a single copy (model copy) of each object such as table, procedure, trigger, view, function etc.
Here is a simple T-SQL script to create copies of the existing programmability objects by changing the table name it refers. Since every parallel thread will have it’s own staging table. Validation procedure and procedure to move the data from staging to final table will also refer to the staging table dedicated for the thread. Except the table name all other things will remain same.
DECLARE @Model_Object_NameVARCHAR(100)
, @Model_Table_NameVARCHAR(50)
, @Object_NameVARCHAR(100)
, @Table_NameVARCHAR(50)
SELECT @Model_Object_Name='usp_Test'
, @Model_Table_Name='sys.objects'
, @Object_Name='usp_Test_Thread_3'
, @Table_Name='sys.tables'
DECLARE @ScriptVARCHAR(MAX)
SET @Script = REPLACE(REPLACE(OBJECT_DEFINITION(OBJECT_ID(@Model_Object_Name)), @Model_Table_Name, @Table_Name), @Model_Object_Name, @Object_Name)
IF OBJECTPROPERTY(OBJECT_ID(@Model_Object_Name), 'IsProcedure') = 1 EXECUTE ( 'DROP PROCEDURE IF EXISTS ' + @Object_Name );
IF OBJECTPROPERTY(OBJECT_ID(@Model_Object_Name), 'IsTrigger') = 1 EXECUTE ( 'DROP TRIGGER IF EXISTS ' + @Object_Name );
IF OBJECTPROPERTY(OBJECT_ID(@Model_Object_Name), 'IsScalarFunction') = 1 EXECUTE ( 'DROP FUNCTION IF EXISTS ' + @Object_Name );
IF OBJECTPROPERTY(OBJECT_ID(@Model_Object_Name), 'IsTableFunction') = 1 EXECUTE ( 'DROP FUNCTION IF EXISTS ' + @Object_Name );
IF OBJECTPROPERTY(OBJECT_ID(@Model_Object_Name), 'IsView') = 1 EXECUTE ( 'DROP VIEW IF EXISTS ' + @Object_Name );
IF @Script IS NOT NULL
BEGIN
EXECUTE ( @Script );
END
The definition of existing procedure usp_Test used in the script was as follows. It is solely for illustration of working of the above script.
CREATE procedure usp_test
AS
BEGIN
-- SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT o.name
, i.type_desc
, SUM(p.rows) AS TotalRows
, SUM(a.total_pages) * 8 AS TotalSpaceKB
, SUM(a.used_pages) * 8 AS UsedSpaceKB
, (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM [SRVDEV01].master.sys.indexes i
INNER JOIN sys.objects o
ON i.object_id = o.object_id
INNER JOIN sys.partitions p
ON p.index_id = i.index_id
AND p.object_id = i.object_id
INNER JOIN sys.allocation_units a
ON p.partition_id = a.container_id
WHERE o.type_desc = 'USER_TABLE'
GROUP BY o.name
, i.type_desc
END
The query created a new procedure usp_Test_Thread_3 with definition as follows.
CREATE procedure usp_Test_Thread_3
AS
BEGIN
-- SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT o.name
, i.type_desc
, SUM(p.rows) AS TotalRows
, SUM(a.total_pages) * 8 AS TotalSpaceKB
, SUM(a.used_pages) * 8 AS UsedSpaceKB
, (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM [SRVDEV01].master.sys.indexes i
INNER JOIN sys.tables o
ON i.object_id = o.object_id
INNER JOIN sys.partitions p
ON p.index_id = i.index_id
AND p.object_id = i.object_id
INNER JOIN sys.allocation_units a
ON p.partition_id = a.container_id
WHERE o.type_desc = 'USER_TABLE'
GROUP BY o.name
, i.type_desc
END
We had followed the discipline as far as the object, variable, temporary table naming is concerned. Hence we didn’t faced any challenge with this script. But if you’ve declared a variable or temporary table etc. with name having the supplied @model_table_name then you may face challenges.
The purpose of this article is not to help you with a ready-made script for all your use-cases, but instead to make you aware of the approach. You can modify or extend the script as your need.
This script will not work for the tables. If you need to create the copies of a table then you can use SELECT * INTO, or a dynamic SQL, or a custom logic like this (I’ve also used it) to get the create script for your table.