Building DTS Packages with a Perl Script
Moving data from one SQL Server to another sounds like a fairly easy process. Microsoft is kind enough to supply things like DTS to make these tasks simple. On the surface, DTS provides a nice interface through Enterprise Manager called DTS Designer to allow someone to construct powerful data Extraction, Transformation and Loading (ETL) processes. As always, there’s a little more "under the hood" than what is exposed through DTS Designer. By coding with the DTS object model, you can create extremely powerful and portable ETL processes that can be executed fairly simply. With scripting, you can build the packages dynamically from a shell script, and execute them by passing any number of command line flags.
I like Perl. Perl is a simple "glue" language that brings together the best of compiled languages such as Pascal and C. ActiveState's ActivePerl for Windows is a Perl interpreter that interfaces with ActiveX, so Perl scripts can perform any of the functions that can be performed with VBScript, CScript, ASP, and the like. Using ActivePerl, therefore, you can create and execute DTS packages just as you can with any other ActiveX scripting language. Below is a fairly easy-to-understand Perl script that does just that. I will go through the script in detail, from defining the package objects and properties, to package execution, and finally to collecting errors.
In the Beginning...
use strict;
use Win32::OLE qw(in);
use Getopt::Std;
use Win32::OLE::Const 'Microsoft DTSPackage Object Library';
use Win32::OLE::Const 'Microsoft DTS Custom Tasks Object Library';
use Win32::OLE::Const 'Microsoft DTSDataPump Scripting Object Library';
use Win32::OLE::Variant;

my $Err = 0;
my $ErrMsg = "";
The beginning of the Perl script sets the "ground rules" for the rest of the script. It identifies any packages and libraries that will be referenced by the rest of the script. Of course, you can do this pretty much throughout your script, but you want to make things easy for your maintenance programmer (or in my case, me a year later trying to figure out what I was trying to do!).
The "use strict;" line tells Perl that we want to use strict variable declarations. This is comparable to the "Option Explicit" option in VB and VBScript. It’s good coding practice to identify the variables that you will use throughout your code up-front. This makes things a little easier on the maintenance programmer. We also indicate that we want to use the Win32::OLE object library, making COM object and variable declaration and use possible in the rest of the script. We use Getopt::Std, which is a library that comes with Perl to parse the command line (i.e. perl dtsmov.pl -SsourceServer where -S is the argument). We define what constants that we will reference throughout our code, making references to Microsoft defined flags for objects easier to code and debug. For DTS packages, three libraries contain constants we will use; the 'Microsoft DTSPackage Object Library', the 'Microsoft DTS Custom Tasks Object Library', and the 'Microsoft DTSDataPump Scripting Object Library'. In this particular example, we really only use constants from one of these libraries, however I have included them all here in case you want to modify my sample and add additional features. We use the Win32::OLE::Variant to define "hard coded" variant types, since Perl does not pre-declare the variable type. Finally, we set the error handlers to their default values to be used throughout the code.
#Set the arguments
my %Args;
getopts('EU:P:S:D:O:s:d:o:', \%Args);
my $SourceServer = $Args{S};
my $SourceDb     = $Args{D};
my $SourceObj    = $Args{O};
my $DestServer   = $Args{s};
my $DestDb       = $Args{d};
my $DestObj      = $Args{o};
my $UserId       = $Args{U};
my $Password     = $Args{P};
Here, we use the "getops" function from the Getopt::Std library to grab the arguments from the command line. I usually try to follow the Microsoft convention for command line arguments, and use arguments like -E to indicate integrated (Windows) security, or -U and -P for passing the SQL Server login and password. The rest of the arguments are pretty self explanatory. Note the arguments are CASE SENSITIVE, as Perl is a case-sensitive language like C, C++, or C#.
Creating the Package Object
#Create the package
my $Package = Win32::OLE->new('DTS.Package');
$Package->{Name} = "DataMover";
$Package->{FailOnError} = 1;
Here we create the instance of the DTS.Package object. The next two lines set object properties such as the Name of the package and FailOnError, which stops the package on the first error encountered. Pretty much all Win32::OLE COM object properties are assigned this way in Perl.
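Win32::OLE returns undef if the object cannot be created (for example, when DTS is not installed on the machine running the script), so a minimal sanity check, offered here only as a suggestion, might look like this:

#Bail out early if the DTS.Package object could not be created
die "Could not create DTS.Package: " . Win32::OLE->LastError() . "\n"
    unless defined $Package;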
Adding Connections to the Package
#Create Connection objects and add them to the package
my $Conn = $Package->Connections->New("SQLOLEDB");
$Conn->{Name} = "Source";
$Conn->{ID} = 1;
$Conn->{DataSource} = $SourceServer;
$Conn->{Catalog} = $SourceDb;
if ($Args{E}) {
    $Conn->{UseTrustedConnection} = 1;
}
else {
    $Conn->{UserID} = $UserId;
    $Conn->{Password} = $Password;
}
$Package->Connections->Add($Conn);
$Conn = undef;

$Conn = $Package->Connections->New("SQLOLEDB");
$Conn->{Name} = "Destination";
$Conn->{ID} = 2;
$Conn->{DataSource} = $DestServer;
$Conn->{Catalog} = $DestDb;
if ($Args{E}) {
    $Conn->{UseTrustedConnection} = 1;
}
else {
    $Conn->{UserID} = $UserId;
    $Conn->{Password} = $Password;
}
$Package->Connections->Add($Conn);
$Conn = undef;
This may look like a lot of code, but what it does is fairly simple. We create connection objects, set the necessary properties, and add them to the package object using the Add() method. You will notice that we set the security used by each connection by simply modifying the properties based on the command line arguments passed. The connections themselves are SQLOLEDB connections (SQL Server), but you can define these however you would like. However, to use IAFastLoad (which we will see next), the destination must be a SQLOLEDB connection.
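As a sketch of that flexibility, here is what a connection through a different OLE DB provider might look like. The Jet provider name is real, but the connection name, ID, and file path are made up for illustration:

#Hypothetical: an additional connection reading from an Access database
my $JetConn = $Package->Connections->New("Microsoft.Jet.OLEDB.4.0");
$JetConn->{Name} = "AccessSource";
$JetConn->{ID} = 3;
$JetConn->{DataSource} = "C:\\data\\sample.mdb";  #made-up file path
$Package->Connections->Add($JetConn);
$JetConn = undef;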
Defining the Task Object
#Create the DataPumpTask with IAFastLoad enabled (blazing fast)
my $Task = $Package->Tasks->New("DTSDataPumpTask");
$Task->{Name} = "CopyColumns";
my $CustomTask = $Task->{CustomTask};
$CustomTask->{SourceConnectionID} = 1;
$CustomTask->{SourceObjectName} = $SourceObj;
$CustomTask->{DestinationConnectionID} = 2;
$CustomTask->{DestinationObjectName} = $DestObj;
$CustomTask->{UseFastLoad} = 1;
$CustomTask->{FastLoadOptions} = 1;
Here we define the task and a custom task. The custom task is a special type of property that is an object itself, so we create it slightly differently. This will all roll up into the package in a moment. The task is defined as a DTSDataPumpTask, which is a basic table-to-table copy. We define the source and destination connection IDs (set when we defined the connections above), and pass the object names to indicate which tables and views to copy from and to.
We also set UseFastLoad and FastLoadOptions, indicating to the SQLOLEDB provider that we want to use IAFastLoad. IAFastLoad is a provider-level option that basically places the connection into "write only" mode. The connection is optimized to write to SQL Server (it knows it's not returning a recordset). The net result is that the SQL Server OLE DB provider writes data as fast as it can. BCP and BULK INSERT also use this type of connection to SQL Server. The only other way to do this kind of write to SQL Server is to compile your own OLE DB consumer using C++. Of course, Microsoft was kind enough to provide sample code to do this. DTS has this capability "built in", so it can switch between the normal method and IAFastLoad.
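Note that FastLoadOptions is really a bitmask rather than a simple boolean. As a hedged sketch (assuming the DTSFastLoadOptions constant names resolve from the type libraries loaded at the top of the script), you could combine options like this:

#Example: request a table lock plus constraint checking during the load
$CustomTask->{FastLoadOptions} =
    DTSFastLoad_TableLock | DTSFastLoad_CheckConstraints;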
Defining the Transformation
#Create the Transformation and add it to the task
my $Transform = $CustomTask->Transformations->New("DTSPump.DataPumpTransformCopy");
$Transform->{Name} = "TransformData";
$Transform->{TransformFlags} = DTSTransformFlag_AllowLosslessConversion;
$CustomTask->Transformations->Add($Transform);
Defined here is the actual data transformation that will be performed. We can add individual column mappings by creating new column objects and adding them to the transformation object. Here, however, I have not indicated any columns; therefore DTS naturally assumes I am doing a one-to-one table mapping of columns.
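If you did want explicit mappings, a sketch might look like the following; the column name "au_id" and its ordinal are made up for illustration, and the mappings would be defined before the transformation is added to the custom task:

#Hypothetical explicit column mapping (New takes a name and an ordinal)
my $SrcCol = $Transform->SourceColumns->New("au_id", 1);
$Transform->SourceColumns->Add($SrcCol);
my $DstCol = $Transform->DestinationColumns->New("au_id", 1);
$Transform->DestinationColumns->Add($DstCol);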
Here, you can also see how a constant is passed as a parameter (DTSTransformFlag_AllowLosslessConversion). This particular constant sets a flag to indicate how DTS will handle the conversion of the data from the source to the destination. A list of these constants can be found at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dtsprog/dtspconst_6m5v.asp or in the SQL Server Books Online under the topic DTSTransformFlags in the DTS programming section.
Finally, we add the transformation to the custom task object using the familiar Add() method.
Adding the Task Object and Defining the Steps
#Add the task to the package
$Package->Tasks->Add($Task);
$Task = undef;
After it’s all said and done, we finally add the task to the package.
#Create the package step and add it to the package
my $Step = $Package->Steps->New();
$Step->{Name} = "Step1";
$Step->{TaskName} = "CopyColumns";
$Step->{ExecuteInMainThread} = 1;
$Package->Steps->Add($Step);
$Step = undef;
Here we create the step or steps that define the precedence and order of each task in the package. Since we only have one task defined, naturally there is only one step. We also indicate to DTS that we want to execute the step in the main thread. This way, DTS runs in the memory space of the executing process, which is helpful in a shell scripting environment.
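For a package with more than one step, the order is expressed through each step's PrecedenceConstraints collection. The sketch below is illustrative only ($Step2 is a hypothetical second step; the constant names follow the DTS object model):

#Hypothetical: make $Step2 wait for Step1 to finish successfully
my $PC = $Step2->PrecedenceConstraints->New("Step1");
$PC->{PrecedenceBasis} = DTSStepPrecedenceBasis_ExecResult;
$PC->{Value} = DTSStepExecResult_Success;
$Step2->PrecedenceConstraints->Add($PC);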
Running the Package
#Execute the package
$Package->Execute();
print Win32::OLE->LastError . "\n";
Executing the package is a snap. Just call the Execute() method directly. I also threw in a printout of Win32::OLE->LastError to return the last error encountered up to this point. You can use this throughout your code when debugging OLE errors. Additionally you can run Perl with the -w flag, which tells Perl to run with warnings turned on. You can also do this in your code by placing a "use warnings;" statement in the code itself. In this way, any OLE errors will be printed out when the script runs.
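A related option, not used in the original script, is to tell Win32::OLE to raise OLE failures as Perl errors itself, which saves sprinkling LastError checks everywhere:

#Croak on any OLE error instead of continuing silently
Win32::OLE->Option(Warn => 3);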
Error Handling and Exiting
#Check for errors
($ErrMsg, $Err) = &StepErrors;
if ($Err == -1) {
    print "$ErrMsg\n";
}
else {
    print "Package Completed Successfully\n";
}
$Package = undef;
exit $Err;
Error checking is a must. DTS has many moving parts, and any one of those parts can fail for many reasons. Here we tell Perl to execute a subroutine called StepErrors (seen below) and return the values from that subroutine call into the $ErrMsg and $Err variables. Once we collect all the errors, we print any error messages and release the package object (good programming practice). We also return the integer in $Err using the exit command so that proper error handling at the command line is established.
#Loop through the steps and look for errors based on the ExecutionResult
sub StepErrors {
    my $ErrorCode   = Variant(VT_I4|VT_BYREF, "-1");
    my $Source      = Variant(VT_BSTR|VT_BYREF, "");
    my $Description = Variant(VT_BSTR|VT_BYREF, " ");
    foreach my $Steps (in $Package->{Steps}) {
        if ($Steps->{ExecutionStatus} == 4) {
            if ($Steps->{ExecutionResult} == 1) {
                $Err = -1;
                $Steps->GetExecutionErrorInfo($ErrorCode, $Source, $Description);
                $ErrMsg = "$ErrMsg \n Step $Steps->{Name} failed, error:\n $ErrorCode \n $Description \n";
            }
        }
    }
    return ($ErrMsg, $Err);
}
This is the StepErrors subroutine. It collects the error messages from each step in the package and returns them to the caller, which prints them to the command line screen (STDOUT). The GetExecutionErrorInfo method is special in that it expects variables passed by reference. The method changes the variables passed to it by reference, so the variables must be pre-defined data types. We define these variables using the Variant function exported by the Win32::OLE::Variant library. For example, my $ErrorCode = Variant(VT_I4|VT_BYREF, "-1"); tells Perl to create a variable called $ErrorCode, set its type to VT_I4 (Long), create a reference to the variable, and assign it a dummy value of -1. This variable is then passed, along with the $Source and $Description references, to the GetExecutionErrorInfo method, which writes its results back into the variables through those references.
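As a tiny standalone illustration of by-reference Variants (independent of DTS):

use Win32::OLE::Variant;
my $Num = Variant(VT_I4|VT_BYREF, 0);  #a Long, passed by reference
$Num->Put(42);                         #a COM method would do this for you
print $Num->Value, "\n";               #prints 42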
Conclusions
Microsoft DTS provides a powerful platform for data Extraction, Transformation, and Loading (ETL). The DTS object model allows you to "fine tune" particular aspects of DTS. The advantage of building DTS packages with a scripting language such as Perl is that you can quickly view properties, make changes, and execute the package without recompiling source code. ActiveState Perl provides additional features such as passing variables by reference so that certain OLE method calls can properly return values by changing those variables.
Using operating system scripts (in this case, Perl) and DTS to quickly move table data to another server has several advantages. One advantage is in terms of disk I/O and storage: the data is read once and written to the target server once, and no flat files or backups are needed. Linked servers can do this as well; however, the IAFastLoad capability that DTS offers is not available through a linked server, and larger sets of data require something a little more "beefy".
Another advantage is that any package property can be clearly seen and altered in the Perl script code. Since the actual package is being built by the script, any DTS package property can be changed by editing the script with a text editor. DTS properties can also be assigned a value passed in by a command line flag (or by any other mechanism such as an ini file, a driver table, an XML file, etc.) at runtime. The DTS Designer in SQL Server Enterprise Manager allows you to set runtime variables as well, but the process of defining those variables can be somewhat clumsy. Perl scripts provide a rather elegant alternative in these cases.
In my experience, this passes the "2:00 AM test". Typically, large sets of production data are moved around during times of low end-user activity. Production support DBAs already know the value of being able to diagnose and alter code with a simple text editor in the middle of the night. In my particular case, I am required to provide both production and developer support, so my code has to be easy to read, diagnose, and change at any time by staff with varying degrees of expertise. My personal expertise tends to waver at 2:00 AM, especially when the sound of the pager is still ringing in my ears and I'm still bleary-eyed from being so rudely awakened!