In the first post in this series, I covered the basics of object typed variables in SQL Server Integration Services, along with a brief examination of some potential use cases. In this installment, I’m going to illustrate the most common use of object typed variables in SSIS: using an object variable as an ADO recordset within a loop container to perform iterative logic.
Before we examine the how, let’s talk about the why. Although this is not a design pattern you’ll have to use every day, there are any number of cases that would lend themselves to building and using an ADO recordset enumerator:
- You need to create a series of export files – one per client – showing that client’s charges for a given period.
- You’re dealing with a very large set of data, and/or your processing hardware has limited resources. You want to explore breaking up the workload into smaller chunks to be processed serially.
- You are performing a data load operation, and want to design the package in such a way that the loaded data can be immediately used as a validation source in the same package execution.
For cases such as these (among others), using this design pattern can be an excellent way to address your ETL needs.
Design Pattern
At a high level, this design pattern will have three moving parts:
- A relational query used to populate the object variable (thus transforming its internal type into an ADO recordset)
- A For Each Loop container to loop through the list stored in this variable
- Some business logic for each value (or set of values) in each row of the object variable
Note that while the first two moving parts I mentioned will be relatively consistent from one package to another, the business logic component will, by nature, vary greatly from one package to another. For the purposes of this post, I’m purposefully keeping my business logic piece simple so as to not distract from the larger design pattern.
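To make the three moving parts concrete outside of the SSIS designer, here is a minimal sketch of the same pattern in Python. It is an analogy rather than SSIS itself: the in-memory SQLite database, the `games` table, its columns, and the sample IDs are all assumptions invented for illustration, and a plain Python list stands in for the object variable.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (game_id TEXT PRIMARY KEY, is_postseason INTEGER);
    INSERT INTO games VALUES ('G001', 1), ('G002', 0), ('G003', 1);
""")

# Part 1: a relational query populates the in-memory list
# (the role played by the object variable in the package).
game_list = conn.execute(
    "SELECT game_id FROM games WHERE is_postseason = 1 ORDER BY game_id"
).fetchall()

# Part 2: loop over each row (the role of the For Each Loop container).
processed = []
for row in game_list:
    this_game_id = row[0]  # first column, addressed by ordinal position
    # Part 3: business logic for this row; here it only records the ID.
    processed.append(this_game_id)

print(processed)
```

The point of the sketch is that the number of iterations is decided by the data at runtime, not by anything fixed at design time.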
For my sample data, I’m going to deal with a data domain that is near and dear to my heart: baseball. In this case I want to get a list of all postseason baseball games, and for each game, create an export file detailing the at-bat statistics for that game. Because I don’t know at design time how many games will be played in the postseason, I can’t simply hard-code my outputs – I need to design the ETL in such a way that the data will dictate, at runtime, the number of output files and their respective filenames.
Configure and Populate the Object Variable
The first thing I’ll do in my demo package is set up an SSIS variable, giving it the data type of Object. As shown below, I’m using the SSIS variable named [GameList] as the object typed variable, which will store the ADO recordset list of playoff game IDs that should be processed. Also included is a variable specifying the directory to which the output files will be written, as well as a variable to store the individual game ID for each iteration of the loop.
Next up, I’m going to add an instance of the Execute SQL Task to the control flow of my package, typing in my query to select the IDs of the playoff games from the database. In the settings for this task shown below, you’ll also see in the highlighted portion that I’ve changed the behavior of the Result Set to use Full result set (remember the default is None, which would expect no data rows to be returned). By setting this behavior, I’m configuring the task to expect a result set to be returned.
When I configure the Result Set setting in this way, I also need to indicate where those results should end up – specifically, I have to indicate which object typed variable will store these results. In the Result Set tab of the same task, I’ll set the variable name to use the [GameList] variable I set up in the previous step. Also note that the result set name should always be 0 in this case.
What I’ve done here is quite simple, and required no code (other than the SQL statement, of course). What’s happening behind the scenes is a little more complex, however. At runtime when the Execute SQL Task is executed, the [GameList] variable will be instantiated as a new object of type ADO recordset. Note that this action will not change the data type shown in SSIS; even though the in-memory object will be configured as an ADO recordset, it will still show up as an object type variable in the designer. This ADO recordset object will then be loaded with the resulting records, if any, from the query I used in the Execute SQL Task.
Using the SSIS Variable as an Enumerator
My next step will be to consume that list, processing each game ID in turn to extract the data I need. To handle this, I’ll add a For Each Loop container to the control flow, and connect the previously configured instance of Execute SQL Task to this new container. When I configure the properties for the loop container, in the Collection tab I’m presented with several different options for the enumerator (the list that controls how many times the logic within the loop will be executed). Since I’m working from the ADO recordset list created in the previous step, I’m going to select Foreach ADO Enumerator, and use the variable drop-down list to select the [GameList] object variable. I also set the Enumeration Mode to use Rows in the first table, which is the only option I can use when working with an ADO recordset (note that we have more options when working with an ADO.NET dataset, which I plan to cover in a future post).
With the Collection tab set to use my object variable as an enumerator, I’ll next jump over to the Variable Mappings tab. It is on this tab that I will align fields in the recordset with variables in the package. As shown below, I’m only expecting one column to be returned, and for each iteration of the loop, this value will be stored in the variable named [ThisGameID]. As you can see, I’m using index [0] to indicate the position of this value; if the recordset is expected to return more than one column, I could add those in as additional column/variable mappings, using the ordinal position of each column to map to the proper SSIS variable.
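The ordinal mapping idea can be sketched in a few lines of Python. This is only an illustration of mapping by position: the second column, the `ThisGameDate` name, and the sample rows are hypothetical and are not part of my package.

```python
# Hypothetical result set with two columns per row: (game ID, game date).
game_list = [("G001", "2011-10-01"), ("G003", "2011-10-04")]

def map_row(row, mappings):
    """Copy columns into named variables by ordinal position."""
    return {name: row[index] for name, index in mappings.items()}

# Index 0 feeds ThisGameID; index 1 feeds a second, hypothetical variable.
mappings = {"ThisGameID": 0, "ThisGameDate": 1}

# One dictionary of variable values per loop iteration.
iterations = [map_row(row, mappings) for row in game_list]
print(iterations[0])
```

Because the mapping is positional, reordering the columns in the query would silently change which value lands in which variable, so the column order in the SELECT statement matters.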
With that done, I’ll add an instance of the Data Flow Task to the loop container configured above, which will complete the work on the control flow:
Configure the Business Logic in the Data Flow
Now it’s time to dive into the data flow I just created. Within that data flow, I’ll add a new OLE DB Source component, the purpose of which will be to retrieve the at-bat statistics for each playoff game. To the output of that source, I will attach an instance of the Flat File Destination, which will be used to send each game’s data to the respective output file.
Within the data source, I need to configure the query such that it retrieves data for one and only one game at a time. Since the current game ID value is stored in the [ThisGameID] SSIS variable, I can simply map that variable as a query parameter, so that each execution of this SELECT query will limit the results to only include statistics for the current game ID. As shown below, I’m using a parameter placeholder (the question mark in the query) to indicate the use of a parameter:
… and when I click the Parameters… button, I can map the SSIS variable containing the game ID to that query parameter:
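The question mark placeholder behaves much the same way in other database APIs. As a rough sketch, Python's `sqlite3` module also uses `?` for positional parameters; the `at_bats` table, its columns, and the sample rows here are assumptions made up for the example, not the schema from my database.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE at_bats (game_id TEXT, batter TEXT, result TEXT);
    INSERT INTO at_bats VALUES
        ('G001', 'Batter A', 'Single'),
        ('G001', 'Batter B', 'Strikeout'),
        ('G003', 'Batter A', 'Walk');
""")

def at_bats_for_game(connection, this_game_id):
    # The ? is a positional placeholder; the value is supplied separately,
    # just as the SSIS variable is mapped to the parameter.
    query = (
        "SELECT batter, result FROM at_bats "
        "WHERE game_id = ? ORDER BY batter"
    )
    return connection.execute(query, (this_game_id,)).fetchall()

rows_g001 = at_bats_for_game(conn, "G001")
rows_g003 = at_bats_for_game(conn, "G003")
print(rows_g001)
```

The query text never changes between iterations; only the parameter value does, which is what keeps each execution limited to a single game.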
I have already configured an instance of the Flat File Destination (and by extension, set up the Flat File Connection Manager) to allow me to write out the results to a file, but how will I create a separate file per game? It’s actually quite easy: by using a simple SSIS expression on the ConnectionString property of the Flat File Connection Manager, I can configure the output file name to change on each iteration of the loop by using the game ID value as part of the file name. As shown below, I’m accessing the Expressions collection within my Flat File Connection Manager, overriding the static value of the ConnectionString property with a value computed from the combination of two variables – the directory location I specified earlier, along with the current game ID. Remember that since SSIS expressions are evaluated at runtime, the resulting value changes whenever the underlying variables change during execution, thus allowing a single Flat File Connection Manager to write out multiple files during each package execution.
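The same idea, a destination path recomputed on every pass through the loop, looks like this as a Python sketch. The `.csv` extension, the naming scheme, the temporary directory, and the sample rows are all assumptions for illustration; my package uses its own directory variable and file name format.

```python
import csv
import os
import tempfile

def output_path(output_directory, this_game_id):
    # Hypothetical naming scheme: <directory>/<game ID>.csv
    return os.path.join(output_directory, f"{this_game_id}.csv")

# Stand-in for the SSIS variable that holds the output directory.
output_directory = tempfile.mkdtemp()

# Hypothetical per-game rows keyed by game ID.
games = {
    "G001": [("Batter A", "Single")],
    "G003": [("Batter A", "Walk")],
}

written = []
for this_game_id, rows in games.items():
    # The path is rebuilt on each iteration, like the expression on
    # the ConnectionString property.
    path = output_path(output_directory, this_game_id)
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    written.append(os.path.basename(path))

print(sorted(written))
```

One destination definition serves every file because the path is derived from the current loop value rather than stored as a fixed string.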
Finally, when I execute the configured package, I end up with a few dozen output files – one per playoff game. As a side note, my Texas Rangers were only represented in one of those playoff games from last year. We’ll get ‘em this year. As shown below, each output file is distinguished by the game ID in its file name.
Conclusion
Use of the SSIS object typed variable can be a very powerful tool, but it need not be complex. As shown in this example, we can easily leverage the object variable for iteration over a result set without writing a single line of programmatic code.
In the next post in this series, I’ll dig further into object typed SSIS variables, and will explore how to use and manipulate other types of objects not natively represented in SSIS.