In my most recent post in this series, I talked about how to use the SSIS object variable as an ADO recordset as a source in a data flow. By loading the result set of a query into this variable, the contents of the variable can be read by an SSIS script component and sent out through the SSIS pipeline.
In working up the example in the previous blog post, I wondered to myself: What happens if the source query returns more than one result set?
Multiple Result Sets with the Execute SQL Task
In reality, data retrieval queries are typically configured to return exactly one result set. If two different result sets are needed, two separate queries are built, each with its own source-to-target flow of data. However, I’ve seen more than a few queries (typically in the form of vendor stored procedures, which, of course, cannot be edited) which return more than one set of results in a single execution. Even some of the built-in SQL Server stored procedures do this (sp_help comes to mind). From wherever they come, any ETL process that consumes data generated by these types of queries or stored procedures must be configured to handle multiple result sets.
Thinking through the previous example, I wanted to experiment with what would happen if the query in an execute SQL task returns more than one result set. To test this, I created a stored procedure that purposefully returns two different result sets, as shown below:
Predictably, the stored procedure above returns two sets of results.
To continue the test, I’ve added an execute SQL task to the control flow, using the stored procedure above as the query. I set the ResultSet to Full Result Set.
Specifying a ResultSet value of Full Result Set requires the use of an object-typed variable to store the results.
Next, I’ll use a script component, configured as a source, to programmatically extract the information from the ADO recordset object. In my last post, I used an OleDbDataAdapter to fill a DataTable with the data from the object variable. In this case, I’ve changed that just a bit, instead populating a DataSet object which should allow us to capture more than one result set (since the DataSet can contain one or more DataTables). I’ve also added a second output, setting up the appropriate columns for the added result set. The snippet below shows the example of looping through each result set to build the data for each output. (Teaser: Before you try to execute this code, don’t execute this code. It won’t work. I’ll explain why below.)
// Set up the DataAdapter to extract the data, and the DataSet object to capture both output tables OleDbDataAdapter da = new OleDbDataAdapter(); DataSet ds = new DataSet(); // Extract the data from the object variable into the DataSet da.Fill(ds, Variables.vResults, "myTable"); // Process the first result set foreach (DataRow dr in ds.Tables[0].Rows) { // Create a new, empty row in the output buffer SalesOutputBuffer.AddRow(); // Now populate the columns SalesOutputBuffer.SalesOrderID = int.Parse(dr["SalesOrderID"].ToString()); SalesOutputBuffer.RevisionNumber = int.Parse(dr["RevisionNumber"].ToString()); SalesOutputBuffer.OrderDate = DateTime.Parse(dr["OrderDate"].ToString()); SalesOutputBuffer.ShipDate = DateTime.Parse(dr["ShipDate"].ToString()); SalesOutputBuffer.Status = int.Parse(dr["Status"].ToString()); SalesOutputBuffer.TotalDue = decimal.Parse(dr["TotalDue"].ToString()); } // Process the second result set foreach (DataRow dr in ds.Tables[1].Rows) { // Create a new, empty row in the output buffer SalesDetailOutputBuffer.AddRow(); // Now populate the columns SalesDetailOutputBuffer.SalesOrderID = int.Parse(dr["SalesOrderID"].ToString()); SalesDetailOutputBuffer.SalesOrderDetailID = int.Parse(dr["SalesOrderDetailID"].ToString()); SalesDetailOutputBuffer.OrderQty = short.Parse(dr["OrderQty"].ToString()); SalesDetailOutputBuffer.LineTotal = decimal.Parse(dr["LineTotal"].ToString()); }
When I update the script component source with this code and execute the package, it fails with the following error:
This scripting error is a bit cryptic, but it tells me all I need to know: “Cannot find table 1”. The second code block, which attempts to iterate over ds.Tables[1].Rows, fails because there is no table at index 1 (the second table, or what should be the second result set). I can infer from this that the execute SQL task has loaded only the first result set into the SSIS object variable, even though the stored procedure returned two different result sets. If I comment out the section that attempts to process the second table, the package runs fine and loads just the data from the first result set.
So if the execute SQL task does work for this scenario, how exactly does one retrieve data from a stored procedure or query which produces multiple result sets? The easiest method, as it turns out, is to use the script component to connect directly to the database and execute the query.
Handling Multiple Result Sets in the Script Component
All other things being equal, the simplest approach is usually the best. In this case, I can eliminate the middleman entirely (the execute SQL task) and connect directly to the source database from within the script component, retrieving the necessary data from all of the result sets. In testing this solution, I came across an excellent article written some years back by Bob Pearson, in which he describes and demonstrates (in VB.NET) the solution to the multiple result set quandary. With Bob’s permission, I have adapted the design pattern he described to C#.
In this pattern, the entire source operation – querying the source database and splitting out the result sets into their appropriate outputs – is handled in the script component. I should point out that this example uses the ADO.NET connection manager rather than the OleDB connection manager. Although it is technically possible to access the OleDB connection manager in script, the syntax to do so is a bit quirky, so I’ll keep it simple and use ADO.NET.
#region Namespaces using System; using System.Data; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using Microsoft.SqlServer.Dts.Runtime.Wrapper; #endregion // Add in the appropriate namespaces using System.Data; using System.Data.SqlClient; [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { public override void CreateNewOutputRows() { // Use the existing package connection manager IDTSConnectionManager100 connMgr = Connections.AdventureWorksSource; SqlConnection conn = (SqlConnection)connMgr.AcquireConnection(null); // Execute the stored procedure that will generate the two result sets SqlCommand cmd = new SqlCommand("EXEC [dbo].[usp_GetSalesData]", conn); SqlDataReader reader = cmd.ExecuteReader(); while (reader.Read()) // Read() will return false once we've processed all rows in the first result set. { // Create a new, empty row in the output buffer SalesOutputBuffer.AddRow(); // Now populate the columns SalesOutputBuffer.SalesOrderID = int.Parse(reader["SalesOrderID"].ToString()); SalesOutputBuffer.RevisionNumber = int.Parse(reader["RevisionNumber"].ToString()); SalesOutputBuffer.OrderDate = DateTime.Parse(reader["OrderDate"].ToString()); SalesOutputBuffer.ShipDate = DateTime.Parse(reader["ShipDate"].ToString()); SalesOutputBuffer.Status = int.Parse(reader["Status"].ToString()); SalesOutputBuffer.TotalDue = decimal.Parse(reader["TotalDue"].ToString()); } // The NextResult() function moves us to the next result set reader.NextResult(); // Process the second result set while (reader.Read()) { // Create a new, empty row in the output buffer SalesDetailOutputBuffer.AddRow(); // Now populate the columns SalesDetailOutputBuffer.SalesOrderID = int.Parse(reader["SalesOrderID"].ToString()); SalesDetailOutputBuffer.SalesOrderDetailID = int.Parse(reader["SalesOrderDetailID"].ToString()); SalesDetailOutputBuffer.OrderQty = short.Parse(reader["OrderQty"].ToString()); SalesDetailOutputBuffer.LineTotal = decimal.Parse(reader["LineTotal"].ToString()); } } }
After making this change to bypass the SSIS object variable method and simply query the database directly from the script component source, the script component source runs successfully. I can then send the two outputs, each containing a distinct result set from the source stored procedure, to their respective output tables.
Tl;dr: In the rare cases when you have to process a query with multiple result sets, skip the SSIS variable and use a direct connection through the script component source.
Thanks again to Bob Pearson for allowing me to adapt his VB.NET code into C# for this example.
The post The SSIS Object Variable and Multiple Result Sets appeared first on Tim Mitchell.