Anyone who has used SSIS error handling has been frustrated that it returns an Error Code with no description. I decided that I needed a solution. Googling the problem led me to this blog by Jamie Thomson which had a link to a Script Component that gets the Error Description. Excellent! While this was I good start, I also wanted to see the data that caused the error. I started by including the data and outputting to a text file, but, being a database pro, this did not sit well with me. I started looking for a solution that would allow me to store the data in a table and be useable in each dataflow task, and, once again, the internet provided a solution, this article by jwelch on agilebi.com which I highly recommend reading.
My solution is uses the Input0_ProcessInputRow method and reflection, as shown by jwelch’s article, to loop through the columns, building an XML string (name-value pairs) which I insert into a table. I can then query the table to see the errors and the data. I can then verify that I fixed the errors from earlier loads.
Here is the solution:
First, I built this table in my SQL Server DB:
CREATE TABLE [data_xfer].[load_errors] ( [load_error_id] [int] IDENTITY(1,1) NOT NULL, [package_run_date] [smalldatetime] NULL CONSTRAINT [DF_load_errors_error_date] DEFAULT (getdate()), [error_task] [varchar](100) NULL, [error_step] [varchar](100) NULL, [error_code] [int] NULL, [error_desc] [varchar](100) NULL, [error_details] [xml] NULL, CONSTRAINT [PK_load_errors] PRIMARY KEY NONCLUSTERED ( [load_error_id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY]
Then, whenever configuring error output, add a script component following these steps:
1. On the Input screen add the columns that could be the source of the error
2. On the Inputs and Outputs screen add the following columns to the Output:
a. ErrorStep string(100) – this is going to be the Name of the Script Component which I name using the step which is the source of the error
b. ErrorDataDetails Unicode String(4000) – this is going to be an XML string that will have the data from the Input Columns and will be inserted into an XML Type column in SQL Server
c. ErrorDesc string(100) – The error description gotten from the ErrorCode
d. ErrorTask string(100) – The SSIS Task that the Step is in. In my case it is a DataFlow Task. This is retrieved using the TaskName package variable that is created by SSIS.
e. PackageTime Date – The start time of the package. This is retrieved using the StartTime package variable.
3. On the Script Screen add the following to the ReadOnlyVariables:
a. TaskName – a default SSIS variable that has the task name
b. StartTime – a default SSIS variable that is the start time of the package
This completes the setup so we can now Design the script in Visual Studio for Applications. Here is the script I am using:
ImportsSystem ImportsSystem.Data ImportsSystem.Math ImportsMicrosoft.SqlServer.Dts.Pipeline.Wrapper ImportsMicrosoft.SqlServer.Dts.Runtime.Wrapper ' added this import ImportsSystem.Reflection PublicClass ScriptMain Inherits UserComponent Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer) Dim column As IDTSInputColumn90 Dim rowType As Type = Row.GetType() Dim columnValue As PropertyInfo Dim strErrorDetails As String = "<fields><field " Dim strAttributeName As String Dim strValue AsString ' Get the error description Row.ErrorDesc = ComponentMetaData.GetErrorDescription(Row.ErrorCode).Trim() ' This actually gets the script component name Row.ErrorSource = ComponentMetaData.Name.Trim() ' get the task name and the package start time Row.ErrorTask = Variables.TaskName.Trim() Row.PackageTime = Variables.StartTime Try ' loop through the columns in the input to create a generic xml doc For Each column In ComponentMetaData.InputCollection(0).InputColumnCollection ' remove illegal xml characters in the name If column.Name.IndexOf(".") > -1 Then strAttributeName = column.Name.Replace(".", "") Else strAttributeName = column.Name End If If strAttributeName.IndexOf("_") > -1 Then strAttributeName = strAttributeName.Replace("_", "") End If If strAttributeName.IndexOf(" ") > -1 Then strAttributeName = strAttributeName.Replace(" ", "") End If ' Skip the Error Columns IfNot (strAttributeName.Contains("ErrorCode") Or strAttributeName.Contains("ErrorColumn")) Then ' get the value for the column columnValue = rowType.GetProperty(strAttributeName) Dim objColumn As Object = columnValue.GetValue(Row, Nothing) ' if the value is null set to empty string If IsNothing(objColumn) Then strValue = String.Empty Else strValue = objColumn.ToString() End If ' append the node to the xml string strErrorDetails = strErrorDetails & "name=" & _ ControlChars.Quote column.Name.Trim & ControlChars.Quote & _ " value=" & ControlChars.Quote & strValue & ControlChars.Quote & _ " /><field " End If Next 'finish the xml string strErrorDetails = strErrorDetails.Substring(0, strErrorDetails.Length - 7) & _ "</fields>" Catch ex As Exception ' if an error occurs log it and keep processing instead of killing the load strErrorDetails = "<fields><field name=" & ControlChars.Quote & "error" & _ ControlChars.Quote & " value=" & ControlChars.Quote & ex.ToString.Trim & _ " " & strAttributeName.Trim & ControlChars.Quote & "/></fields>" End Try ' add the xml output to the output row Row.ErrorDetails = strErrorDetails End Sub EndClass
One way I have used this data is to verify I did successfully load one of a group of duplicate records. For example, in my source system I may have a duplicate record for Joe Smith. The records would consist of:
Person No. | First Name | Middle Name | Last Name | Birth Date |
000001 | Joe | David | Smith | 1/17/1967 |
100001 | Joe | David | Smith | 1/17/1967 |
As part of my process I have created a table with all duplicates and assign one of the person no.’s as the primary person no. for the transfer. When loading the addresses for this person I will likely get at least 1 for each record meaning my second address insert will fail, violation of data integrity rules. I can use my load_errors table to find that yes I did get each address for Joe Smith loaded once.
Here is an example of how I query the error data:
Select L.error_details, L.package_run_date, L.error_task, L.error_step, L.error_code, L.error_desc, P.person.value('@value','char(6)') as error_person_id from data_xfer.load_errors L Cross Apply error_details.nodes('.//field[@name="personno"]') as P(person)
I do need to write specific queries based on the task that created the error as while the XML generated is a generic name – value document the names do vary.
It is a simple solution that logs errors and data. The meat of the code is not original, but found from the resources noted.
Here are some areas where it can be improved:
- If there is binary data in the row the XML output has “System.Byte[]” as the value.
- Stripping illegal characters from the XML. I am sure this can be handled better.
I welcome any comments or suggestions on how to improve the process.
References:
SSIS Junkie – SSIS Nugget: Get Error Descriptions - http://blogs.conchango.com/jamiethomson/archive/2005/08/08/1969.aspx
BI Thoughts and Theories – Address Columns Generically in a Script Component - http://agilebi.com/cs/blogs/jwelch/archive/2007/10/21/address-columns-generically-in-a-script-task.aspx
Jack Corbett