The Problem
In an Integration Services (SSIS) package, we are able to redirect the error rows to an error path/sql table in an SSIS data flow. By default, the error code and the ID of the column will be available. But how can we get the error column name in SSIS?
Solution
In earlier version of SSIS, there was no easy way to get the error column name. In SSIS 2016 and later versions, we can achive this by using a method called 'GetIdentificationStringByID' in the Script component in Data Flow.
An Example
Let's take a scenario where we have source data that has customer information, like CustomerNo, Name of the customer, and Pincode. We need to load the data into a target table. The structure of the target table is the same as the source table except for these two items:
- A dataType mismatch: The dataType of the column, CustomerNo, is nvarchar in the source but an integer in the target table
- A Check Constraint: The column, Pincode, should be 6 digits on the target table.
The package will throw an error if any one of the above validation items fails while transferring the data. By default, we can get the error column code and error description by redirecting the error rows in the data flow. But using the rrror code (ID) of the column, it is very difficult to identify where or which column caused the error.
Let me show the above scenario in brief with this test.
Using this SQL script, create a table, called tblCustomerSource, which would be source table in the data flow task:
CREATE TABLE [dbo].[tblCustomerSource]( [Row_Id] [int] NULL, [CustomerNo] [nvarchar](8) NULL, [Name] [nvarchar](35) NULL, [PinCode] [int] NULL ) ON [PRIMARY] GO
Insert some sample records (9 records) into the source table using the below SQL script:
INSERT INTO [dbo].[tblCustomerSource] ([Row_Id] ,[CustomerNo] ,[Name] ,[PinCode]) VALUES (1002885,'Z1370','Sales Ledger Control',610028), (1002888,'Z1390723','Direct Deliveries For Trafford',610028), (1009260,'28711','Office Supplies Express Ltd',610092), (1011137,'28712','Woodway UK Ltd',610111), (1021147,'28713','Cromwell 07 (Birmingham Branch)',610211), (1021148,'28714','Cromwell 02 (Coventry Branch)',610211), (1021149,'28715','Cromwell 09 (Derby Branch)',610211), (1021155,'28721','Cromwell 11 (Peterborough Branch)',10211), (1021162,'28730','Cromwell 63 (Smallman - Worcester)',10211)
Using below SQL script, create a target table, called tblCustomerTarget, which is the destination table in the data flow task. Here the column, [CustomerNo], has the data type as an integer, whereas it is an nvarchar in tblCustomerSource.
CREATE TABLE [dbo].[tblCustomerTarget]( [Row_Id] [int] NULL, [CustomerNo] [int] NULL, [Name] [nvarchar](35) NULL, [PinCode] [int] NULL ) ON [PRIMARY] GO
Next we create a check constraint, called [chk_PinLength], for the column, PinCode, on the target table to check the length is 6 digits.
ALTER TABLE [dbo].[tblCustomerTarget] ADD CONSTRAINT [chk_PinLength] CHECK ((len([PinCode])=(6))) GO
Using below SQL script, create a table, called tblCustomerInvalidRecords, where we can store the invalid records that would be redirected in the data flow task.
CREATE TABLE [dbo].[tblCustomerInvalidRecords]( [RowID] [int] NULL, [errorColumnName] [varchar](512) NULL, [errorDescription] [varchar](4000) NULL, [errorLogDate] [datetime] ) ON [PRIMARY] GO
Next we need to create a simple SSIS package. Below is the data flow task to load the data from source table (tblCustomerSource) to the destination table (tblCustomerTarget). We redirect the error rows to a Script component, called GetErrorDetails, and move the invalid records into tblCustomerInvalidRecords
In the OLE DB Source Editor page, select the OLEDB connection manager (LocalHost.test_db), the data access mode as 'Table or view', and the select the table, tblCustomerSource.
To Configure the destination component in the data flow task, right click and edit as shown below:
In the OLE DB Destination Editor page, select the OLEDB connection manager (LocalHost.test_db), the data access mode as 'Table or view', and the select the table, tblCustomerTarget.
Go to the Mapping tab and map the souce and destination columns as below:
Go to the Error Output tab, and select 'Redirect row' in the Error column as below and click OK.
Go back to the Data Flow Task to configure the Script Component. Right click on the GetErrorDetails task and select the Edit option.
Pick the Input Columns item on the left and select the columns we require. This is shown below where I have selected Row_Id, ErrorCode, and ErrorColumn. To avoid confusion I added the alias name, RowNo, for the column Row_Id
Selecct the Inputs and Outputs tab and select Output 0. Then add 4 columns: ErrorDescription, ErrorColumnName, RowNumber, and ErrorLogDate. These will be used to store the error row details.
Go to the Connection Managers tab and select LocalHost.test_db in the Connection Manager (or whatever your database is named).
Go to the Script tab and select Edit Script. Add the C# code shown below to get the error details.
Add the below C# code under the method, public override void Input0_ProcessInputRow(Input0Buffer Row), to get the ErrorDescription, ErrorLogDate, and ErrorColumnName.
If the error is due to the check constraint, the error column code would be 0. In this case, we will not get any value for 0, so the ErrorColumnName would be updated as 'An error that affects the entire row'.
If the error is due to the data type in the CustomerNo column, then the error column code will be returned. Then we can capture the ErrorColumn Name using the method, called 'GetIdentificationStringByID'.
Save the code and Click OK in the Script component Editor tab.
public override void Input0_ProcessInputRow(Input0Buffer Row) { try { Row.RowNumber = Row.RowNum; Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode); Row.ErrorLogDate = DateTime.Now; 'If an error occurred due to Check Constraint, the ErrorColumn would be 0, and that error affects the entire row. Hence there is no specific column for that error if (Row.ErrorColumn == 0) { Row.ErrorColumnName = "An error that affects the entire row"; } 'If an error occurred due to Data type, then errorcolumn name would be updated. else { var componentMetaData130 = this.ComponentMetaData as IDTSComponentMetaData130; if (componentMetaData130 != null) { Row.ErrorColumnName = componentMetaData130.GetIdentificationStringByID(Row.ErrorColumn).Replace("Customer Target.Inputs[OLE DB Destination Input].Columns[","").Replace("]",""); } } } catch (Exception ex) { Row.ErrorColumnName = "Unable to get Error Column Name"; } }
Go to the Data Flow Task and right click on the Destination task, 'CustomerInvalidRecords'. Select Edit to configure the Invalid records output.
Select the OLEDB connection manager (LocalHost.test_db), the data access mode as 'Table or view', and the select the table, tblCustomerInvalidRecords.
Go to the Mapping tab and map the souce and destination columns as shown below, then click OK.
Go back to the Data Flow Task and execute the package. The package should execute successfully. We had 9 rows in the source table, and there are now 4 error rows that have been moved into the invalid records table.
The 5 rows, which have no errors, have been moved into the target table, tblCustomerTarget.
There are 4 error rows in the invalid records table. The first 2 rows have the errorcolumn name, CustomerNo, and the error description as 'The data value cannot be converted for reasons other than sign mismatch or data overflow.'
The last 2 rows have the errorcolumn name as 'An error that affects the entire row' and the error description as 'The data value violates integrity constraints'. If the error is due to a constraint, then the entire row will be treated as an error.
Conclusion
The error column is an important piece of information in SSIS package error diagnostics. In previous versions of SSIS, we don't have any easy way to achieve this. Using this article, you can add one of the easiest ways to find the Error Column Name information to your packages using the method, called 'GetIdentificationStringByID', introduced in SSIS 2016.