April 4, 2012 at 12:30 am
Recently I was designing SSIS for loading data from OLTP to DW database. As per the requirement the bad data has to be captured and the count should be checked. I decided to capture the SQL exception also that occurs in OLED Destination by redirecting the rows to error table. But I din’t get the expected result of error records. For example, if there are 1000 records and 50 records are bad after running SSIS and error record count exceeds 50. I am:w00t: surprised when some error records from error table are selected and inserted in SSMS and record inserted successfully. My second test of loading the same set of data in SSIS returned successful. Again when I tried to load the whole 1000 records in SSIS, same error count exceeds 50 as before. So I decided to investigate more is SSIS properties and recalled the effect of defining Maximum Insert commit Size in OLEDB Destination. This setting is important in terms of performance and accurate error data capture.
The demo illustrates the above scenario.
Create SQL Objects.
USE [Test]
GO
create table Department(
deptid int not null primary key,
name varchar(100)
)
-- I am using AdventureWorks DB to quickly populate the data
INSERT INTO Department
SELECT [DepartmentID]
,[Name]
FROM [AdventureWorks].[HumanResources].[Department]
WHERE DepartmentID <= 10
--=======================================
--Source table
-- Source data has 120 records with 20 records no reference key
-- ie. 20 records with deptid > 10 is considered as error records
create table SourceEmp (
Empid int,
EmpName varchar(100),
Deptid int
)
INSERT INTO SourceEmp
SELECT TOP 120 [EmployeeID],'AAAA'
,[DepartmentID]
FROM [AdventureWorks].[HumanResources].[EmployeeDepartmentHistory]
--=======================================
--Destination table
create table Employee (
Empid int,
EmpName varchar(100),
Deptid int references Department(deptid)
)
--========================================
--ErrorLog table
create table Error_Employee (
Empid int,
EmpName varchar(100),
Deptid int,
ErrorCode varchar(10),
ErrorDescription varchar(2000)
)
--========================================
Now create SSIS package with Dataflow task. Define OLEDB source with SQL command select * from SourceEmp order by 3
Drop OLEDB destination and connect the output from source with default settings. Point the destination table to Employee and assign column mapping
Drop another OLEDB destination (rename to error log) and connect the red arrow from OLEDBDestination and in the Configure error output popup window select “Redirect” in the Error column. Point the destination table to Error_Employee and assign column mapping
Now run the SSIS package and you will be surprised to see all 120 records are pushed to error table
Now right click OLEDB Destination and select Edit. Change the default value for Maximum Insert commit Size(MICZ) from 2147483647 to 15 and execute the package.
Now surprise to see 30 records are pushed to error table. The reason is data is buffered to count 15 and bulk inserted to destination. Records are loaded into destination from 1 to 90. But the batch 90 to 105 there are 5 error records which gets SQL exception and the whole buffer data is redirected to error log.
Now experiment with different MICZ and check the output.
The result shows that the minimum MICZ size the accurate error log. But this becomes bottleneck in performance.
Appreciate to know if there is any other best way to track the error records from OLEDB destination.
November 15, 2012 at 11:00 am
I recreated your example and I got the same results, but I found an interesting difference when the failure is a result of a Not NULL constraint.
Alter table Employee to make the column Deptid not null and then change one value of Deptid in the source table to NULL so that when you try to insert to the destination violate the not null constraint. If you have MICZ set to 15, the first commit (rows 1-15) will send the 1 NULL row to the error log destination and will commit the other 14. The foreign key violations will send the whole group of 15 to the error log.
Can someone shed some light as to why there is a difference in the way SSIS handles the NULL violation vs the foreign key violation?
These are the errors I get when running the insert in SSMS:
Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY constraint "FK__Employee__Deptid__060DEAE8". The conflict occurred in database "Test", table "dbo.Department", column 'deptid'.
The statement has been terminated.
Msg 515, Level 16, State 2, Line 5
Cannot insert the value NULL into column 'Deptid', table 'Test.dbo.Employee'; column does not allow nulls. INSERT fails.
The statement has been terminated.
Specific to your question, others have used two Destination components. The first destination with Fast Load and then row redirect to the second destination. The second would have normal load with row redirect to the error destination. This would attempt to fast load and then if it fails would try to normal load, which should leave you with just the true error rows not the entire MICZ. You would have to determine the best MICZ to use though and you wouldn't want to use the default of 2147483647. I'm obviously no expert and I have not tried this setup but I think this might be a good solution for some.
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply