June 30, 2015 at 5:44 am
Hi dear SQL Folks,
This time I have a weird situation where I am getting unexpected duplicates in an incremental load.
The control flow is shown in the picture below:
Inside the data flow task “Insert – Updates” I am using a well-known pattern to detect the existing records in the target table: a lookup component redirects the matched records to a temp table and sends the non-matching ones directly into the target table. Then I update the target table with the data from the temp table.
The temp table is a temporary table, also on the target server.
I set the property RetainSameConnection = True for all connection managers.
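In T-SQL terms, that branch boils down to something like the sketch below (table and column names are placeholders I am using for illustration, not the real schema):

-- Hypothetical names for illustration; SK stands in for the shared surrogate key
CREATE TABLE #TempUpdates
(
    SK      bigint      NOT NULL PRIMARY KEY,  -- same key as the target table's PK
    Country varchar(10) NOT NULL
    -- ...remaining columns that need updating...
);

-- RetainSameConnection = True is what keeps the session (and with it #TempUpdates)
-- alive between the data flow and this set-based update:
UPDATE t
SET    t.Country = u.Country
       -- ...remaining column assignments...
FROM   dbo.TargetTable AS t
       INNER JOIN #TempUpdates AS u
               ON u.SK = t.SK;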
This pattern is running within a foreach loop that iterates by country.
Data flow task:
The lookup is also parameterized by country.
The surrogate key from the source table is used as the primary key in both the target table and the temporary table.
Sometimes I get primary key violations due to duplicates in both the target table and the temporary table (the latter also has a primary key constraint). I cannot understand how that is possible, because the lookup should prevent an existing record from going down the insert branch. I checked the parameterized lookup and it is working, and I also checked the records coming from the OLE DB Source: there are no duplicates in them.
I read that RetainSameConnection = True is needed in order to work with transactions, and that this could be disturbed by using the same connection manager in parallel steps (like the lookups and OLE DB destinations in the figure above).
Any idea about it?
Any comment would be highly appreciated.
Kind Regards,
June 30, 2015 at 5:50 am
What is the datatype of the PK?
Are you certain that only one instance of the job can be running at any one time?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
June 30, 2015 at 6:07 am
Hi Phil,
The PK is a bigint. In the source it is a surrogate key; in the target it is just the PK.
We have a tool to prevent multiple concurrent executions; if the job is triggered multiple times, the runs are queued instead.
June 30, 2015 at 6:08 am
Another detail: I am using WITH (NOLOCK) in all the queries, including the lookup query.
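The lookup query has roughly this shape (names are placeholders, and the exact parameter mechanism depends on the package):

SELECT SK                                -- the surrogate key column
FROM   dbo.TargetTable WITH (NOLOCK)
WHERE  Country = ?;                      -- per-iteration country parameter from the foreach loop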
June 30, 2015 at 6:30 am
Paul Hernández (6/30/2015)
Another detail: I am using WITH (NOLOCK) in all the queries, including the lookup query.
When I have had this problem in the past, it has usually been because there are duplicates in the source and I am doing a fully cached lookup. Are you certain that this cannot happen?
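Remember that a fully cached lookup loads its cache once, before any rows flow, so if the same key arrives twice in one data flow, both rows miss the cache and both go down the insert path. A quick check against the source extract (object and column names below are my assumptions, not your actual schema) would be:

-- Placeholder names: dbo.SourceTable, SK, Country
DECLARE @Country varchar(10) = 'DE';     -- stand-in for the loop's current country

SELECT   s.SK,
         COUNT(*) AS RowCnt
FROM     dbo.SourceTable AS s
WHERE    s.Country = @Country            -- same per-country filter the package uses
GROUP BY s.SK
HAVING   COUNT(*) > 1;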
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
June 30, 2015 at 6:59 am
I cannot find duplicates in the source, so I am trying to find another cause for them. What I also want to find out with this post is the possible side effects of using manual transactions and a single connection manager with this SSIS pattern.
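To make the setup concrete, the manual transaction is implemented as Execute SQL Tasks around the data flow, all on the one retained connection; in sketch form it is just:

-- Execute SQL Task before the data flow:
BEGIN TRANSACTION;

-- ...the data flow then runs on the same retained connection...

-- Execute SQL Task after the data flow (with a ROLLBACK TRANSACTION task wired to the failure path):
COMMIT TRANSACTION;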