March 28, 2013 at 1:59 am
Hi ,
I have been given an task to load a text file to a database which is simple but the file may contain the duplicate records which are already present in the DB table. So i want to load the correct records and keep other records which are not loaded to table in another text file.
Example:
text file with source data:
ID,Name
1,john
2,maya
3,amir
4,neha
Database table:
ID(primary key),name(varchar(50))
101,sneha
102,michael
1,john
4,neha
Now i want the rows in text file with ID=2,3 get loaded to database table and new text file is created with data 1,john and 4,neha...
Can you please help me in this .
March 28, 2013 at 2:31 am
You've got two options:
(1) Load the whole file into a staging table and do the processing in the database engine, using SSIS to do the final export back out to the second test file; or
(2) Use the transformations provided in the Data Flow to redirect the rows for you. I think a Merge Join and/or a Conditional Split may be what you're looking for, but I usually go for option 1 myself so I'm not all that familiar with those transformations.
John
March 28, 2013 at 2:59 am
I'd do it like this:
1) Inside the dataflow, add a lookup. Configure the lookup to redirect on no match (rather than failing).
2) Use the lookup to check whether the input row already exists.
3) Send the lookup's match output to a text file
4) Send the lookup's no-match output to the table as usual.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 28, 2013 at 3:32 am
Use the Lookup transformation to do such type of task. This scenerio is also called as Upsert Pattern.
Thanks ,
Sumit Joshi (Lead Engineer - HCL Technologies)
Moblie:- 0-9650899699, Skype: sumit.joshij
March 28, 2013 at 3:48 am
John Mitchell-245523 (3/28/2013)
You've got two options:(1) Load the whole file into a staging table and do the processing in the database engine, using SSIS to do the final export back out to the second test file; or
(2) Use the transformations provided in the Data Flow to redirect the rows for you. I think a Merge Join and/or a Conditional Split may be what you're looking for, but I usually go for option 1 myself so I'm not all that familiar with those transformations.
John
Stick with option 1 🙂
The MERGE JOIN needs sorted data and since the source is a flat file you cannot sort the data at the source. Hence, you need to do it in the dataflow, leading to a blocking component.
The typical scenario used is the one described by Phil, aka the Lookup component. At least, if the destination table isn't too large.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 28, 2013 at 4:04 am
sumit.joshij (3/28/2013)
This scenerio is also called as Upsert Pattern.
/
Not really - as no updates are required here, I would call it a conditional INSERT.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 28, 2013 at 4:08 am
Phil Parkin (3/28/2013)
... I would call it a conditional INSERT.
Patent pending...
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 28, 2013 at 4:19 am
Koen Verbeeck (3/28/2013)
At least, if the destination table isn't too large.
If there can be duplicate IDs in the source data file, it means the lookup has to be used in non-cached mode, which is also a performance killer. But in cached mode it should run quite fast, subject to the above condition.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 28, 2013 at 4:25 am
Thanks every one!!! ...
March 28, 2013 at 5:39 am
Yes you are right.
But I mean to say here that, in case of update and insert we will call it as "Upsert Pattern".
Any ways Thanks. 🙂
Thanks ,
Sumit Joshi (Lead Engineer - HCL Technologies)
Moblie:- 0-9650899699, Skype: sumit.joshij
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply