Hello,
How can I import a CSV file in SSIS (2016) while, at the same time, checking a value against the destination table to determine whether each record is new?
The CSV file is one to which records are continuously appended.
I want to avoid a staging table if possible.
Thank you,
Vinay
May 17, 2021 at 12:25 pm
Yep... you can avoid a staging table if you'd like, but... if something goes wrong, there's going to be a long rollback and your target table might be close to useless during such a rollback. Are you sure you want to take such a chance with unknown, unvalidated data?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 17, 2021 at 12:32 pm
Thank you Jeff,
The target table is used for only a single process and won't be used by any other users at all. It's like a collection bucket.
That's why I don't want to insert records that have already been processed (duplicates) into it.
I think having two tables for such a small task is overkill, so I need to validate the records during the SSIS stage and insert only the new ones.
I can't see anything in the data transformation stage that compares against a table.
Vinay
May 17, 2021 at 12:43 pm
What do you mean by 'continuous record-appending file'? Is this a file which just keeps on growing forever? I'd be looking quite vigorously for a way to avoid that, because you don't want to be needlessly processing the same data over and over again.
If you are looking for a way in SSIS to do INSERTs of new rows (but no updates or deletes), and assuming you have a suitable unique key in place, in both the file and the target table, this is easy to do. Use a Conditional Split within your data flow, matching on the unique key. Send the 'unmatched' output to your target table and ignore the 'matched' output.
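To picture the end result in set-based terms, the logic you are after is essentially the anti-join below. All object and column names are examples only; substitute your real unique key and columns:

-- Hypothetical T-SQL equivalent of 'insert only rows that are not already in the target'.
-- In the package, the source rows come from the flat file rather than a table.
INSERT INTO dbo.TargetTable (UniqueKey, SomeColumn)
SELECT s.UniqueKey, s.SomeColumn
FROM dbo.IncomingRows AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.TargetTable AS t
                  WHERE t.UniqueKey = s.UniqueKey);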
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 17, 2021 at 12:56 pm
Thank you Phil,
The CSV file is out of my control. Yes, it will keep growing, but only a few records are added once a month, so it won't grow too big (not this year or next).
I will try the Conditional Split. Is there a way to then compare the incoming file source against the target table for unique values?
Vinay
May 17, 2021 at 12:58 pm
Thank you Jeff,
The target table is used for only a single process and won't be used by any other users at all. It's like a collection bucket.
That's why I don't want to insert records that have already been processed (duplicates) into it.
I think having two tables for such a small task is overkill, so I need to validate the records during the SSIS stage and insert only the new ones.
I can't see anything in the data transformation stage that compares against a table.
Vinay
Are you saying that the source file always contains all the rows (old rows plus newly appended ones), that you have to keep all of the old rows in the table forever, and that's why you want to load only the new rows into the table?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 17, 2021 at 1:04 pm
Are you saying that the source file always contains all the rows (old rows plus newly appended ones), that you have to keep all of the old rows in the table forever, and that's why you want to load only the new rows into the table?
Yes, in summary, that's right. The destination table will have all the records; the CSV file will have the same records plus new ones appended.
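For example (made-up usernames): if the target table already holds anna, bob and carol, and this month's CSV contains anna, bob, carol, dave and erin, then only dave and erin should be inserted.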
Vinay
May 17, 2021 at 1:34 pm
I will try the Conditional Split. Is there a way to then compare the incoming file source against the target table for unique values?
Vinay
What do you mean by 'unique values'?
The CS allows you to match rows in your source file against rows in the target table. I am assuming that you have a unique key in both places?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 17, 2021 at 1:56 pm
Phil,
Yes, there is a unique non-numeric key (username). Can the Conditional Split be used against a single field to check for duplicates?
V
May 17, 2021 at 2:27 pm
Phil,
Yes, there is a unique non-numeric key (username). Can the Conditional Split be used against a single field to check for duplicates?
V
If the key is unique, how can there possibly be any duplication? Please provide an (anonymised) example.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 17, 2021 at 6:16 pm
Phil,
Yes, there won't be any duplication within the file itself, because of the unique values (username).
However, regularly importing the same CSV file, which contains both old and new data, would create duplicates in the target table.
That is what I want to avoid (using a runtime check).
Vinay
Thank you all for your inputs.
I have resolved the issue using a Merge Join (left outer join) followed by a Conditional Split (checking for a NULL column),
which produces exactly the new records I need to insert into the target table.
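In case it helps anyone else, that pattern is the data-flow version of a left anti-join. In T-SQL terms it would look roughly like this (names are illustrative, and the source rows of course come from the Flat File Source rather than a table):

-- Merge Join (left outer) on the key, then Conditional Split keeping the NULL side
SELECT s.*
FROM dbo.IncomingFileRows AS s          -- stands in for the rows read from the CSV
     LEFT OUTER JOIN dbo.TargetTable AS t
            ON t.Username = s.Username
WHERE t.Username IS NULL;               -- only records not yet in the target remain

In the Conditional Split itself, the equivalent test is an expression such as ISNULL(Username_Target), where Username_Target is whatever name the target-side key column has after the Merge Join.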
V
May 18, 2021 at 9:32 am
Phil,
Yes, there won't be any duplication within the file itself, because of the unique values (username).
However, regularly importing the same CSV file, which contains both old and new data, would create duplicates in the target table.
That is what I want to avoid (using a runtime check).
Vinay
'Checking for duplication' and 'avoiding duplication' are different things. The first suggests that duplicates may already exist, unlike the second.
I did, however, make a mistake when recommending the CS for this task. I meant the Lookup; my apologies. This will probably perform significantly faster than the Merge Join. Configure the Lookup to use
SELECT <unique key> FROM TargetTable
after selecting 'Use results of an SQL query' on the Connection page.
Use the full cache mode if you have sufficient memory,
and then match the unique key of the incoming rows against it, sending the unmatched rows to the target table.
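As a concrete sketch of that setup (the column name Username here is just an example taken from earlier in the thread; use your own key), the Lookup's query would be nothing more than:

-- Query entered under 'Use results of an SQL query' in the Lookup
-- (return only the key column to keep the cache small)
SELECT Username
FROM dbo.TargetTable;

Then, on the General page, set the Lookup to 'Redirect rows to no match output' and connect the Lookup No Match Output to your destination; the Match Output can be left unconnected, since matched rows are simply discarded.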
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin