February 9, 2015 at 10:17 am
Hi All,
I would like to know what the best practice would be here.
My SSIS package is very simple, but I would like to follow best practice.
My source is a flat file (around 2 million rows; the package runs once a day at 3:00 am).
The destination is a SQL table.
First option: retrieve the data from the flat file, do some transformation as needed, and load the data into the SQL table.
Second option: retrieve the data from the flat file, do some transformation as needed, load the data into a staging SQL table, and then load from staging into the target SQL table.
My questions are:
1) Which option is best, and why?
2) Which option is faster?
I need advice/thoughts from SSIS gurus.
Thanks in advance.
February 9, 2015 at 10:22 am
What sort of transformations do you have in mind?
Will you be doing just inserts, or inserts and updates, or inserts, updates and deletes?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 9, 2015 at 10:25 am
Oops, I should mention:
I am using two transformations so far (Derived Column, Data Conversion).
Plus I am using SCD.
That would be my next question. If I pick the second option, which would be faster: SCD or a MERGE statement? (My limited experience says MERGE.)
What do you think?
February 9, 2015 at 10:29 am
rocky_498 (2/9/2015)
Oops, I should mention: I am using two transformations so far (Derived Column, Data Conversion).
Plus I am using SCD.
That would be my next question. If I pick the second option, which would be faster: SCD or a MERGE statement? (My limited experience says MERGE.)
What do you think?
SCD is slow, don't use it.
Can you estimate the % of INSERTs vs UPDATEs, approximately?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 9, 2015 at 10:30 am
I would say 70/30 (insert/update).
February 9, 2015 at 10:37 am
Here's how I would do it.
Add a lookup to your package and use it to determine Insert vs Update.
If it's an Insert, send it directly to the target table.
If it's an update, send it to a (truncated) work table and run an update stored proc just after the dataflow to process those rows.
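The update proc can be a simple set-based join; something along these lines (table, key and column names are just placeholders for your own schema):

CREATE PROCEDURE dbo.usp_ApplyUpdates
AS
BEGIN
    SET NOCOUNT ON;

    -- Apply every changed row that the dataflow sent to the work table.
    UPDATE tgt
    SET    tgt.Col1 = src.Col1,
           tgt.Col2 = src.Col2
    FROM   dbo.TargetTable AS tgt
           INNER JOIN dbo.UpdateWork AS src
                   ON src.BusinessKey = tgt.BusinessKey;
END;

Truncate dbo.UpdateWork at the start of the package so each run begins with an empty work table.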
Your main problem here is that your lookup cache needs to be in-memory for optimum performance. Millions of rows in your target table = lots of RAM needed.
What sort of column will you be matching on when deciding whether to insert or update ... what does the data look like?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 20, 2015 at 1:49 am
rocky_498 (2/9/2015)
Oops, I should mention: I am using two transformations so far (Derived Column, Data Conversion).
Plus I am using SCD.
That would be my next question. If I pick the second option, which would be faster: SCD or a MERGE statement? (My limited experience says MERGE.)
What do you think?
Dump your flat file data into a staging table, then use MERGE to load the data from the staging table into your main table. This is the best and most reliable way to achieve this.
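Something along these lines (table, key and column names are placeholders; adjust to your own schema):

MERGE dbo.TargetTable AS tgt
USING dbo.StagingTable AS src
    ON src.BusinessKey = tgt.BusinessKey
WHEN MATCHED THEN
    -- Existing row: bring the columns up to date.
    UPDATE SET tgt.Col1 = src.Col1,
               tgt.Col2 = src.Col2
WHEN NOT MATCHED BY TARGET THEN
    -- New row: insert it.
    INSERT (BusinessKey, Col1, Col2)
    VALUES (src.BusinessKey, src.Col1, src.Col2);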
____________________________________________________________
AP
February 20, 2015 at 2:42 am
anshulparmar (2/20/2015)
rocky_498 (2/9/2015)
Oops, I should mention: I am using two transformations so far (Derived Column, Data Conversion).
Plus I am using SCD.
That would be my next question. If I pick the second option, which would be faster: SCD or a MERGE statement? (My limited experience says MERGE.)
What do you think?
Dump your flat file data into a staging table, then use MERGE to load the data from the staging table into your main table. This is the best and most reliable way to achieve this.
There are some issues with MERGE:
Use Caution with SQL Server's MERGE Statement
It's also hard to track how many rows you actually inserted and updated.
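If you do use MERGE, the OUTPUT clause with $action is one way to capture those counts, but it is extra plumbing. A rough sketch, again with placeholder names:

-- Capture what MERGE did to each row, then aggregate per action.
DECLARE @actions TABLE (MergeAction nvarchar(10));

MERGE dbo.TargetTable AS tgt
USING dbo.StagingTable AS src
    ON src.BusinessKey = tgt.BusinessKey
WHEN MATCHED THEN
    UPDATE SET tgt.Col1 = src.Col1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (BusinessKey, Col1) VALUES (src.BusinessKey, src.Col1)
OUTPUT $action INTO @actions (MergeAction);

SELECT MergeAction, COUNT(*) AS RowsAffected
FROM @actions
GROUP BY MergeAction;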
Personally I use the Lookup pattern described by Phil.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
February 20, 2015 at 2:43 am
Phil Parkin (2/9/2015)
Your main problem here is that your lookup cache needs to be in-memory for optimum performance. Millions of rows in your target table = lots of RAM needed.
What sort of column will you be matching on when deciding whether to insert or update ... what does the data look like?
Depends a bit on the lookup columns used.
I have used lookup transformations on data sets over 60 million rows without an issue, because I used integer columns.
Do not try this at home with GUIDs though 😀
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP