November 3, 2017 at 11:12 am
We have a DFT with over 10 million records from 2 sources which goes through a merge join and sorted and stored using OLE DB Destination with Data Access Mode Table or View and a redirect row to capture data exceptions. However, this process takes at least 20 minutes to an hour to execute. We tried using fast load option but it had issues with not capturing the data exceptions. Is there another way of improving the performance without losing the redirect row capability?
November 3, 2017 at 11:16 am
sadie - Friday, November 3, 2017 11:12 AMWe have a DFT with over 10 million records from 2 sources which goes through a merge join and sorted and stored using OLE DB Destination with Data Access Mode Table or View and a redirect row to capture data exceptions. However, this process takes at least 20 minutes to an hour to execute. We tried using fast load option but it had issues with not capturing the data exceptions. Is there another way of improving the performance without losing the redirect row capability?
What are the sources?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 3, 2017 at 11:31 am
Phil Parkin - Friday, November 3, 2017 11:16 AMsadie - Friday, November 3, 2017 11:12 AMWe have a DFT with over 10 million records from 2 sources which goes through a merge join and sorted and stored using OLE DB Destination with Data Access Mode Table or View and a redirect row to capture data exceptions. However, this process takes at least 20 minutes to an hour to execute. We tried using fast load option but it had issues with not capturing the data exceptions. Is there another way of improving the performance without losing the redirect row capability?What are the sources?
One is a staging table and the other is the target table. I failed to mention we are splitting the output of the merge join to a work table for updates and the target table for insert and both destinations have a redirect row.
November 3, 2017 at 11:43 am
sadie - Friday, November 3, 2017 11:31 AMPhil Parkin - Friday, November 3, 2017 11:16 AMsadie - Friday, November 3, 2017 11:12 AMWe have a DFT with over 10 million records from 2 sources which goes through a merge join and sorted and stored using OLE DB Destination with Data Access Mode Table or View and a redirect row to capture data exceptions. However, this process takes at least 20 minutes to an hour to execute. We tried using fast load option but it had issues with not capturing the data exceptions. Is there another way of improving the performance without losing the redirect row capability?What are the sources?
One is a staging table and the other is the target table. I failed to mention we are splitting the output of the merge join to a work table for updates and the target table for insert and both destinations have a redirect row.
Cool, then I suggest doing the SORT operation as part of the select from the data sources and removing it from your package. You'll have to set the IsSorted property for the MERGE JOIN input to true (if you can't find where that is, post back and I'll hunt it down). Sorts in SSIS are notoriously slow and are known as 'blocking' operations, because all of the data has to be sorted before it can be piped out of the sort component.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 3, 2017 at 2:19 pm
Phil Parkin - Friday, November 3, 2017 11:43 AMsadie - Friday, November 3, 2017 11:31 AMPhil Parkin - Friday, November 3, 2017 11:16 AMsadie - Friday, November 3, 2017 11:12 AMWe have a DFT with over 10 million records from 2 sources which goes through a merge join and sorted and stored using OLE DB Destination with Data Access Mode Table or View and a redirect row to capture data exceptions. However, this process takes at least 20 minutes to an hour to execute. We tried using fast load option but it had issues with not capturing the data exceptions. Is there another way of improving the performance without losing the redirect row capability?What are the sources?
One is a staging table and the other is the target table. I failed to mention we are splitting the output of the merge join to a work table for updates and the target table for insert and both destinations have a redirect row.
Cool, then I suggest doing the SORT operation as part of the select from the data sources and removing it from your package. You'll have to set the IsSorted property for the MERGE JOIN input to true (if you can't find where that is, post back and I'll hunt it down). Sorts in SSIS are notoriously slow and are known as 'blocking' operations, because all of the data has to be sorted before it can be piped out of the sort component.
The source queries have an order by and the IsSorted property is already set on both OLE DB Source.
November 3, 2017 at 2:23 pm
sadie - Friday, November 3, 2017 2:19 PMThe source queries have an order by and the IsSorted property is already set on both OLE DB Source.
So why are you using the SORT in the package? Or am I misreading your post?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 3, 2017 at 2:34 pm
Phil Parkin - Friday, November 3, 2017 2:23 PMsadie - Friday, November 3, 2017 2:19 PMThe source queries have an order by and the IsSorted property is already set on both OLE DB Source.So why are you using the SORT in the package? Or am I misreading your post?
We are not using the SORT operation in the package. When I mentioned in my first post that the source data goes through a merge join and sorted, I meant that the data is ordered by within the merge join using the sort order from data source query. Sorry for the confusion. I hope this makes sense.
November 3, 2017 at 3:01 pm
Why not do the whole merge, sort and exception handling on the database itself?
even if the 2 tables are in 2 different SQL Instances in many cases it will be faster to load one of the tables onto the same instance as the other one and do the processing there instead of doing it on SSIS.
Even copying both tables to the final server and processing them there may be faster.
November 3, 2017 at 3:27 pm
frederico_fonseca - Friday, November 3, 2017 3:01 PMWhy not do the whole merge, sort and exception handling on the database itself?
even if the 2 tables are in 2 different SQL Instances in many cases it will be faster to load one of the tables onto the same instance as the other one and do the processing there instead of doing it on SSIS.Even copying both tables to the final server and processing them there may be faster.
Thank you for you response but we need the solution in SSIS since this is part of our ETL standard process.
November 3, 2017 at 6:17 pm
sadie - Friday, November 3, 2017 3:27 PMfrederico_fonseca - Friday, November 3, 2017 3:01 PMWhy not do the whole merge, sort and exception handling on the database itself?
even if the 2 tables are in 2 different SQL Instances in many cases it will be faster to load one of the tables onto the same instance as the other one and do the processing there instead of doing it on SSIS.Even copying both tables to the final server and processing them there may be faster.
Thank you for you response but we need the solution in SSIS since this is part of our ETL standard process.
And? Executing SP's or straight code as part of a SSIS step is still part of an ETL process.
Just because SSIS can do it with particular tasks does not mean it should do it all - specially for performance and clarity/easier coding doing it on SQL is in many cases the correct option.
November 3, 2017 at 6:31 pm
frederico_fonseca - Friday, November 3, 2017 6:17 PMsadie - Friday, November 3, 2017 3:27 PMfrederico_fonseca - Friday, November 3, 2017 3:01 PMWhy not do the whole merge, sort and exception handling on the database itself?
even if the 2 tables are in 2 different SQL Instances in many cases it will be faster to load one of the tables onto the same instance as the other one and do the processing there instead of doing it on SSIS.Even copying both tables to the final server and processing them there may be faster.
Thank you for you response but we need the solution in SSIS since this is part of our ETL standard process.
And? Executing SP's or straight code as part of a SSIS step is still part of an ETL process.
Just because SSIS can do it with particular tasks does not mean it should do it all - specially for performance and clarity/easier coding doing it on SQL is in many cases the correct option.
We were able to improve the execution time by using one source and taking the merge join out and using a lookup to split into two OLE DB destinations with a fast load option and default setting on MICS which allowed the data exception to flow in the redirect rows.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply