February 22, 2012 at 5:43 am
Prior to upgrading to SSIS 2008, I had a data warehouse that was populated by two separate processes, an SSIS 2005 package (populating one subset of tables) and an old vb.net application that populated the second subset of tables.
Once I upgraded to SSIS 2008, I thought I'd use the opportunity to create a new SSIS 2008 package to combine the functionality of the two previous applications. I thought this would be quicker and more efficient. Little did I know that it would take an absolute age to run!
The old vb.net application can transfer over a million records in around 5 minutes! The old SSIS 2005 package transferring 470,000 records in 20 minutes.
I set my new SSIS 2008 package to run on my development server via a sql agent job at 11am yesterday, and it was still running this morning having only just got around half way! Something clearly wrong here.
The vb.net package used the same odbc dsn for Sybase (SQL Anywhere 11) and used a SqlClient.SqlBulkCopy command to complete the transfer. Is this the same as a Fast Load OLE DB destination transfer in SSIS?
My SSIS 2008 package uses a .net providers\odbc data provider source for the sybase dsn, and an ole db destination created in connection managers (Native OLE DB\Microsoft OLE DB Provider for SQL Server) for the SQL 2008 data warehouse table. The only difference I can see with the SSIS 2005 package is that the data source of that package was created in connection managers from a "new connection from data source..." set up from an existing data source in solution explorer, that being a .net providers\odbc data provider data source. Can't see why that would have an effect really?
I'd be very grateful if anyone can assist me with a way to speed this process up. As it stands, it's unworkable as the package would need to be scheduled to run on a daily basis.
February 22, 2012 at 5:50 am
Just to check, you've selected "Table or View - Fast Load" for the Data Access Mode of the OLE DB destination?
It's usually blisteringly fast going to SQL Server destinations as long as the source is quick enough (and it doesn't sound like anything's changed there)
February 22, 2012 at 6:22 am
Yes, they're all Table or view - fast load data access mode.
February 22, 2012 at 6:26 am
Ok, for all intents and purposes, it's the same underlying technology. I'd start by removing the OLE DB destination and just sticking in a row count transformation instead to see if it's the source or destination slowing it down.
February 22, 2012 at 6:32 am
Aha. Good ploy. I've just tried that and it's the same speed on my development machine. Well that's something I guess. Any ideas on the odbc source then?
February 22, 2012 at 6:38 am
My advice in this situation would be the following:
1. Make sure the delay is not at the source. To test this, remove the pipe between your source and destination and add something like a row count transformation as destination. You may need to tweak your buffer sizes. Another way to speed things up (at the source), is to only retrieve the data you need...write sql statements to return wanted columns only, as opposed to connecting to the source table directly.
2. Once you've established it's not the source, put the destination back in place and run a trace while executing the package. You may also have to tweak the buffer sizes of your destination.
3. Check for indexes/triggers/constraints etc. on your destination table. These may slow things down considerably. To test if this is the case, look at the execution plans of the queries executed (from the trace obtained)...or simply create a heap and use that as destination to see the difference.
February 22, 2012 at 7:05 am
I've tried the sql statement approach already unfortunately. I've just tried to run the ssis 2005 package in VS 2008 and it runs the same speed on my PC. Weird! Our old ssis 2005 package is running on a SQL server 2005 server at the moment and I'm unable to open the solution on that server as it only has VS 2005 (I had to upgrade the package to SSIS 2008 in order to be able to view the design on my VS 2008 pc). Is there any other way to compare the data access on that server to either my development PC or production server? It wouldn't be something to do with 32-bit and 64-bit systems may be? (clutching at straws here!)
There's no indexes on the destination table by the way.
Martin Schoombee (2/22/2012)
My advice in this situation would be the following:1. Make sure the delay is not at the source. To test this, remove the pipe between your source and destination and add something like a row count transformation as destination. You may need to tweak your buffer sizes. Another way to speed things up (at the source), is to only retrieve the data you need...write sql statements to return wanted columns only, as opposed to connecting to the source table directly.
2. Once you've established it's not the source, put the destination back in place and run a trace while executing the package. You may also have to tweak the buffer sizes of your destination.
3. Check for indexes/triggers/constraints etc. on your destination table. These may slow things down considerably. To test if this is the case, look at the execution plans of the queries executed (from the trace obtained)...or simply create a heap and use that as destination to see the difference.
February 22, 2012 at 7:12 am
Griffster (2/22/2012)
I've tried the sql statement approach already unfortunately. I've just tried to run the ssis 2005 package in VS 2008 and it runs the same speed on my PC. Weird! Our old ssis 2005 package is running on a SQL server 2005 server at the moment and I'm unable to open the solution on that server as it only has VS 2005 (I had to upgrade the package to SSIS 2008 in order to be able to view the design on my VS 2008 pc). Is there any other way to compare the data access on that server to either my development PC or production server? It wouldn't be something to do with 32-bit and 64-bit systems may be? (clutching at straws here!)There's no indexes on the destination table by the way.
It could be...by default SSIS runs 32-bit in designer mode (as far as I remember), so if that is the way in which you tested the package it could be a possible cause.
You could also maybe try and recreate your connection manager to the source. The upgrade from 2005 to 2008 could have caused the use of a different driver (i.e. .NET instead of pure ODBC/OLEDB). It's a longshot, but worth a try.
Unless you look at the source xml for both or have 3rd party tools, I don't know of another way to compare...
February 22, 2012 at 8:01 am
Think that's a bit of red herring looking in to it a bit further. I created the main body of the new SSIS 2008 package using the import/export wizard. I changed the table or view to sql command on all of them afterwards. The import/export wizard seems to create a number of data flow tasks with multiple transformations in them, guess they run in parallel. Can this cause things to slow down rather than running sequentially?
February 22, 2012 at 8:08 am
Griffster (2/22/2012)
Think that's a bit of red herring looking in to it a bit further. I created the main body of the new SSIS 2008 package using the import/export wizard. I changed the table or view to sql command on all of them afterwards. The import/export wizard seems to create a number of data flow tasks with multiple transformations in them, guess they run in parallel. Can this cause things to slow down rather than running sequentially?
It depends...unfortunately. If all the parallel data flows access the same resource (i.e. table) it could cause some performance issues if there are i/o or resource bottlenecks. The opposite is also true...if your system is low on resources, parallel processing may also slow things down (paging, etc.)
This brings me to another point...you may have to check the amount of memory available/allocated to SSIS on that machine.
February 22, 2012 at 8:08 am
Also, having run a trace on the production server, by running the new package in design mode, it seems to be running a series of insert bulk [dbo].[Table_Name] statements with (TABLOCK,CHECK_CONSTRAINTS,ROWS_PER_BATCH = 1000). Any way of optimising these?
February 22, 2012 at 8:13 am
Check the settings on your OLE DB destination. Remove any value from Rows Per Batch and make sure Maximum commit size is zero.
Also, it might be worth looking at the properties tab of the data flow as a whole to make sure the DefaultBufferMaxRows and DefaultBufferSize parameters are set high enough
February 22, 2012 at 8:18 am
HowardW (2/22/2012)
Check the settings on your OLE DB destination. Remove any value from Rows Per Batch and make sure Maximum commit size is zero.Also, it might be worth looking at the properties tab of the data flow as a whole to make sure the DefaultBufferMaxRows and DefaultBufferSize parameters are set high enough
+1...that's what I meant before by "tweak the buffer sizes"
February 22, 2012 at 8:19 am
How do I check the amount of memory available/allocated please?
Martin Schoombee (2/22/2012)
Griffster (2/22/2012)
Think that's a bit of red herring looking in to it a bit further. I created the main body of the new SSIS 2008 package using the import/export wizard. I changed the table or view to sql command on all of them afterwards. The import/export wizard seems to create a number of data flow tasks with multiple transformations in them, guess they run in parallel. Can this cause things to slow down rather than running sequentially?It depends...unfortunately. If all the parallel data flows access the same resource (i.e. table) it could cause some performance issues if there are i/o or resource bottlenecks. The opposite is also true...if your system is low on resources, parallel processing may also slow things down (paging, etc.)
This brings me to another point...you may have to check the amount of memory available/allocated to SSIS on that machine.
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply