August 28, 2013 at 11:12 am
I need to copy a number of tables from an ERP system's Progress database to a SQL Server database that will be used for reporting. I have a working package, but I'd like recommendations on how to improve its performance.
The Progress database has tables grouped by function, for example AR (Accounts Receivable) and AP (Accounts Payable). I have strung a number of Sequence Containers together, each containing the transfer steps for one functional group. They are linked together and set to continue on success. Some tables are completely replaced. Other large tables whose data can be grouped by year, like order history, are usually partially replaced but may be fully replaced. For example, I normally delete the last two years of order data and reload it, but once a month I do a complete wipe and refresh.
Each Sequence Container contains the following steps (a rough T-SQL sketch of steps 1, 3, and 5 follows the list):
1. Drop Indexes - An Execute SQL task that drops any non-primary indexes from the tables being transferred.
2. Truncate Tables - An Execute SQL task that truncates any tables that are being fully replaced.
3. Delete Tables - An Execute SQL task that deletes data from tables that will be partially loaded, like the order data.
4. Data Flow tasks - A series of Data Flow tasks, one per table. Each moves data from an ADO NET Source to an ADO NET Destination; the source is a SQL query that pulls only the columns I need, and the destination is a SQL Server table. For large tables, I use a For Loop Container to load the data in chunks, for example, order data by year.
5. Recreate Indexes - An Execute SQL task that recreates any non-primary indexes on the tables being transferred.
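To make that concrete, here is roughly what the Execute SQL tasks in steps 1, 3, and 5 run for one of the yearly-chunked tables. The table, index, and column names are made up for illustration:

-- Step 1: drop the non-primary indexes before the load
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE name = 'IX_OrderHistory_CustomerNum'
             AND object_id = OBJECT_ID('dbo.OrderHistory'))
    DROP INDEX IX_OrderHistory_CustomerNum ON dbo.OrderHistory;

-- Step 3: partial refresh - clear only the last two years of order data
-- (on the monthly full refresh, step 2 runs TRUNCATE TABLE dbo.OrderHistory instead)
DELETE FROM dbo.OrderHistory
WHERE OrderDate >= DATEADD(YEAR, -2, GETDATE());

-- Step 5: recreate the non-primary indexes after the load
CREATE NONCLUSTERED INDEX IX_OrderHistory_CustomerNum
    ON dbo.OrderHistory (CustomerNum);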
Nothing overly complicated here. I do have a couple of tables that are driven off of other tables, like the order detail table; that mainly affects the way the ADO NET Source query is generated. The Truncate/Delete tasks may also be enabled or disabled depending on whether it is the monthly run or the daily run.
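For instance, the order detail source query is generated to follow the header table's year chunk. Progress exposes its tables through ODBC under the PUB schema; the table and column names and the date literals here are only illustrative (the actual literals are substituted in by an SSIS expression):

SELECT d.OrderNum, d.LineNum, d.ItemNum, d.Qty, d.Price
FROM PUB.OrderDetail d
JOIN PUB.OrderHeader h ON h.OrderNum = d.OrderNum
WHERE h.OrderDate >= {d '2011-01-01'}
  AND h.OrderDate <  {d '2012-01-01'}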
Like I said, this works fine; it just takes over an hour and a half to run. I did try a couple of tests with one of the tables. The first was to write the source results to a flat file and then load it with a Bulk Insert task. I also tried changing the ADO NET Destination to a SQL Server Destination; that one required a Data Conversion transformation because of the way the Progress ODBC driver returns string and date data. Both attempts took about as long as, or slightly longer than, my original method.
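For reference, the Bulk Insert task in that first test boils down to a BULK INSERT statement along these lines (the path and terminators are just an example):

BULK INSERT dbo.OrderHistory
FROM 'D:\staging\OrderHistory.txt'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK, BATCHSIZE = 100000);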
Should I break all this up into separate packages? Should I try to flow these so there are multiple tables being concurrently loaded? Anything else I should be looking at?
Thanks.
August 28, 2013 at 1:41 pm
Why ADO.NET for a SQL Server Destination? I normally use OLE DB Destination with Fast Load option.
Is the destination database in simple recovery model?
Are the database files correctly sized?
Any transformations in the data flows?
Did you try out different buffer sizes in the data flows?
About how much data are we talking here?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
August 28, 2013 at 2:14 pm
Koen Verbeeck (8/28/2013)
Why ADO.NET for a SQL Server Destination? I normally use OLE DB Destination with Fast Load option.
Is the destination database in simple recovery model?
Are the database files correctly sized?
Any transformations in the data flows?
Did you try out different buffer sizes in the data flows?
About how much data are we talking here?
No particular reason. I do have the option for Use Bulk Insert checked.
Yes, simple model.
Not sure I know what you mean by database files being correctly sized.
No transformations.
How do you change buffer sizes? Any recommendations on what to base them on?
The database is 10 GB. Largest tables have row counts under 10 million.
August 28, 2013 at 2:15 pm
By the way, I am a developer. We don't have a DBA, so I make do as best I can.
August 28, 2013 at 3:45 pm
jplankenhorn (8/28/2013)
Not sure I know what you mean by database files being correctly sized.
How do you change buffer sizes? Any recommendations on what to base them on?
A typical database has two files: the data file (mdf) and the log file (ldf).
Let's say your data file is currently 1GB and at 90% capacity, so about 100MB is free.
Now suppose you want to pump 2GB of data into some tables. The data file then has to autogrow by at least 1.9GB.
With the default autogrow setting of incremental 10MB steps, the file has to grow 190 times. That wastes a lot of time and is not efficient.
If the data file had instead been pre-sized at 3GB, it wouldn't need to grow at all, so the inserts would be faster.
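A minimal sketch of pre-sizing the files, assuming your database is named Reporting and using made-up logical file names and sizes (you would pick sizes based on the 10GB you mentioned plus headroom for growth):

ALTER DATABASE Reporting
MODIFY FILE (NAME = Reporting_Data, SIZE = 15GB, FILEGROWTH = 512MB);

ALTER DATABASE Reporting
MODIFY FILE (NAME = Reporting_Log, SIZE = 2GB, FILEGROWTH = 256MB);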
Regarding the data flow buffers: there are two settings in the data flow properties, DefaultBufferMaxRows and DefaultBufferSize. Making them larger (so a buffer can hold more rows at a time) might improve performance. More info: Data Flow Performance Features.
Other performance tips: Top 10 SQL Server Integration Services Best Practices
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
August 29, 2013 at 1:10 pm
Thanks for the recommendations. I will look into those.
August 30, 2013 at 2:42 pm
I was reading the Stairway on Indexes. For the tables that I load in chunks by year, would I be better off creating a clustered index with the date as the leading column, to facilitate the load and delete? Or should I put the clustered index on columns that report queries are more likely to use and create a nonclustered index on the date? I understand that with the second option the rows would arrive out of clustered-key order during the load, causing page splits as the database slots them into place.
September 1, 2013 at 1:31 pm
A clustered index on the date makes sense.
You can always add nonclustered indexes if needed to support the reports.
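Something along these lines, with placeholder names:

CREATE CLUSTERED INDEX CIX_OrderHistory_OrderDate
ON dbo.OrderHistory (OrderDate);

-- the yearly delete then touches one contiguous range of the table
DELETE FROM dbo.OrderHistory
WHERE OrderDate >= DATEADD(YEAR, -2, GETDATE());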
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP