December 19, 2007 at 1:33 pm
I apologize for the lengthy explanation, but I don’t know how to explain the issue otherwise.
I am having some problems with SSIS performance and I would appreciate any help.
Our process is basically the import of data from various external sources; we then update the production data with logic tailored to each source.
We have been using stored procedures and agent jobs (sometimes with a large number of individual steps) to accomplish this, and it actually works rather well (we had not used the DTS process provided in SQL Server 2000 to any great extent).
There are a couple of relatively minor issues with the current practice that I was hoping to use SSIS to resolve. They are:
• Running stored procedures as individual sequential steps, I believe, prevents the parallel execution of steps where that could be helpful.
• With successive steps, the failure of any one step terminates the whole job.
  o This sometimes happens when a varchar field has expanded on the source but not in the receiving SQL table. While this is a rare event, the entire job terminates when it really does not need to.
  o The preferred action would be to identify and isolate the record that caused the error, bypass it, and continue.
I have successfully built an SSIS package that reproduces the functionality of the prior jobs and adds the desired features (e.g., parallel processing and identifying and bypassing bad records).
My problem, however, is that the resulting test SSIS process (14 steps, primarily loads of reference tables) takes 4+ minutes to execute, while the current process (a job of stored procedure steps) takes 16 seconds.
Can anyone tell me if this would be considered “normal,” or if I am missing something in the design of the SSIS package? I am learning to use SSIS and would appreciate any help. At this point, it appears that the cost of validation and bad-record handling outweighs the value of the parallel processing.
December 19, 2007 at 2:37 pm
One thing that I would suggest is to implement error handling in your packages. Instead of the default behavior (fail component) on an error, select the redirect option and route the offending row into an exception table for analysis. This will prevent the failure of an entire package due to a handful of bad values.
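For reference, a minimal sketch of what such an exception table might look like (the names are hypothetical; ErrorCode and ErrorColumn are the two columns the SSIS error output adds to a redirected row):

-- Hypothetical exception table for redirected error rows.
CREATE TABLE dbo.ImportExceptions (
    ExceptionID  int IDENTITY(1,1) PRIMARY KEY,
    PackageName  nvarchar(128) NULL,  -- from an Audit transformation
    TaskName     nvarchar(128) NULL,  -- from an Audit transformation
    SourceKey    nvarchar(50)  NULL,  -- key of the offending source row
    ErrorCode    int           NULL,  -- supplied by the error output
    ErrorColumn  int           NULL,  -- supplied by the error output
    LoggedAt     datetime      NOT NULL DEFAULT GETDATE()
);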
Are you running your packages in interactive mode in BIDS, or via dtexec? I use the former only for development and testing, then use scheduled jobs with the latter for large production runs.
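If you have not used dtexec before, the two common invocations look like this (the package names, paths, and server name below are placeholders):

REM Run a package stored as a file
dtexec /F "C:\Packages\ImportRefTables.dtsx"

REM Run a package deployed to msdb on a server
dtexec /SQL "\ImportRefTables" /SERVER "MYSERVER"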
December 19, 2007 at 2:52 pm
Thanks,
I do use row redirection in the package (rather than the default fail-on-error), capturing the key information of the offending rows. I then add the package and task names using the Audit transformation and write the data, together with the key information, to a SQL Server table so I can tell which row(s) in the source caused the problem.
I am not familiar with dtexec. I have run the package in jobs using the wizard's option for running an Integration Services package. It seems that running the package from a file location or a server location makes little difference.
I will look into dtexec and see if that helps.
December 20, 2007 at 5:56 am
I tried using dtexec for execution, but there was no improvement.
December 20, 2007 at 6:46 am
Without actually seeing your SSIS package, it is going to be difficult to troubleshoot performance for you, but after having used it extensively, I will give you a few tips that may help.
In addition to this, there are lots of posted articles about SSIS performance. Read a few. SSIS and SQL complement each other, but they work very differently, and you need to alter your way of thinking when you switch from one to the other.
1) Think like an application developer. This is probably the hardest thing for a DBA to do. SSIS is good at row-by-row operations and SQL is not. Deal with things in your workflow as if it is ok to handle one record at a time and you will find SSIS performs better.
2) Use queries for selecting data rather than selecting a table and checking off the columns you want. This reduces the recordset on the source before SSIS ever gets it, instead of pulling everything and ignoring the unwanted fields (see the query sketch after this list).
3) Carefully deal with your connections. By default, your connection manager will connect to the database as many times as it wants to. You can set the RetainSameConnection property so it will only connect once. This allows you to manage transactions using an Execute SQL task and BEGIN TRAN / COMMIT TRAN statements, avoiding the overhead of DTC (see the transaction sketch after this list).
4) Use lots of sequence containers. They give you control over which processes will run in parallel. If something can be run in parallel, let it.
5) Watch your variable scope. Variables that are evaluated as expressions seem to evaluate to their value when they come into scope, not when underlying information changes. This can be tricky to figure out when you scope something wrong.
6) Avoid script components. If you find yourself writing a complicated script component, stop and make sure there is not a component or group of components already available to do what you want.
7) Manage your pre-execute and validation phases. Lots of packages seem to run slow but are really getting stuck in validation and pre-execute phases of components. If components cache data, they cache it in the pre-execute. If you see components getting stuck in this phase, look at the properties and see if there is some way to manage caching.
8) Try to keep things in SSIS. Using staging tables in DTS was important. In SSIS, I have found them to just slow things down in most cases. Try to let SSIS manage the data and keep from moving it in and out of tables.
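To illustrate point 2, a quick sketch (the table and column names are invented): instead of picking the table in the OLE DB Source and unchecking columns, hand the source a query so the reduction happens on the source server before the pipeline ever sees the data.

-- Hypothetical source query: return only the columns and rows
-- the data flow actually needs.
SELECT CustomerID, CustomerName, ModifiedDate
FROM dbo.Customer
WHERE ModifiedDate >= DATEADD(day, -1, GETDATE());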
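And for point 3, assuming RetainSameConnection = True on the connection manager, the transaction can be bracketed by a pair of Execute SQL tasks along these lines:

-- First Execute SQL task, before the data flow:
BEGIN TRANSACTION;

-- ... the data flow runs here on the same retained connection ...

-- Final Execute SQL task on the success path
-- (with a ROLLBACK TRANSACTION task on the failure path):
COMMIT TRANSACTION;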
December 21, 2007 at 1:28 pm
A few add-ons to the previous post...
1.) While running the package within BIDS, ensure you set the data flow to run in optimized mode (the RunInOptimizedMode property). This can be useful if you get stuck pulling in more data than you want from your source due to limitations on your select and filtering ability. Setting that option to true allows SSIS to discard any data contained in columns you won't use later on, which can free up a considerable amount of memory.
2.) Consider the machine you're running the packages on. No matter how good your package might be, if the hardware underneath it is struggling to keep up, you're pretty much stuck. Simple items to look at are the amount of free memory prior to launching the package, both in development and on the server hosting SSIS. Remember that the SSIS and SQL Server memory spaces are distinct from each other. So if SQL Server is sitting there hoarding most of the physical RAM (as it usually does), it will leave SSIS to use what little is left, and SSIS will then turn to the page file, which will kill performance. If your SQL Server instance isn't doing much, you might consider limiting the amount of RAM it can get hold of (see the memory sketch after this list), possibly even reducing its affinity to only select processors, leaving the remainder to the OS, which is how SSIS gets access to system resources.
3.) Keep in mind what is running in parallel at the time of execution. Are there other packages, agent jobs, or applications consuming resources at the same time?
4.) When pulling data in from a server, it's recommended to get it by executing a stored procedure on that server. This allows the database engine where the data lives to optimize the query and make use of the available indexes (of course, make sure the column(s) you reference in your WHERE clause are indexed). A sketch follows this list.
5.) When loading data back into tables, it's helpful to use the Fast Load option on the destination, or in some cases to bulk insert the data (bypassing full T-log logging); see the load sketch after this list.
6.) Ensure the source and destination servers' T-logs for the DBs you're running against are not constantly having to expand. The default settings aren't optimized for large transactions; setting a larger initial size usually results in better performance (being mindful of the available disk space where the T-log resides). The load sketch after this list shows one way to pre-size it.
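To put point 2 into practice, capping SQL Server's memory is done with sp_configure; the 4096 MB figure below is only an example, so size it to your machine:

-- Example only: cap SQL Server at 4 GB, leaving the remaining
-- physical RAM for the OS and the SSIS runtime.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 4096;
RECONFIGURE;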
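For point 4, the source-side procedure might look something like this (all object names here are made up):

-- Hypothetical procedure on the server where the data lives; the
-- SSIS OLE DB Source calls it with: EXEC dbo.GetChangedOrders ?
CREATE PROCEDURE dbo.GetChangedOrders
    @SinceDate datetime
AS
BEGIN
    SET NOCOUNT ON;
    -- An index on ModifiedDate keeps this WHERE clause cheap.
    SELECT OrderID, CustomerID, OrderTotal, ModifiedDate
    FROM dbo.Orders
    WHERE ModifiedDate >= @SinceDate;
END;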
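And for points 5 and 6, a sketch of a minimally logged load plus pre-sizing the log (the file names, sizes, and paths are placeholders):

-- Point 5: BULK INSERT is minimally logged when the database is in
-- SIMPLE or BULK_LOGGED recovery and a table lock is taken.
BULK INSERT dbo.StagingOrders
FROM 'C:\Feeds\orders.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK);

-- Point 6: pre-size the T-log so it is not repeatedly autogrowing
-- in the middle of a large load.
ALTER DATABASE ImportDB
MODIFY FILE (NAME = ImportDB_log, SIZE = 2048MB, FILEGROWTH = 512MB);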
Hope this helps...Happy Holidays
Eric
January 3, 2008 at 1:25 pm
Just to close this topic (for now): I really appreciate the assistance provided; it has helped me greatly. After trying some of the suggestions and doing additional reading, it appears that the main issue is how memory is divided between SSIS and SQL Server. I am not sure at this point that I want to decrease the memory available to SQL Server, so I will need to address this at a later time.
In the meantime, I have been able to use TRY-CATCH logic to identify data-format exceptions and generate a notification so I can then address the issue. This process is not ideal, but with it I can use the Execute SQL task in SSIS, keep the parallel-execution productivity, and not lose any functionality I currently have.
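For anyone who finds this later, the shape of that TRY-CATCH inside the Execute SQL task is roughly the following (the procedure, table, and mail profile names are invented for illustration):

BEGIN TRY
    EXEC dbo.UpdateProductionFromStaging;  -- hypothetical import procedure
END TRY
BEGIN CATCH
    -- Record the data-format exception for later review.
    INSERT INTO dbo.ImportErrors (ErrorNumber, ErrorMessage, LoggedAt)
    VALUES (ERROR_NUMBER(), ERROR_MESSAGE(), GETDATE());

    -- Send a notification so the issue can be addressed.
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'ImportAlerts',
        @recipients   = 'dba@example.com',
        @subject      = 'Import data format exception';
END CATCH;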
Again, thanks for your suggestions.