November 2, 2006 at 8:56 am
I have an SSIS package that in effect is bulk copying records from one database to another, both located on the same server that also runs SSIS. Each table is in the 8 million record range, with the SSIS package using a data flow task with multiple source and destination flows to allow parallel loading.
During testing with 1 million records, the system seemed to perform adequately. Now, running the full data load, multiple source/destination boxes are showing no progress in record loading, with the following symptoms:
Some of the packages still executing show approx 2 million records loaded, although the SQL Server table properties dialog shows all records have been transferred.
Some of the packages still executing show approx 2 million records loaded, although the SQL Server table properties dialog shows 0 records loaded.
Task Manager reports all tasks are running; however, the .ldf (log) files of both source and target databases are not showing any activity, and the timestamps are hours old.
Any ideas would be greatly appreciated, including how to stop the flow without corrupting either database.
November 2, 2006 at 11:31 pm
SSIS processing is RAM intensive. I bet you are running out of RAM, particularly as you are loading up the system trying to run multiple threads. Check the memory performance counters - if the processor loading is low, that means you are I/O or memory bound. If you can batch your job, that might be the way to go.
November 3, 2006 at 7:43 am
Hello,
These are the counters I am currently monitoring:
Performance Object: SQL Server:SSIS Pipeline
Buffer memory
Buffers in use
Buffers spooled
Flat buffer memory
Flat buffers in use
Private buffer memory
Private buffers in use
What can I monitor to check processor loading?
I don't believe this job can be batched, as they are parent-child tables.
November 3, 2006 at 8:15 am
Hi ferigod,
Like most people, I'm new to SSIS, but I've noticed something about the packages. When you execute a package from BIDS, SSIS likes to perform "validation checks" as part of its loading process and this validation seems to be tied to the size of the data source being used. So when the data set is large, the validation seems to take longer.
In addition, SSIS hates it when you don't map columns from your data source to your destination. For example, if your data source includes a column that doesn't exist in your destination, then SSIS generates a warning. Those warnings don't prevent a package from executing, but they REALLY slow down the execution of a package.
One of the easiest ways to boost the performance of your package is to use a stored procedure as your data source. Your stored procedure can contain a select statement that isolates just the rows that are required by the destination. In addition, you could also add a parameter so that if you wanted to treat the records as a batch, you could. Just select a column with a limited set of distinct values and use that column as the basis for your partitioning.
Once you have created your stored procedure in your database, open your package in BIDS and double-click the OLE DB Source Data Flow component. In the Data Access Mode box, select SQL Command. Then enter your stored procedure in the text editor (make sure you include the word "exec"). If you have parameters, just include a ? for each parameter, separated by commas (for example, EXEC ReturnData_sp ?,?).
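For illustration, here is a minimal sketch of what such a parameterized source procedure might look like; the parameter, table, and column names are placeholders, not taken from the original package:

-- Placeholder names throughout; adjust to your own staging table and columns.
CREATE PROCEDURE dbo.ReturnData_sp
    @BatchKey int,        -- first ? in the SQL Command
    @AsOfDate datetime    -- second ? in the SQL Command
AS
BEGIN
    SET NOCOUNT ON;       -- keeps extra "rows affected" messages out of the data flow

    SELECT field1,
           field2
    FROM   dbo.StagingTable
    WHERE  BatchKey = @BatchKey
      AND  LoadDate <= @AsOfDate;
END

With the SQL Command set to EXEC dbo.ReturnData_sp ?,?, the two ? markers map to package variables on the Parameters page of the OLE DB Source.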
That may help to speed up your packages a bit. If you have more questions about how to pass a package variable (either a hard code value or a dynamically generated value) to a SQL Command in an OLE DB Source, just let me know.
Also if you have data from a single source that you are parsing out to multiple destinations, then I highly recommend using the Multicast Data Flow Transformation in combination with the OLE DB Source.
I had a package that had to import 4,000,000+ records, and prior to using the stored procedure and Multicast, the package took over 25 minutes to execute. After making those changes, the execution time was reduced to 3.5 minutes.
November 3, 2006 at 8:29 am
Thanks for the quick response.
Currently I am using an Execute SQL Task, with SQLSourceType set to Direct input, SQLStatement containing the actual T-SQL that gives me the recordset I want, for example: "INSERT INTO <destination table> SELECT field1, field2 FROM <staging table>", and the IsQueryStoredProcedure option set to false.
If I convert these SQL statements to sprocs and set the option to True, would that make a huge difference?
I am using SQL Server Destinations as the destination in any Data Flow tasks that I use. The reason for the multi-stage approach is that I need to ensure the primary/foreign keys are generated correctly prior to the bulk insert into the final destination, which has over 100 million records.
I am still unclear as to how to batch this. I appreciate your comments.
November 3, 2006 at 9:57 am
One of the most confusing things for me about SSIS is which data flow task is the most efficient in a particular situation.
I generally use the Execute SQL Task when I am trying to populate a variable dynamically, because I can use the Parameter Mapping and Result Set options to pass a value to my user-defined package variables.
Outside of that, I'm not convinced that running an Execute SQL Task is really that much more efficient than using the OLE DB Source in a Data Flow Component.
"Should I convert these SQL statements to sprocs and set the option to True, would that make a huge difference ?"
I highly recommend changing those statements to stored procs. First of all, you can execute the stored proc on the database to see how long it takes to run and load your staging table without getting SSIS involved. It may be that you need to optimize the table (source and/or destination) with a few indexes.
Then use your Execute SQL Task or another SSIS task to run the same stored proc. If you use the Execute SQL Task, you don't need to change the IsQueryStoredProcedure property. Just enter EXEC NameOfYourSP in the SqlStatement box and click OK. Since your stored proc contains an INSERT statement, it doesn't need any more information to populate the table.
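As a rough sketch of that conversion (the table and procedure names here are placeholders), the INSERT statement from the Execute SQL Task could be wrapped like this:

-- Placeholder object names; substitute your actual staging and destination tables.
CREATE PROCEDURE dbo.LoadDestination_sp
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.DestinationTable (field1, field2)
    SELECT field1, field2
    FROM   dbo.StagingTable;
END

The SqlStatement for the Execute SQL Task then becomes simply EXEC dbo.LoadDestination_sp, and you can time the same procedure directly on the database to compare against the SSIS run time.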
If the run time is significantly increased by SSIS, then you may want to consider using another program to exec the stored proc, or try to figure out why SSIS seems to be causing it to hang.
I know that you need a multi-layered approach to load your data, and based on what you've written it sounds like you are loading data into a table that generates keys for you. Then you are taking the data from that table and inserting it into its final destinations.
In order to simplify things for yourself (and take advantage of SSIS's powerful error handling features), you could use two Data Flow Components instead of the Execute SQL Task.
The first Data Flow Component could load the staging table. Then, by connecting them in the control flow, you could have a second Data Flow Component load the final destination table only if the staging table loaded successfully.
Within the Data Flow Components, you could use a stored procedure that contains only a SELECT statement as your OLE DB Source. Then use the OLE DB Destination to map the source columns to the destination columns.
If you wanted to run this as a quasi batch job, you could select a column from your data that has a limited set of values. For example, if you are trying to load an Orders table, you may have an OrderType column that contains just a few values like 'Catalog', 'Phone', and 'Internet'.
In your stored proc, you could make that column a parameter so that instead of running the package for all 6 million rows, it only runs for the rows that contain that value. (You could also use date values as your parameter.) A sketch of what such a procedure might look like is below.
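A minimal sketch of that kind of SELECT-only procedure, using the hypothetical Orders/OrderType example above (all names are placeholders):

-- Hypothetical Orders example; table and column names are placeholders.
CREATE PROCEDURE dbo.SelectOrdersByType_sp
    @OrderType varchar(20)   -- e.g. 'Catalog', 'Phone', or 'Internet'
AS
BEGIN
    SET NOCOUNT ON;

    SELECT OrderID,
           OrderDate,
           OrderType,
           CustomerID
    FROM   dbo.Orders
    WHERE  OrderType = @OrderType;
END

In the OLE DB Source, the SQL Command would then be EXEC dbo.SelectOrdersByType_sp ?, with the ? mapped to a package variable as described in the next steps.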
In the Variables window in SSIS, create a variable and hard-code the value. (Alternatively, you can populate it dynamically, but that takes much longer to explain; if you want to know how to do this, you can email me and I can send you some of the code I have that makes this work.)
In the OLE DB source, when you select SQL Command, use the stored proc + the ? mark and then click the Parameters button.
In the first box, enter the name of the parameter as it appears in your stored proc. In the second box, enter the name of the package variable that contains that value, then hit OK.
I'm more of a visual person, so if you need an example I can email it to you.
Now, especially for testing, you can limit your data to a particular subset that hopefully will load a bit faster.
November 3, 2006 at 11:09 am
You gave me two good ideas to try here. The first is the conversion of the SQL statements into stored procedures. That will give me a good metric for SQL tuning.
The second is using data flow tasks, with the sql statement as the source, and the mapping into a sql destination.
The batch idea probably won't work in this situation, as the parent-child relationships are produced among a number of tables, not just one.
Let me get started so I can gauge what the metrics will look like...
November 3, 2006 at 3:05 pm
Good luck!