SSIS inserts are very slow

Question

SSIS inserts are very slow

Sioban Krzywicki

One Orange Chip

Points: 27770
More actions
November 3, 2011 at 11:15 am

#251065

I have three tables I'm trying to insert to in an SSIS package.
Each has a GUID primary key.
I'm selecting from 2 other tables to do the inserts and on 2 of the three inserts, there's no transformation of the data at all.
When I run the package into empty tables the entire package takes under a second. The first table has 100,000 rows on average and the second 800,000 rows.
My problem occurs when I try to run the same package against the tables that already have data in the destination. This part of the package now takes an hour for the 100,000 rows and 2 hours for the 800,000 rows. The indexes on these tables aren't horribly fragmented before the inserts begin and aren't too bad by the time they're done.
I have the destination OLE DB connection set as fast load with a Maximum insert commit size of 20,000 for the smaller table and 80,000 for the larger. I have the three tables being loaded in parallel and there's a connection manager for the source for each task and one for the destination for each task.
I don't think this should be taking this long. There are three destination tables, as I mentioned. The first has 190 million rows (loads from the 100,000 row table, takes an hour), the second has 245 million rows (loads from the 100,000 row table, takes an hour) and the third has 895 million rows (loads from the 800,000 row table, takes 2 hours)
Because they're running in parallel, they take 2 hours total, but that seems far too long. I originally had them in serial and the load took around 2 hours, I would have thought changing it to parallel would have sped things up. I changed the commit size from the max to the values stated above and that didn't help speed either.
Does anyone have any ideas for other things to try or articles to read that might help? I keep reading how fast SSIS is supposed to be, but 2 hours to put under a million rows into a table with only 895 million rows seems awfully slow.
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams

Viewing 15 posts - 1 through 15 (of 40 total)

You must be logged in to reply to this topic. Login to reply

Brandie Tarvin SSC Guru Points: 173061 More actions · Answer 1

For the two instances you don't require any transformations, don't use the Data Flow task. Using a stored procedure in an Execute SQL Task is much faster.

BTW, what type of load are you using on the destination tables?

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Sioban Krzywicki One Orange Chip Points: 27770 More actions · Answer 2

Brandie Tarvin (11/3/2011)
For the two instances you don't require any transformations, don't use the Data Flow task. Using a stored procedure in an Execute SQL Task is much faster.
BTW, what type of load are you using on the destination tables?

Really? A T-SQL stored procedure is much faster? Huh. I'll have to give it a try. I thought SSIS was supposed to be fastest for this kind of thing.

I'm not sure what you mean "what type of load". I'm using Table or View - fast load. Is there some other type you're asking about?

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams

Brandie Tarvin SSC Guru Points: 173061 More actions · Answer 3

Stefan Krzywicki (11/3/2011)
Brandie Tarvin (11/3/2011)
For the two instances you don't require any transformations, don't use the Data Flow task. Using a stored procedure in an Execute SQL Task is much faster.
BTW, what type of load are you using on the destination tables?
Really? A T-SQL stored procedure is much faster? Huh. I'll have to give it a try. I thought SSIS was supposed to be fastest for this kind of thing.

When you use it as an INSERT INTO..., yes, it is. In fact, most everything on SSIS is faster if you use T-SQL in the tasks where you can. Brian Knight (the king of BI) even admitted as much to me once.

Also, if you're running the package in BIDS instead of on the server, BIDS will be X times slower so can't give you a true feel for performance. It's all about network bandwidth and pulling the data down from the source, processing it, and pushing it back to the destination.

I'm not sure what you mean "what type of load". I'm using Table or View - fast load. Is there some other type you're asking about?

Fast Load is exactly what I mean. Table or View (without anything after) is "normal load" which ... Okay, I'm about to say something stupid that someone will probably brain me for, but here's my best understanding of the two different loads.

You know the difference between FULL recovery and BULK-LOGGED? I believe the difference between "Table or View" and "Table or View - Fast Load" is a similar thing. Fast Load, obviously, is better for performance. But it can really bog you down compared to a good ol' fashioned T-SQL INSERT.

Other things that can cause Data Flow performance issues:

1) Choosing more columns in the Source then is actually needed for processing

2) Using Transformations (Lookups, Merge, etc.) that can slow down processing

3) Batch size (the bigger the "rows per batch", the longer it takes)

4) Maximum Insert Commit Size (which should be set at 2147483647, not 0, for SQL 2008). You might fiddle with this a bit to see if a smaller commit size speeds things up for you.

5) Validate External Metadata set to True on Source (although, I prefer to use this setting myself) - Check the Advanced Editor -> Component Properties

6) Data Flow Task's or Connection Manager's DelayValidation property set to True (which you may need for various reasons.

There are a bunch of others, including database issues, table indexes, fragmentation, etc. that can cause issues, but the ones listed above are usually the first ones I check when I'm having issues. The biggest is SSIS choosing more data than it needs and trying to force it through the pipe all at once.

Does that help?

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

ricer Right there with Babe Points: 722 More actions · Answer 4

Is the GUID your clustered index key as well?

If so you are probably seeing lots of page splits when you are doing inserts.

Best way of speeding that up would be to change the clustered key to a different column or even using identity column so all inserts gets appended to the end.

What did you set the fillfactor on the clustered index on those tables?

-------------------------------------------------
Will C,
MCITP 2008 Database Admin, Developer

Brandie Tarvin SSC Guru Points: 173061 More actions · Answer 5

ricer (11/3/2011)
Is the GUID your clustered index key as well?
If so you are probably seeing lots of page splits when you are doing inserts.

I fail to see how having a GUID PK clustered index would automatically cause page splits. Could you please clarify what you mean by this statement?

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Sioban Krzywicki One Orange Chip Points: 27770 More actions · Answer 6

Brandie Tarvin (11/3/2011)
Stefan Krzywicki (11/3/2011)
Brandie Tarvin (11/3/2011)
For the two instances you don't require any transformations, don't use the Data Flow task. Using a stored procedure in an Execute SQL Task is much faster.
BTW, what type of load are you using on the destination tables?
Really? A T-SQL stored procedure is much faster? Huh. I'll have to give it a try. I thought SSIS was supposed to be fastest for this kind of thing.
When you use it as an INSERT INTO..., yes, it is. In fact, most everything on SSIS is faster if you use T-SQL in the tasks where you can. Brian Knight (the king of BI) even admitted as much to me once.

Huh. Not what I would have guessed, but that's why you ask people and do the research. 🙂

Also, if you're running the package in BIDS instead of on the server, BIDS will be X times slower so can't give you a true feel for performance. It's all about network bandwidth and pulling the data down from the source, processing it, and pushing it back to the destination.

I'm running it as a package in a job on the server.

I'm not sure what you mean "what type of load". I'm using Table or View - fast load. Is there some other type you're asking about?
Fast Load is exactly what I mean. Table or View (without anything after) is "normal load" which ... Okay, I'm about to say something stupid that someone will probably brain me for, but here's my best understanding of the two different loads.
You know the difference between FULL recovery and BULK-LOGGED? I believe the difference between "Table or View" and "Table or View - Fast Load" is a similar thing. Fast Load, obviously, is better for performance.

That's also what I'd gathered from the reading I'd been doing.

But it can really bog you down compared to a good ol' fashioned T-SQL INSERT.

I look forward to having the chance to test this. I appreciate your giving me something to try.

Other things that can cause Data Flow performance issues:
1) Choosing more columns in the Source then is actually needed for processing

Only bringing back exactly what I need

2) Using Transformations (Lookups, Merge, etc.) that can slow down processing

I do all of this before setting the data into interim storage tables. I'll do that for the third one as well so I can try it with the INSERT INTO

3) Batch size (the bigger the "rows per batch", the longer it takes)

I'd only been setting the Max Commit. I'll set the rows per batch to the same values as the Max Commit and see if it helps.

4) Maximum Insert Commit Size (which should be set at 2147483647, not 0, for SQL 2008). You might fiddle with this a bit to see if a smaller commit size speeds things up for you.
5) Validate External Metadata set to True on Source (although, I prefer to use this setting myself) - Check the Advanced Editor -> Component Properties

I checked and it is set to that.

6) Data Flow Task's or Connection Manager's DelayValidation property set to True (which you may need for various reasons.
There are a bunch of others, including database issues, table indexes, fragmentation, etc. that can cause issues, but the ones listed above are usually the first ones I check when I'm having issues. The biggest is SSIS choosing more data than it needs and trying to force it through the pipe all at once.
Does that help?

If nothing else, you've given me new things to try, which is a great help. Thanks Brandie! I'll let you know how it goes.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams

Sioban Krzywicki One Orange Chip Points: 27770 More actions · Answer 7

ricer (11/3/2011)
Is the GUID your clustered index key as well?
If so you are probably seeing lots of page splits when you are doing inserts.
Best way of speeding that up would be to change the clustered key to a different column or even using identity column so all inserts gets appended to the end.
What did you set the fillfactor on the clustered index on those tables?

Yup, it is. There's no way to make that change. I'm constrained by the structure of the rest of the system.

I left it at the default.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams

Sioban Krzywicki One Orange Chip Points: 27770 More actions · Answer 8

Brandie Tarvin (11/3/2011)
ricer (11/3/2011)
Is the GUID your clustered index key as well?
If so you are probably seeing lots of page splits when you are doing inserts.
I fail to see how having a GUID PK clustered index would automatically cause page splits. Could you please clarify what you mean by this statement?

Because GUIDs are not sequential (at least not in the environment I have to work with), SQL Server has to search for where to place each record as it does the insert. Since this means it isn't simply adding to the end, like it would with IDENTITY, page splits frequently occur.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams

Brandie Tarvin SSC Guru Points: 173061 More actions · Answer 9

Stefan Krzywicki (11/3/2011)
Brandie Tarvin (11/3/2011)
ricer (11/3/2011)
Is the GUID your clustered index key as well?
If so you are probably seeing lots of page splits when you are doing inserts.
I fail to see how having a GUID PK clustered index would automatically cause page splits. Could you please clarify what you mean by this statement?
Because GUIDs are not sequential (at least not in the environment I have to work with), SQL Server has to search for where to place each record as it does the insert. Since this means it isn't simply adding to the end, like it would with IDENTITY, page splits frequently occur.

Aahhh. <and the light goes on>. Forgot about that. Thank you for clarifying, Stefan.

EDIT: Obviously I don't play with GUIDs very often. @=)

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Jack Corbett SSC Guru Points: 184394 More actions · Answer 10

I have not read the white paper yet (I do plan to) and yes, it is probably going to get you some spam so use your spam email account, but you might want to check out this white paper, http://pragmaticworks.com/downloads/TroubleshootingSSISWhitepaper/, by Mike Davis. The PragmaticWorks guys now their stuff and supposedly this talks about performance.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Sioban Krzywicki One Orange Chip Points: 27770 More actions · Answer 11

Brandie Tarvin (11/3/2011)
Stefan Krzywicki (11/3/2011)
Brandie Tarvin (11/3/2011)
ricer (11/3/2011)
Is the GUID your clustered index key as well?
If so you are probably seeing lots of page splits when you are doing inserts.
I fail to see how having a GUID PK clustered index would automatically cause page splits. Could you please clarify what you mean by this statement?
Because GUIDs are not sequential (at least not in the environment I have to work with), SQL Server has to search for where to place each record as it does the insert. Since this means it isn't simply adding to the end, like it would with IDENTITY, page splits frequently occur.
Aahhh. <and the light goes on>. Forgot about that. Thank you for clarifying, Stefan.
EDIT: Obviously I don't play with GUIDs very often. @=)

No problem. For this reason and many others, I wish I didn't have to deal with GUIDs too. I think they're more trouble than they're worth.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams

Sioban Krzywicki One Orange Chip Points: 27770 More actions · Answer 12

Jack Corbett (11/3/2011)
I have not read the white paper yet (I do plan to) and yes, it is probably going to get you some spam so use your spam email account, but you might want to check out this white paper, http://pragmaticworks.com/downloads/TroubleshootingSSISWhitepaper/, by Mike Davis. The PragmaticWorks guys now their stuff and supposedly this talks about performance.

Cool, I'll take a look.

--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams

Jack Corbett SSC Guru Points: 184394 More actions · Answer 13

Stefan Krzywicki (11/3/2011)
ricer (11/3/2011)
Is the GUID your clustered index key as well?
If so you are probably seeing lots of page splits when you are doing inserts.
Best way of speeding that up would be to change the clustered key to a different column or even using identity column so all inserts gets appended to the end.
What did you set the fillfactor on the clustered index on those tables?
Yup, it is. There's no way to make that change. I'm constrained by the structure of the rest of the system.
I left it at the default.

You could try changing the FILL FACTOR on the clustered index. You might also want to check wait stats to see if there are IO waits because of page splits. Take snapshots before and after and do a diff.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Brandie Tarvin SSC Guru Points: 173061 More actions · Answer 14

I forgot to mention something.

If push comes to shove and you can't use an Execute SQL Task, and absolutely must have a Data Flow Task instead, use a stored procedure as your OLE DB data source instead of a table. It is still a lot faster than pulling your data from a table, view, or plain query in the Source Transformation.

(Change the data access mode to SQL command, and call the proc with EXECUTE <myprocname>)

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.