Loading to table with clustered PK / Order by in source

Question

Loading to table with clustered PK / Order by in source

Phil Parkin

SSC Guru

Points: 246963
More actions
March 25, 2015 at 4:53 am

#302512

Here's a question which I feel like I should know the answer to.
If I'm using SSIS to load a table with a clustered PK, will the data load faster if the source has been sorted to match the target's clustered PK?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Martin Schoombee SSCoach Points: 19128 More actions · Answer 1

Phil Parkin (3/25/2015)
Here's a question which I feel like I should know the answer to.
If I'm using SSIS to load a table with a clustered PK, will the data load faster if the source has been sorted to match the target's clustered PK?

I don't think it will necessarily load faster, because (imo) the query engine will still include a sort operation in the execution plan of the insert. I do think it could help reduce index fragmentation though.

Would be an interesting scenario to put to the test...

Samuel Vella SSCrazy Eights Points: 8055 More actions · Answer 2

Bulk loads are quickest for tables which are either empty or have no indexes.

If you are inserting a lot of data into an already large table then rebuilding the indexes and PK after a load might not be a good use of CPU resources.

I assume from the nature of your question that the new data your loading is appended to the clustered PK and wouldn't include any data that falls in between existing PK values.

If this is the case it might be worthwhile investigating partitioning strategies. With partitioning you can quickly bulk load into a empty table and then swap the newly populated table in for the next empty partition.

Phil Parkin SSC Guru Points: 246963 More actions · Answer 3

Samuel Vella (3/27/2015)
Bulk loads are quickest for tables which are either empty or have no indexes.
If you are inserting a lot of data into an already large table then rebuilding the indexes and PK after a load might not be a good use of CPU resources.
I assume from the nature of your question that the new data your loading is appended to the clustered PK and wouldn't include any data that falls in between existing PK values.
If this is the case it might be worthwhile investigating partitioning strategies. With partitioning you can quickly bulk load into a empty table and then swap the newly populated table in for the next empty partition.

I am loading empty work tables in a DW from a staging database. As all of the data is within SQL Server, sorting the source data is trivial. After this load is complete, I MERGE the data into the target tables. I need the PK to be in place to assist the MERGE.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin