Loading to table with clustered PK / Order by in source

  • Here's a question which I feel like I should know the answer to.

    If I'm using SSIS to load a table with a clustered PK, will the data load faster if the source has been sorted to match the target's clustered PK?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (3/25/2015)


    Here's a question which I feel like I should know the answer to.

    If I'm using SSIS to load a table with a clustered PK, will the data load faster if the source has been sorted to match the target's clustered PK?

    I don't think it will necessarily load faster, because (imo) the query engine will still include a sort operation in the execution plan of the insert. I do think it could help reduce index fragmentation though.

    Would be an interesting scenario to put to the test...

  • Bulk loads are quickest for tables which are either empty or have no indexes.

    If you are inserting a lot of data into an already large table then rebuilding the indexes and PK after a load might not be a good use of CPU resources.

    I assume from the nature of your question that the new data your loading is appended to the clustered PK and wouldn't include any data that falls in between existing PK values.

    If this is the case it might be worthwhile investigating partitioning strategies. With partitioning you can quickly bulk load into a empty table and then swap the newly populated table in for the next empty partition.

  • Samuel Vella (3/27/2015)


    Bulk loads are quickest for tables which are either empty or have no indexes.

    If you are inserting a lot of data into an already large table then rebuilding the indexes and PK after a load might not be a good use of CPU resources.

    I assume from the nature of your question that the new data your loading is appended to the clustered PK and wouldn't include any data that falls in between existing PK values.

    If this is the case it might be worthwhile investigating partitioning strategies. With partitioning you can quickly bulk load into a empty table and then swap the newly populated table in for the next empty partition.

    I am loading empty work tables in a DW from a staging database. As all of the data is within SQL Server, sorting the source data is trivial. After this load is complete, I MERGE the data into the target tables. I need the PK to be in place to assist the MERGE.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply