Beyond clustered then nonclustered, does index creation order matter?

  • I'm truncating/reloading a 50M row reporting table every night. My index creation alone is taking 40 minutes. I'm looking to reduce the time.

    It sounds basically accepted that a clustered index should be the first index on a table (and that it should exist before the data is even loaded into the table). What's not discussed is what order the nonclustered indexes should be added.

    I noticed that if I add an index using (COL1, COL2), it takes a little over 2 minutes. If I add the exact same index again, it takes 45 seconds! Sounds like SQL Server is consulting my new index to speed the creation of the next index.

    Based on this observation, do I rightly conclude that I should create by "broad" indexes first, and then create my smaller ones? Say, create an index on (COL1, COL2, COL3) and THEN create an index on (COL2)?

    Thanks for any help,

    Sam

  • Sam Phillips (6/30/2010)


    Sounds like SQL Server is consulting my new index to speed the creation of the next index.

    It can. It'll use the best option it has for getting all the data it needs for an index.

    Based on this observation, do I rightly conclude that I should create by "broad" indexes first, and then create my smaller ones? Say, create an index on (COL1, COL2, COL3) and THEN create an index on (COL2)?

    Play around and see. It may not make much of an impact, or it may. Depends on table size, index, other stuff.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Sam Phillips (6/30/2010)


    It sounds basically accepted that a clustered index should be the first index on a table (and that it should exist before the data is even loaded into the table).

    Let's break this down into two parts:

    1. Clustered index should be first: absolutely. Because, once you create a clustered index, all other existing indexes need to be rebuilt to reference the CI columns as it's pointer to the row.

    2. If you can insert into the table in the order that the CI will be, then it may be faster to do the insert first, then add the CI afterwards. You'll have to test this. Also, even if you can't insert in this order, it may still be faster to build the index after inserting into the table - sort it all once instead of for every record being added. Again, this will need to be tested.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Historically, I've had better luck with the clustered index pre-existing...

    I tried loading up the table "in order", and then adding the clustered index, but it performed worse.

    Shame the executition plan of "create index" doesn't mention wether other indexes are consulted.

    Thanks for the thoughts,

    Sam

  • Sam Phillips (6/30/2010)


    Shame the executition plan of "create index" doesn't mention wether other indexes are consulted.

    It does. See below. Index1 was a table scan, Index2 SQL used an existing index to create the new one.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • No, I meant for my non-clustered index creation statements (put on after my data is already loaded up).

    Thanks anyway,

    Sam

  • Sam Phillips (6/30/2010)


    No, I meant for my non-clustered index creation statements (put on after my data is already loaded up).

    Yes, so did I.

    Both of those exec plans were nonclustered indexes created on a populated table. One SQL used a table scan to get the data, one it scanned another nonclustered index.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Waddaya know! That's cool. I did the "estimated plan" (see attached). There's not much too it, but the "actual plan" does say a lot more.

    Shame the estimated plan doesn't work (I feel I'm about to be corrected again). 😛

    Any idea why my "actual plan" says "MAXDOP 1"? My server settings are set to MAXDOP 4.

    Thanks for the help,

    Sam

  • Nevermind...

    Per the BOL:

    Parallel index operations are available only in SQL Server Enterprise, Developer, and Evaluation editions.

    🙁

    I'm using SQL Standard.

    Newman!

    Sam

  • Sam Phillips (6/30/2010)


    Shame the estimated plan doesn't work (I feel I'm about to be corrected again). 😛

    It usually works a lot better than it does here.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply