Introduction to Indexes: Part 2 – The clustered index

  • GilaMonster (3/11/2010)


    if it's a vendor app that you can't change, then that's pretty much that, you can't change it.

    Well, we can't change the width or contents of that field, but we can make its index clustered or not, and we might be able to find a smaller field to use as a clustered index.

    Something else has started bothering me: If there's a clustered index, then all nonclustered indexes must use it to find the record? That seems to be saying that SQL Server essentially has to do two separate index lookups. That would seem to imply that to avoid the extra work, maybe it's better to have no clustered index, in which case SQL Server can always use the row ID to go quickly and directly to the row without another lookup. Is there something wrong with this logic?

  • you can add included columns to your nonclustered indexes to minimize RID Lookups

  • wodom (3/11/2010)


    That would seem to imply that to avoid the extra work, maybe it's better to have no clustered index, in which case SQL Server can always use the row ID to go quickly and directly to the row without another lookup. Is there something wrong with this logic?

    If there's no cluster then SQL has to do a RID lookup to find the actual row. That's just about the same amount of work as for a clustered index seek. They're both very expensive operations and are done row by row.

    See part 3 for a discussion of nonclustered indexes. I think I talk about covering indexes in there.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Yes I read part 3, and covering indexes are a good thing for us to think about.

    But it's hard for me to imagine how traversing a (clustered) B-tree in a large table, which surely involves reading several pages which may be widely spaced, is just about the same as following a pointer which tells the system exactly which page to read without any other pages being accessed.

    The only way I can see a near-equivalency here is if many or most of the clustered tree pages are cached in memory, but even so, that only takes you so far.

  • It won't be several unless you're dealing with many millions of pages in the index. The upper levels of indexes are very small in relation to the leaf level and are in general in cache all the time (they're hot, frequently used pages).

    To give you an idea, let's take a 1 million row table with rows 8000 bytes each (so 1 to a page) and a 100 byte clustered index key (which is a huge clustered index key). (ignoring row headers and the like for a rough calc)

    Leaf level = 1 000 000 pages

    Level 1 = 12 000 pages

    Level 2 = 153 pages

    Level 3 = 2 pages

    Level 4 = 1 page - root.

    So it's a 5 level index for a million row table with huge rows and a rather large clustered index key. It's not 5 times less efficient than a RID lookup, because the upper levels will likely be in cache (where physical position on disk is irrelevant). It is a little less efficient, which is another reason why the small clustered index key guideline exists. The smaller the clustering key, the shallower the index tree and the more efficient the index seeks (which is what a lookup is) will be.

    Feel free to test it out. If you find that a heap is more efficient, and that couple milliseconds gained is critically important, and the downsides of a heap are not relevant, then go ahead and drop your cluster. There are specific cases (insert-heavy tables usually) where having no cluster is beneficial. It's not good in general though.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • OK now I have a better idea of what you meant. The depth of the tree is obviously very important.

    You referred to the downsides of heaps, and I wasn't sure what those were, other than some data & index fragmentation (which might be more or less cancelled out by cacheing). In a couple of cases in your 3-part article, heaps compared favorably to clustered DB's. However I did a search or two of this site, and came up with information that clarified it:

    http://www.sqlservercentral.com/blogs/aloha_dba/archive/2009/2/13/which-is-better-heaps-or-tables-with-clustered-indexes.aspx

    http://technet.microsoft.com/en-us/library/cc917672.aspx

    Thanks for your information.

  • Out of sheer curiosity: you mention that a hidden 4-byte column is added to the clustered index key to make it unique in the event that it is not. Let's say we had a really, really badly designed clustered index where there were more than 2^32 rows with the same key--I realise this is vanishingly unlikely in the real world. What would actually happen? Would SQL report an error, and if so, what?

  • It would throw an error.

    Error 666, Severity 16

    The maximum system-generated unique value for a duplicate group was exceeded for index with partition ID <Partition ID>. Dropping and re-creating the index may resolve this; otherwise, use another clustering key.

    http://robboek.com/2009/02/13/sql-server-uniqueifier-values-and-the-number-of-the-beast/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The guys who write SQL Server think of everything, it seems!

  • Thanks for writing this series, as apparent with all the responces, there is a lake of this knowledge in the development world.

    I was once like some of these respondees in my want to understand indexes and how they work.

    You did an excellent job of firing the minds of the SQL Server community. Especialy responding the all the questions.

    I realy like the covering index with include columns responce to the SQLGuru's wanting more than one clustered index. That was classic. 🙂

    Thanks and God Bless,

    Thomas

    TheSmilingDBA

    Thomas LeBlanc, MVP Data Platform Consultant

  • Nice 😎

  • Excellent article Gail! Very simple and easy to understand with the fundamentals that add up to a good understanding of the subject.

    These articles should be required reading for both newbies and old hands alike.

    Todd Fifield

  • Hi Gail,

    When you're performing an update on a non-clustered index column on a table with clustered index (CI), the execution plan indicated that CI is being Updated.

    Do you know why this behaviour happens, and is CI is really being updated although you're not updating the CI column?

    Thanks,

    Simon

    Simon Liew
    Microsoft Certified Master: SQL Server 2008

  • The clustered index is been updated. While you may not be updating the clustered index key, as I mentioned in the article "A clustered index is an index where the leaf level of the index contains the actual data rows of the table".

    Since the actual data rows are at the leaf of the clustered index, the cluster will be affected by all updates against that table. In essence, it is the table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • FYI,

    3 columns with an index on each one separately produced and index intersection plan on my 2008, unlike what the article said! - this is in reference to part 3.


    Kindest Regards,

    R

Viewing 15 posts - 91 through 105 (of 122 total)

You must be logged in to reply to this topic. Login to reply