Introduction to Indexes: Part 2 – The clustered index

  • Thanks for the reply Gail, I may indeed start a new post topic as I would like to understand the unused space issue better. Would you mind expanding on your comment that the problem is not SQL 2000 specific, is it fixed in SQL 2008 for instance?

    Cheers,

    Mark.

  • mark.stringer (11/12/2009)


    Would you mind expanding on your comment that the problem is not SQL 2000 specific, is it fixed in SQL 2008 for instance?

    Technically, it's not a bug. It's a side-effect of some of the architectural details of a heap and how it reuses space. It is not version-specific. (at least not in the recent versions)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I second that! Thanks Gail. I can't wait to read Part 3.

  • In OLTP environment, where read is more frequent than write, the Cluster index is your biggest gun and your major weapon. Over the decades its best usage is well known to most.

    If your cluster is like a giant sword and your other indexes are small knives, it will not be wise to use the giant sword to cut vegetables and use your small knives to defend yourself. Of course there are exceptions but to say I prefer to use that giant sword to do cut up small things or not use it at all or “organize” so it looks pretty just seems a bit over the top.

    Your statement for page splits, direct updates, and narrow index does not negate this.

    So it won’t be a totally negative criticism, I would say you should attempt to make one (as there can be but one) of your most frequent or most important (depends on your need) “to be” your cluster and you’ll be surprised with the result. Using non-intelligent keys, fix length, and not null defaulted columns should help you eliminate the page splits, direct updates, and narrow index. Of course there are exceptions, but in performance tuning it is always the maximum bang for the buck wins the day.

  • Then we are going to have to agree to disagree on this. I understand your approach and I've used it before but in general I prefer the one that I described in the article

    Generally, for OLTP (where there are lots of small queries run frequently and lots of data modification), I would want 2-3 (rarely more than 5) small indexes to handle those queries. Since the clustered index is the biggest index, I prefer to use nonclusters where possible. Won't be noticeable for singleton seeks, but range scans it usually is, if the nonclusters are covering and the queries are only returning small portions of the columns. Now, if I can get the cluster onto the column that's the most frequent access path (often the pk) and have the other considerations satisfied then that's a major win.

    A data warehouse, where reads greatly exceed writes and queries are very complex has other rules for indexing. With a data warehouse I'll happily put the cluster onto the most common access path regardless of what that is, because page splits and split updates don't matter so much if the data's read-only for the majority of the time.

    I said nothing about making the table look good. I don't care how it looks. I care that it's not having page splits frequently (very expensive operations), I care that the cluster is not deeper than necessary (slowing down clustered index seeks and lookups). I care that my nonclusters are not deeper than necessary. I care that rows are not moving all over the place (requiring an update to be split into a delete/insert pair, again slower than necessary).

    As I said, we're going to have to agree to disagree here.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail

    Very interesting and useful article on clustered index architecture. I've recently noticed a clustered index in our system with a high (43%) level of fragmentation. It's based on an identity column which is the PK, so fits your qualifications for a CI pretty well. We don't update the key column but rows can be deleted. Would this deletion be the cause of the fragmentation?

    Also, how is row space allocated on the page? If a row expands beyond the space available (e.g. due to updating varchar columns) can this cause splits & thus fragmentation?

    John

  • Johnc (11/19/2009)


    We don't update the key column but rows can be deleted. Would this deletion be the cause of the fragmentation?

    No. Deletes can't cause fragmentation. They can result in pages that are partially (mostly) empty, but they cannot cause logical fragmentation

    If a row expands beyond the space available (e.g. due to updating varchar columns) can this cause splits & thus fragmentation?

    Absolutely yes. The rows are stored at the leaf level of the clustered index, if a row grows (update of varchar or a null to non-null value) and the new row won't fit, it will cause a page split, resulting in fragmentation.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Great Article, Gail. Keep up the good work!

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • OK that makes sense. What about free space? I assume SS allocates a default % headroom to each row to allow for growth and that the % is adjustable so we could reduce the fragmentation caused by updates to column data?

  • Johnc (11/27/2009)


    What about free space? I assume SS allocates a default % headroom to each row to allow for growth and that the % is adjustable so we could reduce the fragmentation caused by updates to column data?

    It doesn't unless you tell it to. Free space on a page increases the number of IOs required to get a range of rows. Can do, but it's a trade off, page splits vs larger than necessary tables.

    The setting is Fill Factor.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail;

    I am not sure but another issue on FILL FACTOR. If you do not spacify PAD_INDEX then I discovered that the mechanism will not create a fill factor by the specified %age.

  • Would you like to expand on that? (details or example)

    PAD_INDEX controls whether the fill factor applies to just the leaf levels or the entire index (leaf and non-leaf levels)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Can do, but it's a trade off, page splits vs larger than necessary tables.

    True, but if you know there will be updates to varchars I guess it's a choice between using a fill factor or using chars instead of varchars?

  • Not any update to a varchar, one that increases the size of the varchar, hence increasing the size of the row.

    It's an option, just not a free one. Tradeoffs, just like with virtually everything else.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • bob 54859 (11/12/2009)


    Of course, dev manager vs architect... you know who wins. That's ok, I have bigger fish to fry. 😉

    BOb

    Actually, I don't know who wins. It will depend on whether the dev manager is (a) a real manager or (b) the result of the Peter principle, and in case (a) also on whether the architect is (a1) a real architect or (a2) the result of a post Peter "lateral promotion", and in case (a1) it will also depend on whether there is time for proper discussion and if there isn't on the strength of each person's convictions.

    Of course in case (a1) - the only sane case - the idea that there is a winner and a loser doesn't occurr to anyone: the only thing consideration is what is best for the project, for the whole team; so the answer to "who wins" is "eveyone".

    bob 54859 (11/11/2009)


    I'm am scared that the answer will be "it depends".

    For your particular question, of course, the answer realy IS "it depends" so all your fears have come true! - it depends on whether cost of the bookmark lookups caused by having by the clustered index added to the cost of the extra space required by the (meaningless) identity column outweigh the cost of having to update other indexes when records change when you don't have a clustered index.

    Tom

Viewing 15 posts - 61 through 75 (of 122 total)

You must be logged in to reply to this topic. Login to reply