Order of Clustered Index

  • Mike Scalise - Sunday, September 2, 2018 10:18 AM

    Sorry for the delay. My wife went into labor a week early so I've been busy with newborn things for my first child 🙂

    Totally missed this first time around.  Congratulations to you both!  Actually, all 3 of you! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The uniquifier

    While reading the uniquifier does not do a lot for efficiency. It is not part of the selection mechanism, it is not part of the ordering mechanismn. So rows are read and the uniquifier only makes the row a tad longer.

    On mutations of the row (updates, deletes), the uniquifier directly identifies the rows, so the rows not touched by the mutation are not 'touched' in the process. Offcourse the rows have to be read first and then a selection on the rows to be mutated has to be made, but the selection does not have to be on key fields. After the selection of the rows, the uniquifier can and is used to determine which rows should be touched. Especially if there are a large number of duplicates this does make a difference.

    We use almost exclusively clustered tables, almost always starting with a clientnumber as the first field. And then some other fields. Some tables the clustered key is unique, but sometimes it is not. For example when a client can have 'the same row' in different versions. Because of the clientnumber in front of almost all clustered keys, data of a client can be accessed efficiently (fast and with a small cache footprint), joins are performed efficiently, because multiple tables have the same cluster start (clientnumber), and SQL-server takes care that when there are duplicate indexes for a client that each duplicate gets an uniquifier.
    In most cases the clustered key is unique so no uniquifier is needed. Making sure that the clustered key is always unique would be expensive for a number of tables, because of the number of fields to be added to the clustered key, or and identity has te be created and added to the clustered key (at the end).
    So the uniquifier works far better, only there when it is needed. No real impact on reading or sorting, but for all mutations the rows are unique so they can be accessed 'directly'.

    Ben

    In the B-tree the unique identifier is needed and used to give an order to the rows. If a clustered index value is spread over more nodes, the unique identifier is needed to locate the correct node. (For mutations, for read actions all rows over multiple nodes have to be read).

  • Thanks for the kind words, guys!

    Mike Scalise, PMP
    https://www.michaelscalise.com

Viewing 3 posts - 16 through 17 (of 17 total)

You must be logged in to reply to this topic. Login to reply