Doubt on clustered index

  • Hi There,

    Cluster index stores the data on the leaf level. But non-clustered index will have the pointer to the data, but it stores the key columns and included columns in the leaf level.

    My doubt is, hence CI stores the data in leaf, will CI occupies any extra space other then actual data size?

  • My understanding is the the Clustered index defines how the data is physically stored on the drive and changes to values in the indexed columns could force the data to be re-written to a different location. There must be some small overhead to store the metadata about the columns in the CI; however if you have a CI on fields that are not sequentially inserted then the space required will be larger than the data because of page fill % and page splits. E.g. if your orders use a CI in date order, then orders are generally added sequentially so are added to the end of the index. Page fill can be very high (95% plus) and page splits will be rare. If your CI is by customer then page splits will be more common and page fill will need to be set much lower (e.g. 70%) and there will be more demand to re-index to clean up the leaf mapping.

    Think if the indexes literally as an index card box. You can only get 10 cards in a box.

    BOB/1001/7th Jun

    BOB/1250/9th Jul

    BOB/1251/10th Jul

    CHRIS/1055/8th Jun

    CHRIS/1056/10th Jun

    JOHN/1032/7th Jun

    JOHN/1033/7th Jun

    JOHN/1078/10th Jun

    JOHN/1083/11th Jun

    JOHN/1122/20th Jun

    Now if John places a new order, that's fine, it will go into a new box as this box is full

    If Chris places a new order, we can't insert it into the current box, but we need to do some maintenance. We will start a new box, put John' orders in the new box and add Chris's order to box 1. The space required to store 11 records is the same as the space to store 20 records. There is a significant cost involved in introducing a new box and moving records to it. Overall we will usually have some spare space. Lets say that 800 records are actually spread across 10 boxes which represents a page fill of 80%. We could 'reindex' and shuffle all the records up so that they take less space, but as soon as a new record needs to be inserted, we have the overhead of creating a new box. Instead we could look at our performance and make a decision that housekeeping (while the system is quiet) will create new boxes whenever a box is more that 60% full so that record inserts rarely require us to make a new box on-the-fly

    Alternatively if the records were indexed by order number or date (and time) then we would only ever be adding records to the end of the index so the cost of introducing a new box is significantly lower.

    I'm not sure if this answers your question; the amount of space required for the CI is dependent on the page fill % and the nature of the insert/update activity on the CI fields. The allocated space will [almost] always be larger than the actual data. Non CI indexes suffer from the same issue but will only hold the data for the index (and covering) fields.

  • The clustered index has the non-leaf levels that are additional to the data. They're small in comparison.

    The clustered index does not enforce the physical order of pages within the data file, much less data on the disk. It enforces the logical order of the pages. The logical and physical order within the data file may match or may not. The order on disk depends on what the OS does with the data files and on the underlying storage as well

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply