November 27, 2014 at 4:09 am
Hi There,
Cluster index stores the data on the leaf level. But non-clustered index will have the pointer to the data, but it stores the key columns and included columns in the leaf level.
My doubt is, hence CI stores the data in leaf, will CI occupies any extra space other then actual data size?
November 27, 2014 at 4:43 am
My understanding is the the Clustered index defines how the data is physically stored on the drive and changes to values in the indexed columns could force the data to be re-written to a different location. There must be some small overhead to store the metadata about the columns in the CI; however if you have a CI on fields that are not sequentially inserted then the space required will be larger than the data because of page fill % and page splits. E.g. if your orders use a CI in date order, then orders are generally added sequentially so are added to the end of the index. Page fill can be very high (95% plus) and page splits will be rare. If your CI is by customer then page splits will be more common and page fill will need to be set much lower (e.g. 70%) and there will be more demand to re-index to clean up the leaf mapping.
Think if the indexes literally as an index card box. You can only get 10 cards in a box.
BOB/1001/7th Jun
BOB/1250/9th Jul
BOB/1251/10th Jul
CHRIS/1055/8th Jun
CHRIS/1056/10th Jun
JOHN/1032/7th Jun
JOHN/1033/7th Jun
JOHN/1078/10th Jun
JOHN/1083/11th Jun
JOHN/1122/20th Jun
Now if John places a new order, that's fine, it will go into a new box as this box is full
If Chris places a new order, we can't insert it into the current box, but we need to do some maintenance. We will start a new box, put John' orders in the new box and add Chris's order to box 1. The space required to store 11 records is the same as the space to store 20 records. There is a significant cost involved in introducing a new box and moving records to it. Overall we will usually have some spare space. Lets say that 800 records are actually spread across 10 boxes which represents a page fill of 80%. We could 'reindex' and shuffle all the records up so that they take less space, but as soon as a new record needs to be inserted, we have the overhead of creating a new box. Instead we could look at our performance and make a decision that housekeeping (while the system is quiet) will create new boxes whenever a box is more that 60% full so that record inserts rarely require us to make a new box on-the-fly
Alternatively if the records were indexed by order number or date (and time) then we would only ever be adding records to the end of the index so the cost of introducing a new box is significantly lower.
I'm not sure if this answers your question; the amount of space required for the CI is dependent on the page fill % and the nature of the insert/update activity on the CI fields. The allocated space will [almost] always be larger than the actual data. Non CI indexes suffer from the same issue but will only hold the data for the index (and covering) fields.
November 27, 2014 at 4:52 am
The clustered index has the non-leaf levels that are additional to the data. They're small in comparison.
The clustered index does not enforce the physical order of pages within the data file, much less data on the disk. It enforces the logical order of the pages. The logical and physical order within the data file may match or may not. The order on disk depends on what the OS does with the data files and on the underlying storage as well
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply