Did you know that whatever columns you pick as your clustered index will be included in any non clustered indexes on the same table? But don’t take my word for it. Let’s take a look!
First things first I’m going to use some AdventureWorks2012 tables to make a test table.
-- Create a convinent composite table SELECT Pers.BusinessEntityID, Pers.Title, Pers.FirstName, Pers.MiddleName, Pers.LastName,Addr.AddressLine1, Addr.AddressLine2, Addr.City, Addr.StateProvinceID, Addr.PostalCode, Addr.SpatialLocation INTO People1 FROM AdventureWorks2012.Person.Address Addr JOIN AdventureWorks2012.Person.BusinessEntityAddress BEA ON Addr.AddressID = BEA.AddressID JOIN AdventureWorks2012.Person.Person Pers ON BEA.BusinessEntityID = Pers.BusinessEntityID GO -- Add indexes including a non-unique (important later) clustered index CREATE CLUSTERED INDEX ix1_People1 ON People1(AddressLine1, AddressLine2, City, StateProvinceID, PostalCode) CREATE INDEX ix2_People1 ON People1(BusinessEntityID) CREATE INDEX ix3_People1 ON People1(LastName, FirstName, MiddleName) GO
The clustered index (CI) is on the 5 address columns and there are non-clustered indexes (NCI) on the BusinessEntityID and the 3 name columns. We can look at the structure of a page from one of the indexes by using sys.dm_db_database_page_allocations and DBCC PAGE (links are in the code below).
-- Get the page id for NCI ix2_People1 -- Info on sys.dm_db_database_page_allocations: --http://www.jasonstrate.com/2013/04/a-replacement-for-dbcc-ind-in-sql-server-2012/ SELECT indexes.name, indexes.index_id, indexes.type_desc, pages.allocated_page_file_id, pages.allocated_page_page_id, pages.is_iam_page FROM sys.indexes JOIN sys.dm_db_database_page_allocations(DB_ID(), OBJECT_ID('People1'), NULL, NULL, NULL) pages ON indexes.object_id = pages.object_id AND indexes.index_id = pages.index_id WHERE indexes.name = 'ix2_People1'
In order to use DBCC PAGE I need the file id and the page id. I’m using page 288 instead of 638 because page 638 is the IAM page. All of my pages are in file 1 (I only have the one data file).
-- View index page -- Info on DBCC PAGE: --http://www.sqlskills.com/blogs/paul/inside-the-storage-engine-using-dbcc-page-and-dbcc-ind-to-find-out-if-page-splits-ever-roll-back/ -- Turn traceflag 3604 on so we can see the results DBCC TRACEON (3604); -- Take a look at the contents of one of the index pages DECLARE @DBID int SET @DBID = DB_ID() DBCC PAGE(@DBID, 1, 288, 3)
You can see the first 9 (of 217) rows in the page in the image above. You can see that while the BusinessEntityID is the only column I indexed on (and the only one that will show up if you look at sp_helpindex or anything similar) there are actually 6 additional columns in the index. The 5 columns from the CI and the UNIQUIFIER column. In case you are interested the UNIQUIFIER is added any time you have a CI that is non-unique (which is why I deliberately made this one non-unique).
I’m going to stop here and point out that I created a clustered index not a primary key. A primary key is by default a unique clustered index but it doesn’t have to be. It has to be unique but not clustered. Because it has to be unique if you create a primary key for your CI then you won’t see the uniquifier column when you look at the page information.
I deliberately created a long clustered index on columns that probably shouldn’t be used as the clustered index to demonstrate a couple of points. First your CI choice is going to affect the size of your indexes.
To start I’m going to create another table (exactly the same) with a different, smaller CI.
-- Create a convinent composite table SELECT Pers.BusinessEntityID, Pers.Title, Pers.FirstName, Pers.MiddleName, Pers.LastName,Addr.AddressLine1, Addr.AddressLine2, Addr.City, Addr.StateProvinceID, Addr.PostalCode, Addr.SpatialLocation INTO People2 FROM AdventureWorks2012.Person.Address Addr JOIN AdventureWorks2012.Person.BusinessEntityAddress BEA ON Addr.AddressID = BEA.AddressID JOIN AdventureWorks2012.Person.Person Pers ON BEA.BusinessEntityID = Pers.BusinessEntityID GO -- Add indexes including a non-unique (important later) clustered index CREATE CLUSTERED INDEX ix1_People2 ON People2(BusinessEntityID) CREATE INDEX ix2_People2 ON People2(AddressLine1, AddressLine2, City, StateProvinceID, PostalCode) CREATE INDEX ix3_People2 ON People2(LastName, FirstName, MiddleName) GO
I’m now going to use a modified version of a query I got off of Basit’s(b/t) blog.
Note that ix1 in both cases should be about the same. The CI IS the table. It contains all of the data for the table so there shouldn’t be any significant change in size. The second index (ix2) is also going to be about the same size. I just swapped the two sets of columns so both ix2s are going to contain BusinessEntityID, AddressLine1, AddressLine2, City, StateProvinceID, and PostalCode. The third index (ix3) on the other hand should show a fairly significant difference. The table People1 will have the columns LastName, FirstName, and MiddleName & AddressLine1, AddressLine2, City, StateProvinceID, and PostalCode while the table People2 will have columns LastName, FirstName, and MiddleName & BusinessEntityID.
SELECT OBJECT_NAME(i.object_id) AS TableName, i.[name] AS IndexName ,SUM(s.[used_page_count]) * 8 AS IndexSizeKB FROM sys.dm_db_partition_stats AS s INNER JOIN sys.indexes AS i ON s.[object_id] = i.[object_id] AND s.[index_id] = i.[index_id] WHERE OBJECT_NAME(i.object_id) IN ('People1','People2') GROUP BY OBJECT_NAME(i.object_id), i.[name] ORDER BY OBJECT_NAME(i.object_id), i.[name] GO
So exactly what I expected. ix1 & ix2 are about the same size in both tables. However ix3 for table People1 is about three times the size of ix3 on People2. Not a big deal with a small table with only 3 indexes. You get to a mm row table with 5 or 6 NCIs it could get rather significant.
Now on an up note with a larger clustered index you do get increased coverage.
SELECT FirstName, LastName, AddressLine1, AddressLine2, City, StateProvinceID, PostalCode FROM People1 WHERE LastName LIKE 'A%' AND AddressLine2 IS NOT NULL SELECT FirstName, LastName, AddressLine1, AddressLine2, City, StateProvinceID, PostalCode FROM People2 WHERE LastName LIKE 'A%' AND AddressLine2 IS NOT NULL
I realize it’s a bit of a goofy query but it does demonstrate the point. In People1 where the CI contains the address information the optimizer was able to use ix3 as a covering index. In People2 where the CI is the BusinessEntityID the optimizer had to use both ix2 and ix3 and ended up taking 95% of the combined time of the two queries. Since the columns in the CI are in all indexes they can always be used when determining if the index covers a query.
Now in my opinion these are not primary reasons for picking out a clustered index. They are more consequences of a CI choice. Important consequences admittedly. Hopefully though, this does point out some of the reasons why picking out the CI for a table is at once very important and very tricky.
Filed under: Index, Microsoft SQL Server, SQLServerPedia Syndication Tagged: index, microsoft sql server