In the last weblog
post I have talked about the difference of unique and non-unique clustered indexes.
As you have seen SQL Server uses an internal overhead of 4 bytes (the so-called uniquifier)
to make non-unique clustered index rows unique. Today I want to work out the difference
between unique and non-unique non-clustered indexes defined on a table with a unique
clustered index. As you already know SQL Server creates a unique clustered index when
you define the PRIMARY KEY constraint on a table. On the other hand
you can use the CREATE UNIQUE CLUSTERED INDEX statement to create
a unique clustered index on a table. The following listing creates our customers table,
creates a unique clustered index on it, and finally creates one unique- and one non-unique
non-clustered index on that table.
-- Create a table
with 393 length + 7 bytes overhead = 400 bytes
-- Therefore
20 records can be stored on one page (8.096 / 400) = 20,24
CREATE TABLE Customers
(
CustomerID INT NOT NULL,
CustomerName CHAR(100) NOT NULL,
CustomerAddress CHAR(100) NOT NULL,
Comments CHAR(189) NOT NULL
)
GO
-- Create a unique
clustered index on the previous created table
CREATE UNIQUE CLUSTERED INDEX idx_Customers ON Customers(CustomerID)
GO
-- Insert 80.000
records
DECLARE @i INT = 1
WHILE (@i <= 80000)
BEGIN
INSERT INTO Customers VALUES
(
@i,
'CustomerName' + CAST(@i AS CHAR),
'CustomerAddress' + CAST(@i AS CHAR),
'Comments' + CAST(@i AS CHAR)
)
SET @i += 1
END
GO
-- Create a unique
non clustered index on the clustered table
CREATE UNIQUE NONCLUSTERED INDEX idx_UniqueNCI_CustomerID
ON Customers(CustomerName)
GO
-- Create a non-unique
non clustered index on the clustered table
CREATE NONCLUSTERED INDEX idx_NonUniqueNCI_CustomerID
ON Customers(CustomerName)
GO
After the creation
of both non-clustered indexes you can use the DMV sys.dm_db_index_physical_stats to
get some information about the indexes. When you look into the DMV, you can see that
the unique non-clustered index has a record length of 107 bytes and the non-unique
non-clustered index has a record length of 111 bytes. So again, there must be a difference
in the internal storage format of both indexes! Let's analyze it and start with the
unique non-clustered index.
In my case the index
root page of the unique non-clustered index is 4370, so I can dump it out very easily
with the DBCC IND command:
DBCC PAGE(UniqueClusteredIndexStructure_NonClusteredIndex, 1, 4370, 3)
GO
As you can see from
the following figure each index record contains the non-clustered key (which is unique
in this case) – the column CustomerName:
When you examine
the byte by byte representation of the unique non-clustered index record, you can
see that SQL Server uses here the following bytes:
- 1 Byte: Status Bits
- n Bytes: Unique Non-Clustered
Index Key – in this case 100 bytes
- 4 Bytes: PageID
- 2 Bytes: FileID
In sum SQL Server
uses the above mentioned 107 bytes per each index record on each non-leaf level of
the unique non-clustered index. So again, the length of your non-clustered index key
has an impact on how many rows SQL Server can store on an index page. So a CHAR(100) –
like in this example – would be not a very good idea…
When you are walking
down the unique non-clustered index until you reach the leaf-level of the index always
stores the above mentioned 107 bytes per each index row – nothing more. When you finally
dump out the leaf-level of the non-clustered index, you get the following picture:
As you can see here,
SQL Server stores here at the leaf-level directly the clustered key – in our case
the value of the column CustomerID. This value is for SQL Server
the pointer to the corresponding record in the clustered index. With this value in
the hand, SQL Server can now find the record in the clustered index – through a Clustered
Index Seek operator. This is a big difference compared to non-clustered indexes
defined on a heap table. Because in a heap table, SQL Server uses at the leaf-level
the HEAP RID to point DIRECTLY to the corresponding data
page where the record is stored. Therefore SQL Server can directly read the correct
data page without accessing an additional index!
This also implies
that SQL Server can find a record through a non-clustered index on a heap table faster
than a record through a non-clustered index on a clustered table, because SQL Server
don't have to execute the additional Clustered Index Seek operator.
So the correct row can be found with less page reads on a heap table. But please don't
over estimate this detail, and think that you will get a performance benefit by using
non-clustered indexes on heap tables. The fact is that SQL Server always tries to
store the index pages in the Buffer Manager, so it's really very cheap for SQL Server
to do this additional Clustered Index Seek to get the correct record
from the clustered index back.
Let's now analyze
our non-unique non-clustered index. When you dump out the index root page, you can
see that SQL Server stores here the non-clustered index key and also the clustered
index key, which is different from the previous example with the unique non-clustered
index:
SQL Server needs
here the unique clustered index key to make each non-unique non-clustered index key
unique. This behavior is done on EACH LEVEL of the non-unique non-clustered
index, from the index root page down to the leaf-level. This means that you have a
huge storage overhead, because SQL Server stores in EACH INDEX RECORD also
your unique clustered key besides the non-unique non-clustered index key. So when
you have a badly chosen clustered key (like CHAR(100), etc.) it will
even get much more worse for you! When you analyze the index row you can see that
SQL Server uses the following bytes for the storage:
- 1 Byte: Status Bits
- n Bytes: Non-unique
Non-Clustered Index Key – in this case 100 bytes
- n Bytes: Unique Clustered
Index Key – in this case 4 bytes for the integer value
- 4 Bytes: PageID
- 2 Bytes: FileID
When you sum up those
bytes, you will get the 111 bytes mentioned earlier. So please keep this additional
storage overhead in your head when using non-unique non-clustered indexes, because
it impacts your non-clustered indexes on each level! You can download the T-SQL script
for this posting here.
In the
next installment of this series we will have finally a look into the differences of
unique/non-unique non-clustered indexes defined on a non-unique clustered index. Stay
tuned J
-Klaus