In the last blog post I have talked about unique/non-unique clustered indexes on a heap
table. A table without a clustered index is called a heap table in SQL Server.
When you define a clustered index on such a table, the table data gets structured
and is therefore referred as clustered table. In this blog post I
want to talk about the differences in unique and non-unique clustered indexes, and
what are the storage impacts between those 2 types of clustered indexes.
As a prerequisite I assume that you have a basic understanding of clustered indexes,
and that you know the difference between heap and clustered tables, and how your data
pages are structured when a clustered index is defined on a table.
Let's start by looking on a unique clustered index. With SQL Server you have several
possibilities to define a unique clustered index. The first way – the easy one – is
to define a PRIMARY KEY constraint on a column. SQL Server enforces
this PRIMARY KEY constraint through the creation of a unique clustered
index on that table and that column. The another option is to create a unique clustered
index through the CREATE CLUSTERED INDEX statement – but when you
don't specify the UNIQUE property, SQL Server will create a non-unique
clustered index by default for you! The following code fragment creates the Customers table
that you already know from the previous blog posting, but this time we create a PRIMARY
KEY constraint on the column CustomerID. Therefore SQL Server
creates a unique clustered index on that table and sorts the data pages in the leaf
level according the values in the column CustomerID.
--
Create a table with 393 length + 7 bytes overhead = 400 bytes
-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24
CREATE TABLE Customers
(
CustomerID INT NOT NULL PRIMARY KEY IDENTITY(1, 1),
CustomerName CHAR(100) NOT NULL,
CustomerAddress CHAR(100) NOT NULL,
Comments CHAR(189) NOT NULL
)
GO
-- Insert 80.000 records
DECLARE @i INT = 1
WHILE (@i <= 80000)
BEGIN
INSERT INTO Customers VALUES
(
'CustomerName' + CAST(@i AS CHAR),
'CustomerAddress' + CAST(@i AS CHAR),
'Comments' + CAST(@i AS CHAR)
)
SET @i += 1
END
GO
After we have identified the index root page (through the use of the DBCC
IND command), we can dump out that page with the DBCC PAGE command.
In my case the index root page is 775:
DBCC PAGE(UniqueClusteredIndexStructure, 1, 775, 3)
GO
As you can see from the following figure each index record contains the clustered
key, in this case the value of the column CustomerID.
When you examine the byte by byte representation of a clustered index record, you
can see that SQL Server uses here the following bytes:
- 1 byte: Status Bits
- n bytes: Clustered Key – in this case 4 bytes
- 4 bytes: PageID
- 2 bytes: FileID
As you can see the length of the clustered key has a direct relationship of the length
of an index record. This mean as smaller your clustered key is, the more index record
can be put onto an index page, and therefore your clustered index will be much more
compact and will perform faster and are easier to maintain. When you walk down your
clustered index you will see that all intermediate levels have the same storage format
as described above. There are no differences on each level, expect the index leaf
level, because this level contains your actual logically ordered data pages.
Let's have now a look onto non-unique clustered indexes in SQL Server and how they
differ from unique clustered indexes. To demonstrate this kind of indexes, I have
just recreated the Customers table and created a non-unique clustered
index on that table through the CREATE CLUSTERED INDEX statement:
--
Create a table with 393 length + 7 bytes overhead = 400 bytes
-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24
CREATE TABLE Customers
(
CustomerID INT NOT NULL,
CustomerName CHAR(100) NOT NULL,
CustomerAddress CHAR(100) NOT NULL,
Comments CHAR(181) NOT NULL
)
GO
-- Create a non unique clustered index
CREATE CLUSTERED INDEX idx_Customers_CustomerID
ON Customers(CustomerID)
GO
Finally I have inserted 80.000 records, where the column CustomerID (the
clustered key) is not unique anymore:
--
Insert 80.000 records
DECLARE @i INT = 1
WHILE (@i <= 20000)
BEGIN
INSERT INTO Customers VALUES
(
@i,
'CustomerName' + CAST(@i AS CHAR),
'CustomerAddress' + CAST(@i AS CHAR),
'Comments' + CAST(@i AS CHAR)
)
INSERT INTO Customers VALUES
(
@i,
'CustomerName' + CAST(@i AS CHAR),
'CustomerAddress' + CAST(@i AS CHAR),
'Comments' + CAST(@i AS CHAR)
)
INSERT INTO Customers VALUES
(
@i,
'CustomerName' + CAST(@i AS CHAR),
'CustomerAddress' + CAST(@i AS CHAR),
'Comments' + CAST(@i AS CHAR)
)
INSERT INTO Customers VALUES
(
@i,
'CustomerName' + CAST(@i AS CHAR),
'CustomerAddress' + CAST(@i AS CHAR),
'Comments' + CAST(@i AS CHAR)
)
>
SET @i += 1
END
GO
When you now dump out the root index page of the non-unique clustered
index, you get the following result:
As you can see, SQL Server returns here an additional column named UNIQUIFIER
(key). This column is used by SQL Server to make a non-unique clustered key
unique. Behind the scenes it is a 4 byte long integer value starting at 0. E.g. when
you have 2 customers with the ID 1380 the first record gets the uniquifier value 0
and the second one gets the uniquifier value of 1. But SQL Server only stores the
uniquifier in the navigation structure of an index (all levels above the leaf level),
when the uniquifier is not equal to 0. SQL Server only includes uniquifier values
of 0 in the navigation structure of a non-unique clustered index, which means that
the navigation structure will never store the uniquifier physically. The only place
where the uniquifier is stored in a non-unique clustered index is on the data pages,
where the actual data records are stored. The following figure shows a data page dump
of our clustered index, where you can also see the stored uniquifier.
So the only difference between a unique and non-unique clustered index is on the data
pages, because when using a non-unique clustered index, SQL Server will use the 4
byte long uniquifier to make them unique, which is a small storage overhead that you
have to keep in mind, when working with non-unique clustered indexes. You can download
the T-SQL script for this posting here.
In the next posting we will work out the differences between unique/non-unique non-clustered
indexes defined on unique clustered indexes. Stay tuned 🙂
-Klaus