Unique/Non-Unique Clustered Indexes

In the last blog post I have talked about unique/non-unique clustered indexes on a heap

table. A table without a clustered index is called a heap table in SQL Server.

When you define a clustered index on such a table, the table data gets structured

and is therefore referred as clustered table. In this blog post I

want to talk about the differences in unique and non-unique clustered indexes, and

what are the storage impacts between those 2 types of clustered indexes.

As a prerequisite I assume that you have a basic understanding of clustered indexes,

and that you know the difference between heap and clustered tables, and how your data

pages are structured when a clustered index is defined on a table.

Let's start by looking on a unique clustered index. With SQL Server you have several

possibilities to define a unique clustered index. The first way – the easy one – is

to define a PRIMARY KEY constraint on a column. SQL Server enforces

this PRIMARY KEY constraint through the creation of a unique clustered

index on that table and that column. The another option is to create a unique clustered

index through the CREATE CLUSTERED INDEX statement – but when you

don't specify the UNIQUE property, SQL Server will create a non-unique

clustered index by default for you! The following code fragment creates the Customers table

that you already know from the previous blog posting, but this time we create a PRIMARY

KEY constraint on the column CustomerID. Therefore SQL Server

creates a unique clustered index on that table and sorts the data pages in the leaf

level according the values in the column CustomerID.

Create a table with 393 length + 7 bytes overhead = 400 bytes

-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24

CREATE TABLE Customers

(

CustomerID INT NOT NULL PRIMARY KEY IDENTITY(1, 1),

CustomerName CHAR(100) NOT NULL,

CustomerAddress CHAR(100) NOT NULL,

Comments CHAR(189) NOT NULL

)

-- Insert 80.000 records

DECLARE @i INT = 1

WHILE (@i <= 80000)

BEGIN

INSERT INTO Customers VALUES

(

'CustomerName' + CAST(@i AS CHAR),

'CustomerAddress' + CAST(@i AS CHAR),

'Comments' + CAST(@i AS CHAR)

)

SET @i += 1

END

After we have identified the index root page (through the use of the DBCC

IND command), we can dump out that page with the DBCC PAGE command.

In my case the index root page is 775:

DBCC PAGE(UniqueClusteredIndexStructure, 1, 775, 3)

As you can see from the following figure each index record contains the clustered

key, in this case the value of the column CustomerID.

When you examine the byte by byte representation of a clustered index record, you

can see that SQL Server uses here the following bytes:

1 byte: Status Bits
n bytes: Clustered Key – in this case 4 bytes
4 bytes: PageID
2 bytes: FileID

As you can see the length of the clustered key has a direct relationship of the length

of an index record. This mean as smaller your clustered key is, the more index record

can be put onto an index page, and therefore your clustered index will be much more

compact and will perform faster and are easier to maintain. When you walk down your

clustered index you will see that all intermediate levels have the same storage format

as described above. There are no differences on each level, expect the index leaf

level, because this level contains your actual logically ordered data pages.

Let's have now a look onto non-unique clustered indexes in SQL Server and how they

differ from unique clustered indexes. To demonstrate this kind of indexes, I have

just recreated the Customers table and created a non-unique clustered

index on that table through the CREATE CLUSTERED INDEX statement:

Create a table with 393 length + 7 bytes overhead = 400 bytes

-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24

CREATE TABLE Customers

(

CustomerID INT NOT NULL,

CustomerName CHAR(100) NOT NULL,

CustomerAddress CHAR(100) NOT NULL,

Comments CHAR(181) NOT NULL

)

-- Create a non unique clustered index

CREATE CLUSTERED INDEX idx_Customers_CustomerID

ON Customers(CustomerID)

Finally I have inserted 80.000 records, where the column CustomerID (the

clustered key) is not unique anymore:

Insert 80.000 records

DECLARE @i INT = 1

WHILE (@i <= 20000)

BEGIN

INSERT INTO Customers VALUES

(

@i,

'CustomerName' + CAST(@i AS CHAR),

'CustomerAddress' + CAST(@i AS CHAR),

'Comments' + CAST(@i AS CHAR)

)

INSERT INTO Customers VALUES

(

@i,

'CustomerName' + CAST(@i AS CHAR),

'CustomerAddress' + CAST(@i AS CHAR),

'Comments' + CAST(@i AS CHAR)

)

INSERT INTO Customers VALUES

(

@i,

'CustomerName' + CAST(@i AS CHAR),

'CustomerAddress' + CAST(@i AS CHAR),

'Comments' + CAST(@i AS CHAR)

)

INSERT INTO Customers VALUES

(

@i,

'CustomerName' + CAST(@i AS CHAR),

'CustomerAddress' + CAST(@i AS CHAR),

'Comments' + CAST(@i AS CHAR)

)

SET @i += 1

END

When you now dump out the root index page of the non-unique clustered

index, you get the following result:

As you can see, SQL Server returns here an additional column named UNIQUIFIER

(key). This column is used by SQL Server to make a non-unique clustered key

unique. Behind the scenes it is a 4 byte long integer value starting at 0. E.g. when

you have 2 customers with the ID 1380 the first record gets the uniquifier value 0

and the second one gets the uniquifier value of 1. But SQL Server only stores the

uniquifier in the navigation structure of an index (all levels above the leaf level),

when the uniquifier is not equal to 0. SQL Server only includes uniquifier values

of 0 in the navigation structure of a non-unique clustered index, which means that

the navigation structure will never store the uniquifier physically. The only place

where the uniquifier is stored in a non-unique clustered index is on the data pages,

where the actual data records are stored. The following figure shows a data page dump

of our clustered index, where you can also see the stored uniquifier.

So the only difference between a unique and non-unique clustered index is on the data

pages, because when using a non-unique clustered index, SQL Server will use the 4

byte long uniquifier to make them unique, which is a small storage overhead that you

have to keep in mind, when working with non-unique clustered indexes. You can download

the T-SQL script for this posting here.

In the next posting we will work out the differences between unique/non-unique non-clustered

indexes defined on unique clustered indexes. Stay tuned 🙂

-Klaus

Unique/Non-Unique Clustered Indexes

Rate

Share

Share

Rate

Unique/Non-Unique Clustered Indexes

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts