Blog Post

Unique and non-unique non-clustered indexes on a unique clustered index

,

In the last weblog

post I have talked about the difference of unique and non-unique clustered indexes.

As you have seen SQL Server uses an internal overhead of 4 bytes (the so-called uniquifier)

to make non-unique clustered index rows unique. Today I want to work out the difference

between unique and non-unique non-clustered indexes defined on a table with a unique

clustered index. As you already know SQL Server creates a unique clustered index when

you define the PRIMARY KEY constraint on a table. On the other hand

you can use the CREATE UNIQUE CLUSTERED INDEX statement to create

a unique clustered index on a table. The following listing creates our customers table,

creates a unique clustered index on it, and finally creates one unique- and one non-unique

non-clustered index on that table.

-- Create a table

with 393 length + 7 bytes overhead = 400 bytes

-- Therefore

20 records can be stored on one page (8.096 / 400) = 20,24

CREATE TABLE Customers

(

CustomerID INT NOT NULL,

CustomerName CHAR(100) NOT NULL,

CustomerAddress CHAR(100) NOT NULL,

Comments CHAR(189) NOT NULL

)

GO

-- Create a unique

clustered index on the previous created table

CREATE UNIQUE CLUSTERED INDEX idx_Customers ON Customers(CustomerID)

GO

-- Insert 80.000

records

DECLARE @i INT = 1

WHILE (@i <= 80000)

BEGIN

INSERT INTO Customers VALUES

(

@i,

'CustomerName' + CAST(@i AS CHAR),

'CustomerAddress' + CAST(@i AS CHAR),

'Comments' + CAST(@i AS CHAR)

)

SET @i += 1

END

GO

-- Create a unique

non clustered index on the clustered table

CREATE UNIQUE NONCLUSTERED INDEX idx_UniqueNCI_CustomerID

ON Customers(CustomerName)

GO

-- Create a non-unique

non clustered index on the clustered table

CREATE NONCLUSTERED INDEX idx_NonUniqueNCI_CustomerID

ON Customers(CustomerName)

GO

After the creation

of both non-clustered indexes you can use the DMV sys.dm_db_index_physical_stats to

get some information about the indexes. When you look into the DMV, you can see that

the unique non-clustered index has a record length of 107 bytes and the non-unique

non-clustered index has a record length of 111 bytes. So again, there must be a difference

in the internal storage format of both indexes! Let's analyze it and start with the

unique non-clustered index.

In my case the index

root page of the unique non-clustered index is 4370, so I can dump it out very easily

with the DBCC IND command:

DBCC PAGE(UniqueClusteredIndexStructure_NonClusteredIndex, 1, 4370, 3)

GO

As you can see from

the following figure each index record contains the non-clustered key (which is unique

in this case) – the column CustomerName:

image

When you examine

the byte by byte representation of the unique non-clustered index record, you can

see that SQL Server uses here the following bytes:

  • 1 Byte: Status Bits

  • n Bytes: Unique Non-Clustered

    Index Key – in this case 100 bytes

  • 4 Bytes: PageID

  • 2 Bytes: FileID

In sum SQL Server

uses the above mentioned 107 bytes per each index record on each non-leaf level of

the unique non-clustered index. So again, the length of your non-clustered index key

has an impact on how many rows SQL Server can store on an index page. So a CHAR(100)

like in this example – would be not a very good idea…

When you are walking

down the unique non-clustered index until you reach the leaf-level of the index always

stores the above mentioned 107 bytes per each index row – nothing more. When you finally

dump out the leaf-level of the non-clustered index, you get the following picture:

image

As you can see here,

SQL Server stores here at the leaf-level directly the clustered key – in our case

the value of the column CustomerID. This value is for SQL Server

the pointer to the corresponding record in the clustered index. With this value in

the hand, SQL Server can now find the record in the clustered index – through a Clustered

Index Seek operator. This is a big difference compared to non-clustered indexes

defined on a heap table. Because in a heap table, SQL Server uses at the leaf-level

the HEAP RID to point DIRECTLY to the corresponding data

page where the record is stored. Therefore SQL Server can directly read the correct

data page without accessing an additional index!

This also implies

that SQL Server can find a record through a non-clustered index on a heap table faster

than a record through a non-clustered index on a clustered table, because SQL Server

don't have to execute the additional Clustered Index Seek operator.

So the correct row can be found with less page reads on a heap table. But please don't

over estimate this detail, and think that you will get a performance benefit by using

non-clustered indexes on heap tables. The fact is that SQL Server always tries to

store the index pages in the Buffer Manager, so it's really very cheap for SQL Server

to do this additional Clustered Index Seek to get the correct record

from the clustered index back.

Let's now analyze

our non-unique non-clustered index. When you dump out the index root page, you can

see that SQL Server stores here the non-clustered index key and also the clustered

index key, which is different from the previous example with the unique non-clustered

index:

image

SQL Server needs

here the unique clustered index key to make each non-unique non-clustered index key

unique. This behavior is done on EACH LEVEL of the non-unique non-clustered

index, from the index root page down to the leaf-level. This means that you have a

huge storage overhead, because SQL Server stores in EACH INDEX RECORD also

your unique clustered key besides the non-unique non-clustered index key. So when

you have a badly chosen clustered key (like CHAR(100), etc.) it will

even get much more worse for you! When you analyze the index row you can see that

SQL Server uses the following bytes for the storage:

  • 1 Byte: Status Bits

  • n Bytes: Non-unique

    Non-Clustered Index Key – in this case 100 bytes

  • n Bytes: Unique Clustered

    Index Key – in this case 4 bytes for the integer value

  • 4 Bytes: PageID

  • 2 Bytes: FileID

When you sum up those

bytes, you will get the 111 bytes mentioned earlier. So please keep this additional

storage overhead in your head when using non-unique non-clustered indexes, because

it impacts your non-clustered indexes on each level! You can download the T-SQL script

for this posting here.

In the

next installment of this series we will have finally a look into the differences of

unique/non-unique non-clustered indexes defined on a non-unique clustered index. Stay

tuned J

-Klaus

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating