In this article, I thought of explaining different types and levels of index pages. but while start preparing for that, I realise , newbies need to understand concepts on Index architecture before moving on to see what is there inside index pages.
So, before going see internals of Clustered and NonClustered index pages, lets quickly recap some key points on Clustered and Non Clustered Indexes and its Architecture. This will help Us to quickly understand internals of different types and levels of index pages.
Note : Points given here are taken from msdn and I 've used a suitable query from MS press SQL Server 2008 Internals book.
Please have a look at reference section for more details.
Common Properties of Clustered and Non Clustered Indexes:
1. Indexes are organized as B-trees.
2. Each page in an index B-tree is called an index node.
3. The top node of the B-tree is called the root node.
4. The bottom level of nodes in the index is called the leaf nodes.
5. Any index levels between the root and the leaf nodes are collectively known as intermediate levels.
6. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a row in the leaf level of the index.
7. The pages in each level of the index are linked in a doubly-linked list.
8. The page collections for the B-tree are anchored by page pointers in the sys.system_internals_allocation_units system view.
Properties of Clustered Index:
1. The root and intermediate level nodes contain index pages holding index rows.
2 Leaf nodes of clustered index contain the data pages of the underlying table.
3. Clustered indexes have one row in sys.partitions, with index_id = 1 for each partition used by the index.
4. The pages in the data chain and the rows in them are ordered on the value of the clustered index key.
5. All inserts are made at the point where the key value in the inserted row fits in the ordering sequence among existing rows.
6. For a clustered index, the root_page column in sys.system_internals_allocation_units points to the top of the clustered index for a specific partition.
7. SQL Server moves down the index to find the row corresponding to a clustered index key.
8. To find a range of keys, SQL Server moves through the index to find the starting key value in the range and then scans through the data pages using the previous or next pointers.
9. To find the first page in the chain of data pages, SQL Server follows the leftmost pointers from the root node of the index.
10.If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique.
Clustered Index Design Guidelines
1. With few exceptions, every table should have a clustered index defined on the column, or columns that offer the following:
1.1 Can be used for frequently used queries.
1.2 Provide a high degree of uniqueness and are accessed sequentially
1.3 Can be used in range queries.
1.4 Used frequently to sort the data retrieved from a table.
2. Before you create clustered indexes, understand how your data will be accessed. Consider using a clustered index for queries that do the following:
2.1 Return a range of values by using operators such as BETWEEN, >, >=, <, and <=.
2.2 Return large result sets.
2.3 Use JOIN clauses; typically these are foreign key columns.
2.4 Use ORDER BY, or GROUP BY clauses.
3 Clustered indexes are not a good choice for the following attributes:
3.1 Columns that undergo frequent changes
3.2 Wide keys
3.3 columns having less unique values
Properties of Non Clustered Index:
1. Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:
1.1 the data rows of the underlying table are not sorted and stored in order based on their
nonclustered keys.
1.2 The leaf layer of a nonclustered index is made up of index pages instead of data pages.
2. Nonclustered indexes can be defined on a table with a clustered index or a heap.
3. Each index row in the nonclustered index contains the nonclustered key value and a row locator.
4. The row locators in nonclustered index rows are either a pointer to a row or are a clustered index key
for a row:
4.1 If the table is a heap, the row locator is a pointer to the row.
The pointer is built from the file identifier (ID), page number, and number of the row on the
page. The whole pointer is known as a Row ID (RID).
4.2 If the table has a clustered index, or the index is on an indexed view, the row locator is the
clustered index key for the row.
5. Nonclustered indexes have one row in sys.partitions with index_id >1 for each partition used by the index.
6. For a Nonclustered index, the root_page column in sys.system_internals_allocation_units points to
the top of the non clustered index for a specific partition.
Nonclustered Index Design Guidelines
Consider using a nonclustered index for queries that have the following attributes:
1. Use JOIN or GROUP BY clauses.
2. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
3. Queries that do not return large result sets.
4. Contain columns frequently involved in search conditions of a query, such as WHERE clause, that return exact matches.
How to find different levels of clustered Index pages:
To understand Clustered index architecture, let’s create a simple table named tExample5 and create clustered index on intEmployeeId column
CREATE TABLE tExample5(
intEmployeeId int IDENTITY(10001,1),
strFirstName varchar(100),
strDeptCode int default round(rand()*999999,0),
strAddress varchar(1000),
strSalary int default round(rand()*999999,0))
GO
CREATE CLUSTERED INDEX NCI_tExample5_intEmployeeId ON tExample5(intEmployeeId)
GO
Let’s insert 50000 rows in tExample5 table to form at least 3 levels of b-tree (Root, Intermediate and Leaf level)
INSERT INTO tExample5(strFirstName, strAddress) VALUES('AAAAA',REPLICATE('A',1000))
GO 50000
Here is the query to list page allocation in each level of clustered index:
SELECT index_depth AS D
, index_level AS L
, record_count AS 'Count'
, page_count AS PgCnt
, avg_page_space_used_in_percent AS 'PgPercentFull'
, min_record_size_in_bytes AS 'MinLen'
, max_record_size_in_bytes AS 'MaxLen'
, avg_record_size_in_bytes AS 'AvgLen'
FROM sys.dm_db_index_physical_stats (DB_ID ('LearningInternals') , OBJECT_ID ('tExample5'), 1, NULL, 'DETAILED');
Output:
Observation:
1. Intermediate levels are formed based on length of key field. If Index key is short enough, an Index page will cover large data area resulting better query performance.
2. Based on this observation, it’s recommended to keep index key short for better performance.
Here is the DBCC command to list details of pages allocated to table tExample5
DBCC TRACEON(3604)
DBCC IND('LearningInternals', 'tExample5', -1)
DBCC TRACEOFF(3604)
Output (4 out of 7171 pages having different index level are listed):
Observation:
1. Page usage:
IAM Page | 1 |
Root Page | 1 |
Intermediate pages | 26 |
Data Pages | 7143 |
Total Pages | 7171 |
2. Sample page numbers in each B-Tree level:
PageFID | PagePID | PageType | PageTypeDescription | IndexLevel | IndexLevelDescription |
1 | 256 | 10 | IAM Page | NULL | IAM page. |
1 | 270 | 2 | Index page | root level (index page) | |
1 | 2247 | 2 | Index page | Intermediate level (index page) | |
1 | 8474 | 1 | Data page | 0 | Leaf level (data page) |
Summary:
In this section, we have seen properties of Clustered and Non Clustered indexes and how to see page allocations in B-Tree structure.
In next section, let’s see how to read different level and types of index pages.
Reference:
Microsoft Press SQL Server 2008 Internals
http://msdn.microsoft.com/en-us/library/ms177443.aspx
http://msdn.microsoft.com/en-us/library/ms190639.aspx
http://msdn.microsoft.com/en-us/library/ms177484.aspx
http://msdn.microsoft.com/en-us/library/ms179325.aspx