In several of my last few blog posts, I’ve shared several methods of getting internal information from a database by using the DBCC PAGE command and utilizing the “WITH TABLERESULTS” option to be allowed to automate this process for further processing. This post will also do this, but in this case, we’ll be using it to bust a common myth—data in a clustered index is physically stored on disk in the order of the clustered index.
Busting this myth
To bust this myth, we’ll create a database, put a table with a clustered index into this database, and then we’ll add some rows in random order. Next, we will show that the rows are stored on the pages in logical order, and then we’ll take a deeper look at the page internals to see that the rows are not stored in physical order.
To start off with, let’s create a database and a table, and add a few rows to this table:
IF DB_ID('CIPageTest') IS NULL CREATE DATABASE CIPageTest; GO -- use the database USE CIPageTest; GO -- if the PageTest table exists, then drop it to start all over IF OBJECT_ID('dbo.PageTest','U') IS NOT NULL DROP TABLE dbo.PageTest; GO -- create the dbo.PageTest table CREATE TABLE dbo.PageTest ( RowID INTEGER PRIMARY KEY CLUSTERED, Col1 VARCHAR(1000) ); INSERT INTO dbo.PageTest (RowID, Col1) VALUES (1, REPLICATE('Row01', 100)); INSERT INTO dbo.PageTest (RowID, Col1) VALUES (3, REPLICATE('Row03', 100)); INSERT INTO dbo.PageTest (RowID, Col1) VALUES (5, REPLICATE('Row05', 100)); INSERT INTO dbo.PageTest (RowID, Col1) VALUES (2, REPLICATE('Row02', 100)); INSERT INTO dbo.PageTest (RowID, Col1) VALUES (4, REPLICATE('Row04', 100));
What we have is a table with 5 rows. The table’s clustered key is the RowID integer column. The rows are inserted so that the odd rows are inserted first, followed by the even rows.
Determine row’s physical location
At this point, let’s look at where the system reports these rows to be at. To do this, we’ll utilize two undocumented system commands. The first is %%physloc%%, which returns the physical location in a hexadecimal format. The second is fn_PhysLocFormatter, which converts this into a format of FileID:PageID:SlotID. This is just simply added to the select clause, so the query is:
SELECT RowID, Col1, sys.fn_PhysLocFormatter(%%physloc%%) AS [File:Page:Slot] FROM dbo.PageTest;
This query produces these results:
From these results, you can see that the rows are all on the same page, and that the slot in the slot array on that page are incrementing for the appropriate RowID value. Remember also that the slot array is zero-based, where the first slot is slot #0. This is the logical order.
How inserts change the slot the row is in
To see this changing around as the rows are inserted, just put the select statement (which I just introduced) after each of the prior insert commands and run the script to create the table and add the rows again. You will initially see RowID=1 put into Slot #0, RowID =3 into Slot #1 and RowID=5 into slot #2. When you then add RowID=2, this needs to be between RowID #s 1 and 3, so #2 is now in slot array #1, #3 moves to slot array #2, and #5 is moved to slot array #3. When you add RowID=4, it gets inserted into slot #3 and RowID#5 is again pushed down, to slot #4. The logical order follows what we are expecting:
Check the physical storage
In order to show that these are not physically stored in this order, we will need to crack this page and look internally at where the data is actually stored. We will accomplish this with this code:
DECLARE @Page INTEGER, @SQLCMD VARCHAR(MAX); DECLARE @DBCCIND TABLE ( PageFID INTEGER, PagePID INTEGER, IAMFID INTEGER, IAMPID INTEGER, ObjectID INTEGER, IndexID INTEGER, PartitionNumber INTEGER, PartitionID BIGINT, iam_chain_type VARCHAR(100), PageType INTEGER, IndexLevel INTEGER, NextPageFID INTEGER, NextPagePID INTEGER, PrevPageFID INTEGER, PrevPagePID INTEGER ); INSERT INTO @DBCCIND EXECUTE ('DBCC IND (CIPageTest, ''dbo.PageTest'', -1)'); SELECT @Page = PagePID FROM @DBCCIND WHERE PageType = 1; SET @SQLCMD = 'DBCC PAGE (CIPageTest, 1, ' + CONVERT(VARCHAR(15), @Page) + ', 3) WITH TABLERESULTS'; DECLARE @DBCCPAGE TABLE ( RowID INTEGER IDENTITY, ParentObject VARCHAR(255), Object VARCHAR(255), Field VARCHAR(255), Value VARCHAR(255)); INSERT INTO @DBCCPAGE EXECUTE (@SQLCMD); SELECT DISTINCT Page = @Page, ParentObject FROM @DBCCPAGE WHERE ParentObject LIKE 'Slot%Offset%';
Which produces these results:
The row’s offset is where the row starts at on the page
As we look at these results, pay close attention to the Offset. This is where the row physically starts on the page. You can see how for RowID #2, that this offset is higher than the offset for RowID #3… and even RowID #5. When the row was added, the data was added to the end of the other existing rows on the page, and the offset where this data starts was entered into the slot array, after having the remaining slot array entries pushed down in order to maintain the correct logical order. We can see this happening once again when RowID #4 is inserted.
And, finally, let’s use DBCC PAGE to crack open this page and look at the raw data. For this, we want to use dump style 2:
DECLARE @Page INTEGER; SET @Page = xyz; --<< set to the appropriate page from above DBCC PAGE (CIPageTest, 1, @Page, 2) WITH TABLERESULTS;
If you expand out the VALUE column, you can see it go from Row01 to Row03 to Row05, then to Row02 and Row04.
So there you go… the data is physically stored in the next available space on the page; however, the page’s slot array is in the clustered key order. Furthermore, we now know that we can look at the offset and determine the order on the page. And we also have yet another way to use DBCC PAGE in an automated manner to show this.
Previous related posts:
Persisting DBCC Output
Automating DBCC Page
Automating DBCC Page, Part 2
This post is re-published from my original post on SQL Solutions Group.
The post Automating DBCC PAGE – Part 3 (Is a Clustered Index physically sorted on disk?) appeared first on Wayne Sheffield.