Author's Note: This is a follow-up from my original Hierarchies in SQL article, posted on SQL Server Central on 28 Jan 2009. I originally wrote it a while back, and then found that it needed a lot of editing and a lot more testing, so it doesn’t include any SQL 2012-specific data, but it’s still valid and applies equally to SQL 2008, 2008R2, and 2012.)
I've recently had hierarchies in SQL Server brought to my attention again. Questions also keep coming up on how the new HierarchyID datatype compares to both adjacency hierarchies and nested sets hierarchies. I've dug into this datatype a few times in the past, but not with any degree of detail or serious attention, so I decided I needed to remedy that.
If you like surprise twists at the end of stories, this article has definitely got one. If you prefer to skip to the end, you can do that and avoid all the suspense. I tend towards verbosity in the same way that hurricanes “tend towards” damp and breezy, and you can skip most of that if you scroll down to the summary. But you’ll miss out on all the test scripts. And the high-budget special effects (well, you would if there were any).
In my prior tests, I'd found that adjacency can be very, very fast for updates and inserts, but is much slower at resolving hierarchies past a certain depth or complexity. I've found that nested sets are very, very fast at resolving hierarchies regardless of depth or complexity, but are very slow for updates and inserts except on VERY simple hierarchies. And that the HierarchyID method is faster to query than adjacency, but slower to update, and slower to query than nested sets, but faster to update, and thus seemed to be a sort of "not really the best at anything but an okay compromise".
Based on that, I actually worked out a "hybrid hierarchy", using both adjacency and nested sets, that was fast to update and fast to query, but kind of complex to manage overall. At the time, not having SQL Server 2008 in a production environment, that was a better compromise for the real-world situation I had to solve at that time. (See the prior article, linked above, for a lot more details on that compromise.)
First, to set up a test harness for comparing all three methods, run this code:
IF DB_ID(N'ProofOfConcept') IS NULL BEGIN CREATE DATABASE [ProofOfConcept]; ALTER DATABASE [ProofOfConcept] SET COMPATIBILITY_LEVEL = 100; END; GO USE [ProofOfConcept] GO CREATE TABLE [dbo].[HierarchyTest]( [ID] [int] IDENTITY(1,1) PRIMARY KEY, [NodeName] [varchar](10) NOT NULL, [ParentID] [int] NULL, [RangeStart] [int] NULL, [RangeEnd] [int] NULL, [HID] [hierarchyid] NULL, [HID_Level] AS ([HID].[GetLevel]())) GO CREATE NONCLUSTERED INDEX [IDX_Adjacency] ON [dbo].[HierarchyTest] ( [ParentID] ASC ) INCLUDE ( [NodeName]); GO CREATE NONCLUSTERED INDEX [IDX_HID] ON [dbo].[HierarchyTest] ( [HID] ASC ) INCLUDE ( [NodeName]); GO CREATE NONCLUSTERED INDEX [IDX_Nested] ON [dbo].[HierarchyTest] ( [RangeStart] ASC, [RangeEnd] ASC ) INCLUDE ( [NodeName]); GO ALTER TABLE [dbo].[HierarchyTest] WITH CHECK ADD FOREIGN KEY([ParentID]) REFERENCES [dbo].[HierarchyTest] ([ID]); GO ALTER TABLE [dbo].[HierarchyTest] WITH CHECK ADD CONSTRAINT [CK_Range] CHECK (([RangeStart] IS NULL OR [RangeEnd] IS NULL OR [RangeEnd]>=[RangeStart])); GO ALTER TABLE [dbo].[HierarchyTest] CHECK CONSTRAINT [CK_Range]; GO CREATE TABLE dbo.Numbers ( Number INT PRIMARY KEY); GO INSERT INTO dbo.Numbers (Number) SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY T1.object_id) FROM sys.all_columns AS T1 CROSS JOIN sys.all_columns AS T2;
I like to do testing in a database I call ProofOfConcept, usually on a desktop machine that's running a copy of SQL Server Developer Edition. The particular desktop computer I'm running all of these tests on this time, has a Core i7 processor (quad-core, two threads per core = effectively 8 CPUs), 16 Gigs of RAM, and the hard drives are on an eSATA bus (3.5 Terabytes of storage space total on the machine). (Yes, that's my desktop computer, not a server.)
Now to set up some test data in the table. I start out with just a few rows of data, just to make sure this is all working correctly:
TRUNCATE TABLE dbo.HierarchyTest; INSERT INTO dbo.HierarchyTest (NodeName, ParentID, RangeStart, RangeEnd, HID ) VALUES ('Joe', -- NodeName - varchar(10) null, -- ParentID - int 1, -- RangeStart - int 6, -- RangeEnd - int hierarchyid::GetRoot() -- HID - hierarchyid ); INSERT INTO dbo.HierarchyTest (NodeName, ParentID, RangeStart, RangeEnd, HID ) VALUES ('Bob', -- NodeName - varchar(10) 1, -- ParentID - int 2, -- RangeStart - int 3, -- RangeEnd - int CAST('/1/' AS HIERARCHYID) -- HID - hierarchyid ), ('Doug', -- NodeName - varchar(10) 1, -- ParentID - int 4, -- RangeStart - int 5, -- RangeEnd - int CAST('/2/' AS HIERARCHYID) -- HID - hierarchyid ); SELECT *, HID.ToString() AS HierarchyString FROM dbo.HierarchyTest ORDER BY HID; SELECT *, HID.ToString() AS HierarchyString FROM dbo.HierarchyTest ORDER BY HID_Level, HID;
This shows that the table is working, and tests some very simple queries against it, just to make sure I am getting expected results from known data. Every later test builds on this one. Since this is the first test, the initial TRUNCATE command isn't really necessary, but I keep it in the script in case I want to come back to this and need to clean out the table.
I next built a more realistic hierarchy. Still small, with only ten-thousand rows of data.
TRUNCATE TABLE dbo.HierarchyTest; GO INSERT INTO dbo.HierarchyTest (NodeName, ParentID, RangeStart, RangeEnd, HID ) SELECT 'Person' + RIGHT('0000' + CAST(Number AS VARCHAR(4)), 4), NULL, NULL, NULL, NULL FROM dbo.Numbers; GO UPDATE dbo.HierarchyTest SET ParentID = ID / 10 + ID % 10 WHERE ID > 10; GO ; WITH Hierarchy(ID, PID, Lvl, Pth) AS (SELECT ID, NULL, 0, '/' + CAST(ID AS VARCHAR(MAX)) + '/' FROM dbo.HierarchyTest WHERE ParentID IS NULL UNION ALL SELECT HSub.ID, HSub.ParentID, Hierarchy.Lvl + 1, Hierarchy.Pth + CAST(HSub.ID AS VARCHAR(MAX)) + '/' FROM dbo.HierarchyTest AS HSub INNER JOIN Hierarchy ON HSub.ParentID = Hierarchy.ID), Bottom AS (SELECT ID, Pth, ROW_NUMBER() OVER (ORDER BY Pth) * 10 AS RS FROM Hierarchy WHERE ID NOT IN (SELECT ParentID FROM dbo.HierarchyTest WHERE ParentID IS NOT NULL)) MERGE dbo.HierarchyTest AS H USING (SELECT Hierarchy.ID AS RangeID, CAST(Hierarchy.Pth AS HIERARCHYID) AS HID, MIN(RS) AS RS, MAX(RS) AS RE FROM Bottom INNER JOIN Hierarchy ON Bottom.Pth LIKE Hierarchy.Pth + '%' GROUP BY Hierarchy.ID, Hierarchy.Pth) AS Ranges ON H.ID = Ranges.RangeID WHEN MATCHED THEN UPDATE SET H.RangeStart = Ranges.RS, H.RangeEnd = Ranges.RE + 9, H.HID = Ranges.HID ; GO ; WITH CTE AS (SELECT ID, ROW_NUMBER() OVER (ORDER BY HT1.HID) AS R FROM dbo.HierarchyTest AS HT1 WHERE RangeStart IS NULL AND RangeEnd IS NULL AND NOT EXISTS ( SELECT * FROM dbo.HierarchyTest AS HT2 WHERE HT2.ParentID = HT1.ID )) UPDATE Tgt SET RangeStart = R, RangeEnd = R FROM dbo.HierarchyTest AS Tgt INNER JOIN CTE ON CTE.ID = Tgt.ID ; WHILE @@ROWCOUNT > 0 BEGIN WITH CTE AS (SELECT HT1.ID, MIN(HT2.RangeStart) AS RS, MAX(HT2.RangeEnd) AS RE FROM dbo.HierarchyTest AS HT1 INNER JOIN dbo.HierarchyTest AS HT2 ON HT1.ID = HT2.ParentID WHERE HT1.RangeStart IS NULL AND HT1.RangeEnd IS NULL AND HT2.RangeStart IS NOT NULL AND HT2.RangeEnd IS NOT NULL GROUP BY HT1.ID) UPDATE Tgt SET RangeStart = CTE.RS, RangeEnd = CTE.RE FROM dbo.HierarchyTest AS Tgt INNER JOIN CTE ON Tgt.ID = CTE.ID ; END ;
First, this script clears the table. Not necessary for a first run, but this script is designed to be run multiple times in case I want a clean dataset on subsequent tests. Next, the script inserts raw rows, using a string-builder instead of, for example, real names of people, departments, or components. Fake data like this will suffice for this kind of testing.
The next piece arbitrarily divides the hierarchy data into chunks and assigns parent IDs to all but 10 of the rows. Not completely realistic, but it'll do for this test. Note that the method I used for building the HierarchyID data is pretty much exactly as outlined in Books Online (and on MSDN) by Microsoft. (http://msdn.microsoft.com/en-us/library/bb677290.aspx)
The method for building the Nested Sets data is significantly different from the way outlined in my original article, and very different from the method I first learned from Joe Celko's work. The article of his that I linked to my earlier article on this is a broken link, but if you search online for "nested sets hierarchy" and "Joe Celko", you can find the relevant data. It works, and that’s what really matters.
There are some people who insist on building the Nested Sets data in a single query. I don’t know why, since this does it accurately and in about 1 second on my machine. Surely that’s fast enough and efficient enough.
Now to query my hierarchy:
IF OBJECT_ID(N'tempdb..#T1') IS NOT NULL DROP TABLE #T1; IF OBJECT_ID(N'tempdb..#T2') IS NOT NULL DROP TABLE #T2; IF OBJECT_ID(N'tempdb..#T3') IS NOT NULL DROP TABLE #T3; SET NOCOUNT ON ; SET STATISTICS TIME, IO ON; PRINT 'HierarchyID Select' ; DECLARE @MgrID HIERARCHYID = (SELECT HID FROM dbo.HierarchyTest WHERE ID = 2) ; SELECT HID, NodeName INTO #T1 FROM dbo.HierarchyTest WHERE HID.IsDescendantOf(@MgrID) = 1 ; PRINT 'Nested Sets Select' ; DECLARE @RS INT, @RE INT ; SELECT @RS = RangeStart, @RE = RangeEnd FROM dbo.HierarchyTest WHERE ID = 2 ; SELECT RangeStart, RangeEnd, NodeName INTO #T2 FROM dbo.HierarchyTest WHERE RangeStart BETWEEN @RS AND @RE ; PRINT 'Adjacency Select' ; ; WITH Hierarchy(ID, PID) AS (SELECT 2, NULL UNION ALL SELECT H.ID, H.ParentID FROM dbo.HierarchyTest AS H INNER JOIN Hierarchy ON H.ParentID = Hierarchy.ID) SELECT HT.ID, HT.ParentID, HT.NodeName INTO #T3 FROM dbo.HierarchyTest AS HT INNER JOIN Hierarchy ON HT.ID = Hierarchy.ID ; SET STATISTICS TIME, IO OFF; /* HierarchyID Select SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 2 ms. Nested Sets Select SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 2 ms. Adjacency Select SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 1112, logical reads 4461, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 2, logical reads 6673, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 15 ms, elapsed time = 18 ms. */
The commented portions at the bottom are the results I got. By inserting into on-the-fly temp tables, I eliminated performance variables that come from actually returning a dataset to the screen. That makes it a cleaner test, in my experience.
The first thing to note is that the HierarchyID and the Nested Sets method both got nearly identical results. This didn't match with my prior experiences, so I really dug into it. Either my prior tests were flawed (likely), or Microsoft has improved the HierarchyID query speed significantly since I first tested it. Either way, it proved out on multiple tests in my environment, so I have to go with the results I got. That definitely surprised me, but results are results.
The second thing to note is that, as expected, the Adjacency query took much longer, and was much more I/O intensive. It had to actually scan the table over 1,000 times just to build a worktable, and had to perform over 10,000 logical reads between the two tables. That's a lot of I/O for a seemingly simple query. That test exactly matches prior tests I've done, so I just ran it a few times to confirm the results.
Now comes another interesting test: updating data. Let's say I need to move ID 12 to ParentID 30. Someone's department is getting reorganized, and now they answer to a new manager, while keeping all of their existing people under them. Simple, right?
SET NOCOUNT ON ; DECLARE @ID INT = 12, @NewParentID INT = 30 ; PRINT 'Adjacency' ; BEGIN TRANSACTION ; SET STATISTICS IO, TIME ON; UPDATE dbo.HierarchyTest SET ParentID = @NewParentID WHERE ID = @ID ; SET STATISTICS IO, TIME OFF; ROLLBACK ; -- Needs to roll back so each update can work on the same rows PRINT 'HID' ; BEGIN TRANSACTION ; SET STATISTICS IO, TIME ON; DECLARE @HID HIERARCHYID, @HID2 VARCHAR(MAX), @HID3 VARCHAR(MAX), @IDString VARCHAR(10) ; SELECT @HID = HID, @HID2 = HID.ToString(), @IDString = ID FROM dbo.HierarchyTest WHERE ID = @ID ; SELECT @HID3 = HID.ToString() + @IDString + '/' FROM dbo.HierarchyTest WHERE ID = @NewParentID ; UPDATE dbo.HierarchyTest SET HID = CAST(REPLACE(HID.ToString(), @HID2, @HID3) AS HIERARCHYID) WHERE HID.IsDescendantOf(@HID) = 1 ; SET STATISTICS IO, TIME OFF; ROLLBACK ; /* Adjacency Update Table 'HierarchyTest'. Scan count 0, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 2, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. HID Update Table 'HierarchyTest'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 1, logical reads 6320, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 1, logical reads 1796, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 47 ms, elapsed time = 46 ms. */
I spent an hour or so trying to figure out a simple way to do this same test with a Nested Sets model, and all I could come up with was doing the Adjacency update, and then re-running the Nested Sets build for the whole table. So it ended up taking approximately 850 milliseconds average on my machine. Almost 20 times as long as the HierarchyID update.
The Adjacency update is trivial to code, and trivial to execute. It's FAST. It updates a single row, and that’s it. That's exactly as expected.
I did some research to find if there was a better method for editing the HierarchyID data in this, but the best I could come up with was use its ToString method and some simple string manipulation, and then Cast/Convert it back to HierarchyID. That works, and isn't exactly glacially slow, but it's massively slower and much more I/O intensive than the Adjacency model. Again, this exactly matches expectations and prior tests, so I just ran it a few times to confirm. It also matches exactly what Microsoft says in MSDN and Books Online.
I was, however, surprised that it was as fast as it was, once I hit on the method I ended up using. I was actually expecting to rebuild data from scratch, like Nested Sets. Instead, I found it easy and fast enough that I wouldn't worry about performance on it except on the most seriously overloaded servers. After all, 46 milliseconds isn’t exactly slow, it’s just slower than sub-millisecond for the Adjacency model. And about 20 times faster than Nested Sets.
Theoretically, it would be possible to cast the HierarchyID values to varbinary, and then use the Stuff() function to insert changes in the path, instead of casting to varchar. I tested this, and it seems to work, but it doesn’t save any complexity or performance, and isn’t as easy to read. Here’s an example:
DECLARE @HID HIERARCHYID = 0xE0247E00000201A8, @NewParent HIERARCHYID = 0xE024C0, @NewHID HIERARCHYID ; SET @NewHID = (SELECT CAST(CAST(STUFF(CAST(HID AS VARBINARY(MAX)), DATALENGTH(HID.GetAncestor(1)), 0, CAST(@NewParent AS VARBINARY(MAX))) AS VARBINARY(MAX)) AS HIERARCHYID) FROM dbo.HierarchyTest WHERE HID = @HID) ;
It uses a value from my test table, and inserts a new parent in between the existing parent and the current node. Would have to be done for every row affected. DataLength on the immediate ancestor gives the location in the binary string to stuff the new value, and nested conversions (cast) to and from varbinary and HierarchyID are needed.
That method isn’t documented in TechNet or MSDN or Books Online, so use at your own risk. Worth researching it, though, just for proof-of-concept.
One thing I did find in testing the HierarchyID datatype, that I wasn't expecting, but should have, is that the HierarchyID datatype doesn't deal well with multiple hierarchy roots. In Adjacency and Nested Sets hierarchies, you can very easily make a single table hold multiple top nodes. It’s possible, but not as easy, with the HierarchyID type.
To see this, re-run the first data load (three rows), and then run this:
SELECT * FROM dbo.HierarchyTest WHERE HID = HIERARCHYID::GetRoot() ; INSERT INTO dbo.HierarchyTest (NodeName, ParentID, RangeStart, RangeEnd, HID ) VALUES ('Person0000', -- NodeName - varchar(10) null, -- ParentID - int null, -- RangeStart - int null, -- RangeEnd - int HIERARCHYID::GetRoot() -- HID - hierarchyid ); SELECT *, HID.GetDescendant(NULL, NULL) FROM dbo.HierarchyTest WHERE HID = HIERARCHYID::GetRoot() ;
You'll find that the second top node, Person000, will have the descendants of the first top node. This can be worked around by not using the GetRoot method to make top nodes, and instead starting with the ID of each top node's row, as per my other examples.
Now, back to performance testing. This time, with a million rows and greater complexity to the hierarchy structure. 100 root notes, multiple levels deep, variable depth from different roots.
TRUNCATE TABLE dbo.HierarchyTest; GO INSERT INTO dbo.HierarchyTest (NodeName, ParentID, RangeStart, RangeEnd, HID ) SELECT TOP 1000000 'Person' + RIGHT('000000' + CAST(ROW_NUMBER() OVER (ORDER BY N1.Number) AS VARCHAR(6)), 6), NULL, NULL, NULL, NULL FROM dbo.Numbers AS N1 CROSS JOIN dbo.Numbers AS N2; GO UPDATE dbo.HierarchyTest SET ParentID = ID / 100 + ID % 100 WHERE ID > 100; GO ;
I ran the same scripts for the HierarchyID build and the Nested Sets build as with the 10k-row version. There was no need to change the scripts just because the table has 100 times more rows. The execution took about a minute for each. It could probably be tuned to be faster but that’s good enough on a million rows in this scenario.
The Nested Sets method that I'd been using for years, the one from my first article, was absolutely DEADLY when run on a hierarchy this size. It ran for over 5 hours and then I gave up on it, so there’s no telling how much longer it would have run! Definitely use either the method here, or something you find online that’s even faster, and don’t use my old method.
Tests on the selects came to the same result, but scaled, as with the smaller sets.
/* HierarchyID Select SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 1, logical reads 62, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 15 ms, elapsed time = 6 ms. Nested Sets Select SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Adjacency Select SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 10101, logical reads 61280, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 2, logical reads 60607, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 125 ms, elapsed time = 129 ms. *//* Adjacency Update Table 'HierarchyTest'. Scan count 0, logical reads 12, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 2, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. HID Update Table 'HierarchyTest'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'HierarchyTest'. Scan count 1, logical reads 94579, physical reads 0, read-ahead reads 49, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 358 ms, elapsed time = 454 ms. */
Summary of Million-Row Tests
Adjacency selects slow down to the point where glaciers and continental plates stuck behind them start tailgating and beeping at them to speed up, and the I/O stats are even more outrageous. Adjacency updates are still nearly instantaneous.
HierarchyID selects are still very, very fast, while updates are slowing down, but still not slow enough to matter (under half a second) except on an overloaded system.
Nested Sets selects are still the fastest, but only by 6 milliseconds in my tests, compared to HierarchyIDs. Updates on them are slow enough to make adjacency selects look fast.
Conclusion
This was by no means a be-all-and-end-all set of tests, but it was eye opening for me that HierarchyID outperformed Nested Sets so thoroughly! The HierarchyID was within the margin-of-error of being as fast on querying, and infinitely faster on both updates and complex generation.
I have to recommend the new HierarchyID if you have a database that needs fast queries and reasonably fast updates/inserts/deletes. Stay away from Nested Sets. If you need super-fast updates and can live with slower queries, go with Adjacency. That result was a bit of a shock to me, and thus I felt the need to share it, both to save anyone who might go down the wrong path because of my prior article, and to give others a chance to test it and possibly refute my findings.
Of course, none of the HierarchyID uses are applicable to anyone still using SQL 2005 or prior, so on those servers, I'd still recommend the hybrid method from my prior article, with the modified update in this one. But this might be another reason to add to the list for upgrading.
Follow-Up
I’ve also done some experimenting on a “Nested Sets” variation that ignores numbers completely and takes advantage of the fact that “AAAA” is between “AA” and “AB” in a string sort. I’ll have to write that one up once I get a chance to play with it more. It’s basically a string version of HierarchyID.