Understanding the fundamentals is key for success, with everything you do. These days SQL Server has expanded into much more than just a database; but, products that ingest data for analysis still need to get the data from a source, such as an OLTP SQL database.
So why are HEAP tables important? Isn’t a table just a table? Well, you could say a table is a table, sure. Is an Audi A6 the same as an Audi R8? They’re both cars right? Well hold up there! The R8 is much, much faster than the A6! Or is it? What’s the 1/4 mile time?
Let’s assume we’re talking about the 2018 model for both and we’ll pit the supercharged 3.0T A6 against the V10 Plus R8. The A6 has 340 HP which is impressive but the R8 has 610 and weights much less. So it’s obvious that the A6’s 13.5 sec 1/4 mile is much slower than the R8’s 10.5 seconds. But what if you have to shuttle 4 people? The R8 has 2 seats and the A6 has 4. In this scenario the A6 tramples the R8 all day long.
So is a car just a car and a table just a table? How does a HEAP compare with a Clustered table? That’s a question for a different post. For now let’s stick to what a HEAP is. I’ve heard that it is a table without any indexes and this isn’t the case. A HEAP can be identified by finding index ID 0, clustered indexes are ID 1, and non-clustered indexes are > 2.
If you check our indexes using SYS.INDEXES we’ll see every index in the database.
From the image, we can see Object_Id 245575 (245575913) is a HEAP and has a single supporting non-clustered index. So the simple definition of a HEAP is a table without a clustered index.
The more complex answer is that the HEAP is unordered and non-clustered indexes use a RID (row id) for the pointer; while, a table with a clustered index is ordered by the clustered key and non-clustered indexes use the clustered key as the pointer.
Let’s examine the leaf of a non-clustered index with a HEAP.
First, we’ll need to find out what a leaf page is for our index. We can use the old school DBCC IND command to do this but since I’m running a modern version of SQL, it’s easier to use SYS.DM_DB_DATABASE_PAGE_ALLOCATIONS instead.
SELECT allocated_page_file_id, allocated_page_page_id, page_level, page_type_desc
FROM sys.dm_db_database_page_allocations(7, 245575913, 2, null, ‘DETAILED’)
Result:
The Root page is always the highest page_level. Intermediate pages are any pages between the root and leaf. Your leaf pages should always be page_level 0.
Here’s the index as seen in SSMS:
The index key (DatabaseLogID) is listed for each row and as you can see there is a RID attached to every key value. This allows for a RID Lookup to occur in the event that this index isn’t covering the query, meaning it doesn’t include all the columns needed to satisfy the query request. The RID lookup will appear in a Nested Loop join in the query plan showing that each row found in the non-clustered is then searched against the HEAP to obtain the additional column data. The RID is the “pointer” I referred to earlier in the post. In the event that this was not a HEAP, the non-clustered would include the full clustered key instead of the RID; which is one argument for keeping your clustered index keys narrow.
Should I use a HEAP or Clustered table? Well, this is up to you. The main argument is that of Forwarded Records. When a HEAP is fragmented you end up with records being in a different place, kind of like when you move and fill out the post office change of address form. At some point weeks later your mail arrives with a yellow re-address sticker. Now imagine your query going to the place it thinks it needs to be and then finding it needs to reroute elsewhere. So it’s generally considered a best practice to apply a clustered index to all your tables. Additionally, there are several sources that show a HEAP is slower for inserts, which means a Clustered Index is the Audi R8 but with 4 seats :). With that said, you should probably consider replacing your HEAP tables wherever possible.