In today’s blog posting I want to talk about Heap Tables in SQL Server. Heap tables are tables without a Clustered Index. A table in SQL Server can have a Clustered Index, then it’s called a Clustered Table, and without a Clustered Index, it’s called a heap table. In a heap table, the data is not sorted in any way, it’s just a pile of unordered, unstructured records. When you access a heap table through a SELECT statement, SQL Server will use a Table Scan operator in the Execution Plan, when you have no suitable Non-Clustered Index defined. There is no Table Seek operator available. That’s very important.
As you will see next week, when we talk about Clustered Indexes, you access a Clustered Index through a Clustered Index Scan and through a Clustered Index Seek operator. On a heap table you have only a Table Scan operator. And Table Scan means that you must scan the whole table. And it will not scale with the amount of data you have. The more data you have, the longer the operation takes. A Table Scan is always a O(n) operation (more about the Big O notation) – it will not scale as your table gets larger and larger. Now let’s have a more detailed look at the advantages and disadvantages of heap tables in SQL Server.
Advantages
Heap tables are very, very, very fast – for inserting data. As we have said earlier it’s just a pile of data – nothing more. When you crack out every page from a traditional phone book and place the individual pages on the desk in front of you, you have created a heap table. Inserting new phone book records into that heap table is very fast: you allocate a new page (of 8kb), write the new records onto that page, and finally put the page with the other ones in front of you. Finished. No sorting order must be guaranteed.
It’s the same with SQL Server: a new page is allocated, the records are stored onto that page, and the page is assigned to the heap table. Finished. This is a very fast approach, because SQL Server doesn’t have to ensure any sorting order. It’s up to SQL Server where to put the new records.
For that reason, it can be sometimes a good idea to leave some tables as heap tables in a database schema: tables on which you have a huge INSERT activity. Just think about your logging/auditing tables. But I’m never ever recommending using heap tables everywhere. There are only a few specific use cases where they make sense. But not everywhere!
Disadvantages
Besides the advantage that heap tables are very fast for inserting data, there are also a huge number of disadvantages, that you also must incorporate into your thoughts when you decide to create a table as a heap table.
As a first disadvantage, a heap table leads to random I/O in the storage subsystem when you access the table data. Imagine you are executing a simple SELECT statement against your heap table. If the data isn’t yet cached in the Buffer Pool, SQL Server must issue physical reads to your storage subsystem. These read will be random I/O because the pages of the heap table are stored somewhere in your data files, but not next to each other.
If you are using traditional rotational storage (which is currently still the most common case), you will have performance problems on your storage level, because random I/O is slow, very slow. SSD drives are a big game changer here, because it’s doesn’t really matter anymore, if you are performing random or sequential I/O on SSDs. Both operations are almost the same regarding their speed (random I/O is still “a little” bit slower than sequential I/O).
Another big problem that you will have with your heap tables are so-called Forwarding Records. Records that are stored on a heap table can move in some circumstances from one page to another one. If this occurs, SQL Server stores on the original page a forwarding record that points to the new location where the record is stored.
When you are accessing your data, SQL Server still accesses the original page, and fetches the record through the forwarding record, which is an additional page read that you need. And this will also slow down your read performance tremendously. If you are more interested in forwarding records, and how you can avoid them, I suggest that you watch my 10th SQL Server Quickie on that specific topic.
Summary
Heap tables have their uses – in some cases. I always recommend creating a Clustered Table (define a Clustered Index on it), but think about some specific use cases, and whether a heap table would serve your business needs in a better way (regarding the performance characteristics). If you want to have a more detailed explanation of when heap tables can be suitable for your workload, I also highly recommend that you read Thomas Kejser’s blog posting Clustered Indexes vs. Heaps about it. Thomas gives you a very controversial, but still valid insight on when heap tables can make sense for you.
Thanks for your time,
-Klaus