Columnstore Indexes for Fast DW
The SQL Server 11.0 release (2012) introduces a new data warehouse query acceleration feature based on a new type of index called the columnstore. This new index, combined with enhanced query optimization and execution features, improves data warehouse query performance by hundreds to thousands of times in some cases, and can routinely give a tenfold speedup for a broad range of queries fitting the scenario for which it was designed. It does all this within the familiar T-SQL query language, and the programming and system management environment of SQL Server. It’s thus fully compatible with all reporting solutions that run as clients of SQL Server, including SQL Server Reporting Services.
A columnstore index stores each column in a separate set of disk pages, rather than storing multiple rows per page as data traditionally has been stored. We use the term “row store” to describe either a heap or a B-tree that contains multiple rows per page. The difference between column store and row store approaches is illustrated below:
Figure 1: Comparison between row store and column store data layout
5
The columns C1…C6 are stored in different groups of pages in the columnstore index. Benefits of this are:
- only the columns needed to solve a query are fetched from disk (this is often fewer than 15% of the columns in a typical fact table),
- it’s easier to compress the data due to the redundancy of data within a column, and
- buffer hit rates are improved because data is highly compressed, and frequently accessed parts of commonly used columns remain in memory, while infrequently used parts are paged out.
The columnstore index in SQL Server employs Microsoft’s patented Vertipaq™ technology, which it shares with SQL Server Analysis Services and PowerPivot. SQL Server columnstore indexes don’t have to fit in main memory, but they can effectively use as much memory as is available on the server. Portions of columns are moved in and out of memory on demand.
SQL Server Denali columnstore indexes are “pure” column stores, not a hybrid, because they store all data for separate columns on separate pages. This improves I/O scan performance and buffer hit rates. SQL Server is the first major database product to support a pure columnstore index. Others have claimed that it was impossible to leverage pure columnstore technology in an established database product with a broad market. We’re happy to prove them wrong and we think you’ll be glad we did!
Using Columnstore Indexes
To improve query performance, all you need to do is build a columnstore index on the fact tables in a data warehouse. If you have extremely large dimensions (say more than 10 million rows) then you may wish to build a columnstore index on those dimensions as well. After that, you simply submit queries to SQL Server, and they can run much, much faster.
Note: ColumnStore Index was just renamed to “xVelocity“
Resources:
Columnstore Index Wiki – contains the latest information from white papers, code examples, to presentations.