(Be sure to checkout the FREE SQLpassion Performance Tuning Training Plan - you get a weekly email packed with all the essential knowledge you need to know about performance tuning on SQL Server.)
Today I want to talk about a specific kind of index in SQL Server: Filtered Indexes, which were introduced back with SQL Server 2008 – so already very long time ago. A Filtered Index is a traditional Non-Clustered Index, which includes a Filter Predicate that restricts the rows that are indexes in that specific index.
Advantages of Filtered Indexes
Let’s start now by looking at the advantages of Filtered Indexes, and where they can be beneficial. Let’s have a look at the column PurchaseOrderNumber of the table Sales.SalesOrderHeader of the AdventureWorks2014 database. As you can see from the following T-SQL query, there are a lot of NULL values in that column. In my case 27659 records out of 31465 records have a NULL value.
SELECT COUNT(*), COUNT(PurchaseOrderNumber) FROM Sales.SalesOrderHeader GO
In the next step I create a Covering Non-Clustered Index for a specific query that I want to speed up.
-- Create a supporting Non-Clustered Index for the query CREATE NONCLUSTERED INDEX idxTest ON Sales.SalesOrderHeader(PurchaseOrderNumber, SalesPersonID) INCLUDE (OrderDate, ShipDate) GO -- When we execute the query, SQL Server can use the previous created -- Non-Clustered Index, but we have indexed a huge amount of NULL values that -- we don't request from this query... SELECT PurchaseOrderNumber, OrderDate, ShipDate, SalesPersonID FROM Sales.SalesOrderHeader WHERE PurchaseOrderNumber LIKE 'PO6%' AND SalesPersonID IS NOT NULL GO
When you run the query, you can see in the Execution Plan that the Non-Clustered Index is referenced by the Query Optimizer.
When we check the size of the index through the Dynamic Management Function (DMF) sys.dm_db_index_physical_stats, you can see that the Non-Clustered Index consists of 132 pages: 1 index root page, and 131 pages in the leaf level.
-- The previous created Non-Clustered Index has 132 pages (1 + 131) SELECT * FROM sys.dm_db_index_physical_stats ( DB_ID('AdventureWorks2014'), OBJECT_ID('Sales.SalesOrderHeader'), NULL, NULL, 'DETAILED' ) WHERE index_id = (SELECT index_id FROM sys.indexes WHERE name = 'idxTest') GO
To make now the index structure more efficient, you can also create a Filtered Non-Clustered Index. When you think back to our query, we only request column values with a Non-NULL value in the column PurchaseOrderNumber. Therefore we can just filter the NULL values out from the index structure. So let’s recreate now the Non-Clustered Index as a Filtered Non-Clustered Index.
-- Recreate the Non-Clustered Index as a Filtered Index and filter out the NULL values CREATE NONCLUSTERED INDEX idxTest ON Sales.SalesOrderHeader(PurchaseOrderNumber, SalesPersonID) INCLUDE (OrderDate, ShipDate) WHERE PurchaseOrderNumber IS NOT NULL AND SalesPersonID IS NOT NULL WITH (DROP_EXISTING = ON) GO
When we check now again the index size through the DMF sys.dm_db_index_physical_stats, you will see now that the Filtered Non-Clustered Index consists of only 29 pages: 1 index root page, and only 28 pages in the leaf level.
In our case the query performance does’t get better, but you have created a much smaller index, and therefore you introduce less overhead in your database. Your Index Reorganize and Index Rebuild operations will be much faster, because they are dealing with a smaller index structure.
Disadvantages of Filtered Indexes
By now you know the advantages of Filtered Indexes in SQL Server. Let’s talk now about the disadvantages. Yes, there are also disadvantages! There are always some pro’s and con’s about everything. Like in real life!
Let’s work now with parameterized T-SQL statements. In my case I create a simple Stored Procedure that accepts a @PurchaseOrderNumber parameter.
CREATE PROCEDURE RetrieveData ( @PurchaseOrderNumber NVARCHAR(25) ) AS BEGIN SELECT PurchaseOrderNumber, OrderDate, ShipDate, SalesPersonID FROM Sales.SalesOrderHeader WHERE PurchaseOrderNumber = @PurchaseOrderNumber END GO
But when you execute now the Stored Procedure and you check the Execution Plan, you will see that the Query Optimizer hasn’t chosen the Filtered Index. You are scanning the whole Clustered Index of the table Sales.SalesOrderHeader.
Why is the Query Optimizer not referencing the Filtered Non-Clustered Index? Because it’s not safe! Imagine that initially the Filtered Index is referenced in the Execution Plan. That Execution Plan get’s cached in the Plan Cache. When you now execute the Stored Procedure again with a different parameter value, SQL Server can reuse the already cached Execution Plan from the Plan Cache.
But what happens if you specifiy a NULL value as input parameter value? Then the cached plan wouldn’t return you a result, because the Filtered Non-Clustered Index doesn’t know anything about the NULL values. So the query result would be nothing, which is wrong! And therefore the Query Optimizer can’t reference a Filtered Index in a parameterized T-SQL statement!
Summary
Filtered Non-Clustered Indexes are great – as long as you know their limitations. Normally parameterized T-SQL statements are great, because you don’t pollute the Plan Cache with Adhoc queries. But in combination with Filtered Non-Clustered Indexes in SQL Server they are somehow counterproductive. So please keep that in mind when you work on your indexing strategy.
Thanks for your time,
-Klaus