All of the indexing strategy posts I’ve written in the past have been concerned with predicates combined with ANDs. That’s only one half of the possibilities though. There’s the case of predicates combines with ORs, and the guidelines for indexing that work well with ANDs don’t work with ORs
When dealing with predicates combined with AND, the predicates are cumulative, each one operates to further reduce the resultset.
For this reason, multi-column indexes support multiple predicates combined with AND operators.
If we look at a quick example, consider the following.
CREATE TABLE Customers ( CustomerID INT IDENTITY PRIMARY KEY, Surname VARCHAR(30) NOT NULL, FirstName VARCHAR(30), Title VARCHAR(5), CustomerType CHAR(1) NOT NULL, IsActive BIT DEFAULT 1 NOT NULL, RegistrationDate DATETIME NOT NULL DEFAULT GETDATE() );</pre> CREATE INDEX idx_Customers_SurnameFirstName ON Customers (Surname, FirstName);
Again I’m going to be lazy and get SQLDataGenerator to generate a few rows.
With that two column index on those columns and a query that looks for Surname = ‘Kelley’ AND Name = ‘Rick’, SQL can do a double column seek to go directly to the start of the range then just read down the index to the end of the range, basically until it finds the first row that it’s not interested in.
So how does that that differ when the operator is an OR?
The main difference is that with an OR, the predicates are independent. The second doesn’t serve to reduce the recordset, but rather to expand it. It’s similar to evaluating two separate predicates and combining the result. Let’s have a look at that 2 column index again when the two predicates are combined with an OR.
SELECT CustomerID FROM Customers WHERE Surname = 'Kelley' OR FirstName = 'Rick';
If we try to use that index to evaluate Surname = ‘Kelley’ OR Name = ‘Rick’, there’s a problem. While the first of those predicates can be evaluated with a seek (it’s a sargable predicate on the left-most column of an index), the second predicate cannot. It’s sargable, but it is on the second column of the index (and for the moment let’s assume there are no other indexes on the table). Seeks are only possible if the predicate filters on a left-based subset of the index key.
Hence to evaluate that predicate SQL will have to do an index scan. Since it has to do a scan to evaluate the one predicate, it won’t bother also doing a seek to evaluate the first predicate as it can also evaluate that during the scan.
Hence, in this case, the query will execute with a single index scan.
So how do we get this query to rather seek?
The key is that the predicates are independent, each get evaluated separately. Given that they are evaluated separately, it’s not a stretch to conclude that they perhaps need separate indexes, and that is indeed the case.
Taking the demo above and splitting that two column index into two separate indexes , the query now does execute with an index seek, or to be more correct, with two of them.
CREATE NONCLUSTERED INDEX idx_Customers_Surname ON dbo.Customers (Surname); CREATE NONCLUSTERED INDEX idx_Customers_FirstName ON dbo.Customers (FirstName);
That’s the simple case dealt with. What about a more complex where clause, one with both AND and OR operators in it.
SELECT CustomerID FROM dbo.Customers WHERE CustomerType = 'A' AND IsActive = 1 AND (Surname = 'Kelley' OR FirstName = 'Rick');
The easiest way to work this one out is to modify the form of that where clause. Boolean logic (specifically the distributivity) property states that a AND (b OR c) is equal to (a AND b) OR (a AND c)
Converted as such, the query now looks like this
SELECT CustomerID FROM dbo.Customers WHERE (CustomerType = 'A' AND IsActive = 1 AND Surname = 'Kelley') OR (CustomerType = 'A' AND IsActive = 1 AND FirstName = 'Rick');
Now that has the same pattern as the previous query, so is easy identify indexes. Each predicate (or set of predicates) combined with OR needs an index. There are two sets of predicates, so two indexes.
CREATE INDEX idx_Customers_TypeActiveFirstName ON dbo.Customers (CustomerType, IsActive, FirstName) GO CREATE INDEX idx_Customers_TypeActiveSurname ON dbo.Customers (CustomerType, IsActive, Surname) GO
Now I’m not suggesting that queries actually be written in that form. It’s more typing and can be confusing, and the query parser converts it back to the form a AND (b or c), but the query can be imagined in that form to make the indexes easier to work out.
The column order I’ve used there is not a requirement, the query seeks just as well when the order of the column in one or both indexes are different. The order is more determined by other queries against the table that may be able to use one or both.
One more thing to look at, and that’s the case where the index doesn’t cover the query. As I’m sure most of us know, if the index is not covering and the predicate not highly selective, SQL is likely to ignored the index in favour of scanning the clustered index (or heap).
The same thing applies here, with one complication. Typically (well, in all the cases I tested) it was necessary for both indexes to be covering or SQL just goes off and scans the cluster, even when one or both predicates have very low row estimates (like 1 row). In fact, using hints to try and force two seeks (with key lookups) results in an error
CREATE INDEX idx_Customers_IsActive ON dbo.Customers (IsActive) INCLUDE (FirstName, Surname) CREATE INDEX idx_Customers_RegistrationDate ON dbo.Customers (RegistrationDate) SELECT CustomerID, FirstName, Surname FROM dbo.Customers WHERE IsActive = 1 OR RegistrationDate = '2010-01-24'
Now that looks like a perfectly reasonable query. There are two indexes, one for each predicate combined with OR. If I reduce the SELECT to just CustomerID SQL does indeed do two seeks and a merge join (concatenation) as seen in earlier execution plans. Add the FirstName and Surname back into the query and SQL switches to a clustered index scan.
Can I force this to seek. Well, yes (SQL 2008 only)
SELECT CustomerID, FirstName, Surname FROM dbo.Customers WITH (FORCESEEK) WHERE IsActive = 1 OR RegistrationDate = '2010-01-24'
However this does not produce the plan that you might expect. Looking at how SQL processed the earlier queries, one might assume that SQL would seek both indexes, do a key lookup only on the rows returned from the index that’s not covering, then concatenate the two resultsets. Fair assumption, but that’s not what SQL does.
Instead it seeks on both indexes, concatenates the resultsets, then does the key lookup on what is essentially half of the table. Not efficient at all.
The optimiser appears not to be considering the possibility that it could seek both, do a key lookup only on the rows returned from the predicate on RegistrationDate (as the other is seeking on a covering index and hence already has the columns needed) and then concatenate the two. No wonder it picks a cluster scan by preference.
So what happens if it’s not practical to make both indexes covering? That’s going to have to be a topic for another day.For now I hope this has cleared up a bit on indexing for queries using OR.