April 22, 2008 at 1:29 pm
For the sake of argument, suppose I have a 10-million-record table with no index at all. The fields in this table include DataWeek (a smalldatetime; 10 unique values) and CustomerKey (int; 1000 unique values). Suppose I want to create a single nonclustered index involving both fields.
Hypothetically, if EVERY query EVER run against this table used an equality filter in the WHERE clause on both fields (e.g., WHERE DataWeek = '4/12/2008' AND CustomerKey = 100), would it matter whether i defined my index as (DataWeek, CustomerKey) or (CustomerKey, DataWeek), and if so, how?
April 22, 2008 at 1:46 pm
Steve F. (4/22/2008)
For the sake of argument, suppose I have a 10-million-record table with no index at all. The fields in this table include DataWeek (a smalldatetime; 10 unique values) and CustomerKey (int; 1000 unique values). Suppose I want to create a single nonclustered index involving both fields.Hypothetically, if EVERY query EVER run against this table used an equality filter in the WHERE clause on both fields (e.g., WHERE DataWeek = '4/12/2008' AND CustomerKey = 100), would it matter whether i defined my index as (DataWeek, CustomerKey) or (CustomerKey, DataWeek), and if so, how?
Hypothetically, probably not. This looks like something that a good test would help solve if there are any takers out there (Jeff, Matt).
Is your hypothetical situation ever going to happen, probably not, and then you do have to look carefully at how you construct your indexes.
😎
April 22, 2008 at 2:11 pm
In that case, I have another question: In what ways is the index physically implemented differently depending on the order of the columns within it? I understand that the first field in an index determines whether SQL Server can use it, but is the index ordered first by the first column, making range queries more effective on the first column in the index? I'm just prodding for info here; I'm curious.
Thanks again.
April 23, 2008 at 10:38 pm
You should put the most selective column first ie the column with the greater number of distinct values.
The index would be ordered by the first column then second column so you are correct in that a range query is fine on the first column but the index may not help on the second column if the query doesn't include the first. Also statistics are really only useful for the first column - although the stats include cardinality / selectivity estimates for subsequent columns, they are only relevant in some situations eg determining whether the index can cover the query.
April 24, 2008 at 12:24 am
The index is implemented much like a telephone directory. If you have a composite index (colour, age as a very contrived example) then the index pages look something like this
Black 75
Blue 40
Blue 50
Blue 55
Green 1
Green 10
Red 0
Red 100
...
If all your queries are equality on both columns, the order doesn't matter. It becomes important when you have range seeks or single column seeks.
The contrived example above can be used to seek the following queries
WHERE Colour = @x and Age = @y
WHERE Colour = @x and Age between @y1 and @y2
WHERE Colour BETWEEN @x1 and @x2
The folowing requires a scan, because the query does not inlcude the left column of the index
WHERE Age = @y
The advice about making an indexes first column the most selective is often mentioned. I tend to disagreee with it. The index columns should be arranged so that the queries can use it. The first column should be the one that can appear alone in queries, not the most selective column
When it does a 2-column seek, the 2 columns are matched in a single operation. Again using the above example,
WHERE Colour = @x and Age = @y
does not mean that SQL will find all the colours = @x, then seek again to find the age @y. If does a single seek on both conditions.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 24, 2008 at 12:25 am
Oh, and with the selectivity you're talking about (1000 unique values in 10 million rows), if that index is not covering, SQL is not going to use that index. Too expensive to do the RID lookups.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 24, 2008 at 4:13 am
And some testing...
SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO
CREATE TABLE TestingIndexOrder (
ID INT IDENTITY,
DayOfWeek smallint,
CustomerKey int
)
GO
insert INTO TestingIndexOrder (DayOfWeek, CustomerKey)
SELECT DATEPART(dw,DATEADD(dd,number, '1900/01/01')), FLOOR(RAND(number*5452)*1000)
FROM (
SELECT TOP 4000000 v1.number, v1.number + v2.number AS Num
FROM master..spt_values v1, master..spt_values v2 WHERE v1.name is NULL AND v2.name IS null
UNION ALL
SELECT TOP 4000000 v1.number, v1.number + v2.number AS Num
FROM master..spt_values v1, master..spt_values v2 WHERE v1.name is NULL AND v2.name IS null
UNION ALL
SELECT TOP 2000000 v1.number, v1.number + v2.number AS Num
FROM master..spt_values v1, master..spt_values v2 WHERE v1.name is NULL AND v2.name IS null ) x
GO
CREATE INDEX idx_test1 ON TestingIndexOrder (DayOfWeek, CustomerKey)
GO
SELECT DayOfWeek, CustomerKey FROM TestingIndexOrder WHERE DayOfWeek =3 AND CustomerKey = 504
/*
(4096 row(s) affected)
Table 'TestingIndexOrder'. Scan count 1, logical reads 14, physical reads 0, read-ahead reads 0
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 121 ms.
*/
DROP INDEX idx_test1 ON TestingIndexOrder
CREATE INDEX idx_test2 ON TestingIndexOrder (CustomerKey, DayOfWeek)
GO
SELECT DayOfWeek, CustomerKey FROM TestingIndexOrder WHERE DayOfWeek =3 AND CustomerKey = 504
/*
(4096 row(s) affected)
Table 'TestingIndexOrder'. Scan count 1, logical reads 14, physical reads 0, read-ahead reads 0
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 123 ms.
*/
DROP TABLE TestingIndexOrder
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 24, 2008 at 5:24 am
Gail,
Thanks for jumping in on this and doing some testing. I tried to set up a test myself, but my mind wasn't working right and the test data I contrived didn't meet the needs of the hypothetical question.
I agree with you regarding the first column of an index, it shouldn't always be the most selective, but should be based on the queries used on the tables.
😎
April 24, 2008 at 5:58 am
I've seen a case where a person took the most-selectivel advice literally, and made the leading column of every index the primary key (an identity)
He was very curious as to why he wasn't seeing index seeks. (none of the queries filtered on the PK)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 24, 2008 at 6:28 am
Great info. Thanks all.
Any recommendations for books that cover this topic in depth?
April 24, 2008 at 6:57 am
GilaMonster (4/24/2008)
When it does a 2-column seek, the 2 columns are matched in a single operation. Again using the above example,WHERE Colour = @x and Age = @y
does not mean that SQL will find all the colours = @x, then seek again to find the age @y. If does a single seek on both conditions.
Hmm... Does this mean that the order of the columns in an index -after- the first column is irrelevant?
April 24, 2008 at 7:19 am
No. You can generalise my 2 column example to as many columns as you have in the index.
For example, a 3 column index (A, B, C) is most optimal for queries of the following forms
WHERE A = @a and B = @b-2 and C=@c
WHERE A = @a and B = @b-2 and C between @c1 and @c2
WHERE A = @a and B between @b1 and @b2
WHERE A = @a
WHERE A between @a1 and @a2
It's less optimal for things like
WHERE A = @a and C=@c (because the where clause predicates are not a left-based subset of the index.)
It will require a seek on A=@a, followed by a filter to do C=@c
Queries of the following form require a scan
WHERE B = @b-2 and C=@c
Think of a phone book, where entries are ordered by surname, first name, phone number
Which is easier, to find all the people with a surname Brown (1st column) and a phone number of 555-6572 (3rd column)
or to find all the people with a surname Brown (1st column) and a first name of Matt (2nd column)?
As for books, have a look at the 4th book in the Inside SQL Server 2005 series. Query tuning and optimisation. There should be something in there.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 24, 2008 at 8:19 am
GilaMonster (4/24/2008)
The advice about making an indexes first column the most selective is often mentioned. I tend to disagreee with it. The index columns should be arranged so that the queries can use it. The first column should be the one that can appear alone in queries, not the most selective column.When it does a 2-column seek, the 2 columns are matched in a single operation. Again using the above example,
WHERE Colour = @x and Age = @y
does not mean that SQL will find all the colours = @x, then seek again to find the age @y. If does a single seek on both conditions.
I agree with the above, however the non-leaf pages still need to be read - if the less selective column is first it can potentially mean more reads (only one more if I'm thinking straight, but 1 is more than 0!).
Of course, if there is any chance of different queries being used in future then the index would have to be reevaluated.
April 24, 2008 at 11:40 pm
matt stockham (4/24/2008)
I agree with the above, however the non-leaf pages still need to be read - if the less selective column is first it can potentially mean more reads (only one more if I'm thinking straight, but 1 is more than 0!).
Could you please explain how that scenario works?
Bear in mind, that for a multi-column index, all the columns are in all the pages. Leaf and non-leaf (Ignoring the SQL 2005 include columns, that is)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 28, 2008 at 7:14 pm
check this out
i remember the query optimization stuff gail mentioned about can be found in the book "SQL Server Query Performance Tuning Distilled by sajal dam"
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply