Index Implementation Question

Question

Index Implementation Question

Steve F.

Mr or Mrs. 500

Points: 587
More actions
April 22, 2008 at 1:29 pm

#186822

For the sake of argument, suppose I have a 10-million-record table with no index at all. The fields in this table include DataWeek (a smalldatetime; 10 unique values) and CustomerKey (int; 1000 unique values). Suppose I want to create a single nonclustered index involving both fields.
Hypothetically, if EVERY query EVER run against this table used an equality filter in the WHERE clause on both fields (e.g., WHERE DataWeek = '4/12/2008' AND CustomerKey = 100), would it matter whether i defined my index as (DataWeek, CustomerKey) or (CustomerKey, DataWeek), and if so, how?

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

Steve F. (4/22/2008)
For the sake of argument, suppose I have a 10-million-record table with no index at all. The fields in this table include DataWeek (a smalldatetime; 10 unique values) and CustomerKey (int; 1000 unique values). Suppose I want to create a single nonclustered index involving both fields.
Hypothetically, if EVERY query EVER run against this table used an equality filter in the WHERE clause on both fields (e.g., WHERE DataWeek = '4/12/2008' AND CustomerKey = 100), would it matter whether i defined my index as (DataWeek, CustomerKey) or (CustomerKey, DataWeek), and if so, how?

Hypothetically, probably not. This looks like something that a good test would help solve if there are any takers out there (Jeff, Matt).

Is your hypothetical situation ever going to happen, probably not, and then you do have to look carefully at how you construct your indexes.

😎

Steve F. Mr or Mrs. 500 Points: 587 More actions · Answer 2

In that case, I have another question: In what ways is the index physically implemented differently depending on the order of the columns within it? I understand that the first field in an index determines whether SQL Server can use it, but is the index ordered first by the first column, making range queries more effective on the first column in the index? I'm just prodding for info here; I'm curious.

Thanks again.

matt stockham SSCrazy Eights Points: 9942 More actions · Answer 3

You should put the most selective column first ie the column with the greater number of distinct values.

The index would be ordered by the first column then second column so you are correct in that a range query is fine on the first column but the index may not help on the second column if the query doesn't include the first. Also statistics are really only useful for the first column - although the stats include cardinality / selectivity estimates for subsequent columns, they are only relevant in some situations eg determining whether the index can cover the query.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

The index is implemented much like a telephone directory. If you have a composite index (colour, age as a very contrived example) then the index pages look something like this

Black 75

Blue 40

Blue 50

Blue 55

Green 1

Green 10

Red 0

Red 100

...

If all your queries are equality on both columns, the order doesn't matter. It becomes important when you have range seeks or single column seeks.

The contrived example above can be used to seek the following queries

WHERE Colour = @x and Age = @y

WHERE Colour = @x and Age between @y1 and @y2

WHERE Colour BETWEEN @x1 and @x2

The folowing requires a scan, because the query does not inlcude the left column of the index

WHERE Age = @y

The advice about making an indexes first column the most selective is often mentioned. I tend to disagreee with it. The index columns should be arranged so that the queries can use it. The first column should be the one that can appear alone in queries, not the most selective column

When it does a 2-column seek, the 2 columns are matched in a single operation. Again using the above example,

WHERE Colour = @x and Age = @y

does not mean that SQL will find all the colours = @x, then seek again to find the age @y. If does a single seek on both conditions.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

Oh, and with the selectivity you're talking about (1000 unique values in 10 million rows), if that index is not covering, SQL is not going to use that index. Too expensive to do the RID lookups.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

And some testing...

SET STATISTICS IO ON

GO

SET STATISTICS TIME ON

GO

CREATE TABLE TestingIndexOrder (

ID INT IDENTITY,

DayOfWeek smallint,

CustomerKey int

)

GO

insert INTO TestingIndexOrder (DayOfWeek, CustomerKey)

SELECT DATEPART(dw,DATEADD(dd,number, '1900/01/01')), FLOOR(RAND(number*5452)*1000)

FROM (

SELECT TOP 4000000 v1.number, v1.number + v2.number AS Num

FROM master..spt_values v1, master..spt_values v2 WHERE v1.name is NULL AND v2.name IS null

UNION ALL

SELECT TOP 4000000 v1.number, v1.number + v2.number AS Num

FROM master..spt_values v1, master..spt_values v2 WHERE v1.name is NULL AND v2.name IS null

UNION ALL

SELECT TOP 2000000 v1.number, v1.number + v2.number AS Num

FROM master..spt_values v1, master..spt_values v2 WHERE v1.name is NULL AND v2.name IS null ) x

GO

CREATE INDEX idx_test1 ON TestingIndexOrder (DayOfWeek, CustomerKey)

GO

SELECT DayOfWeek, CustomerKey FROM TestingIndexOrder WHERE DayOfWeek =3 AND CustomerKey = 504

/*

(4096 row(s) affected)

Table 'TestingIndexOrder'. Scan count 1, logical reads 14, physical reads 0, read-ahead reads 0

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 121 ms.

*/

DROP INDEX idx_test1 ON TestingIndexOrder

CREATE INDEX idx_test2 ON TestingIndexOrder (CustomerKey, DayOfWeek)

GO

SELECT DayOfWeek, CustomerKey FROM TestingIndexOrder WHERE DayOfWeek =3 AND CustomerKey = 504

/*

(4096 row(s) affected)

Table 'TestingIndexOrder'. Scan count 1, logical reads 14, physical reads 0, read-ahead reads 0

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 123 ms.

*/

DROP TABLE TestingIndexOrder

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

Gail,

Thanks for jumping in on this and doing some testing. I tried to set up a test myself, but my mind wasn't working right and the test data I contrived didn't meet the needs of the hypothetical question.

I agree with you regarding the first column of an index, it shouldn't always be the most selective, but should be based on the queries used on the tables.

😎

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 8

I've seen a case where a person took the most-selectivel advice literally, and made the leading column of every index the primary key (an identity)

He was very curious as to why he wasn't seeing index seeks. (none of the queries filtered on the PK)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Steve F. Mr or Mrs. 500 Points: 587 More actions · Answer 9

Great info. Thanks all.

Any recommendations for books that cover this topic in depth?

Steve F. Mr or Mrs. 500 Points: 587 More actions · Answer 10

GilaMonster (4/24/2008)
When it does a 2-column seek, the 2 columns are matched in a single operation. Again using the above example,
WHERE Colour = @x and Age = @y
does not mean that SQL will find all the colours = @x, then seek again to find the age @y. If does a single seek on both conditions.

Hmm... Does this mean that the order of the columns in an index -after- the first column is irrelevant?

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 11

No. You can generalise my 2 column example to as many columns as you have in the index.

For example, a 3 column index (A, B, C) is most optimal for queries of the following forms

WHERE A = @a and B = @b-2 and C=@c

WHERE A = @a and B = @b-2 and C between @c1 and @c2

WHERE A = @a and B = @b-2

WHERE A = @a and B between @b1 and @b2

WHERE A = @a

WHERE A between @a1 and @a2

It's less optimal for things like

WHERE A = @a and C=@c (because the where clause predicates are not a left-based subset of the index.)

It will require a seek on A=@a, followed by a filter to do C=@c

Queries of the following form require a scan

WHERE B = @b-2 and C=@c

Think of a phone book, where entries are ordered by surname, first name, phone number

Which is easier, to find all the people with a surname Brown (1st column) and a phone number of 555-6572 (3rd column)

or to find all the people with a surname Brown (1st column) and a first name of Matt (2nd column)?

As for books, have a look at the 4th book in the Inside SQL Server 2005 series. Query tuning and optimisation. There should be something in there.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

matt stockham SSCrazy Eights Points: 9942 More actions · Answer 12

GilaMonster (4/24/2008)
The advice about making an indexes first column the most selective is often mentioned. I tend to disagreee with it. The index columns should be arranged so that the queries can use it. The first column should be the one that can appear alone in queries, not the most selective column.
When it does a 2-column seek, the 2 columns are matched in a single operation. Again using the above example,
WHERE Colour = @x and Age = @y
does not mean that SQL will find all the colours = @x, then seek again to find the age @y. If does a single seek on both conditions.

I agree with the above, however the non-leaf pages still need to be read - if the less selective column is first it can potentially mean more reads (only one more if I'm thinking straight, but 1 is more than 0!).

Of course, if there is any chance of different queries being used in future then the index would have to be reevaluated.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 13

matt stockham (4/24/2008)
I agree with the above, however the non-leaf pages still need to be read - if the less selective column is first it can potentially mean more reads (only one more if I'm thinking straight, but 1 is more than 0!).

Could you please explain how that scenario works?

Bear in mind, that for a multi-column index, all the columns are in all the pages. Leaf and non-leaf (Ignoring the SQL 2005 include columns, that is)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

kp_kumar Mr or Mrs. 500 Points: 534 More actions · Answer 14

check this out

i remember the query optimization stuff gail mentioned about can be found in the book "SQL Server Query Performance Tuning Distilled by sajal dam"