June 25, 2009 at 2:05 am
Hi
Something interesting, to me anyway, that I've noticed about index usage that I would like to know more about. When I do an initial SELECT, it scans the clustered primary key. When I create both of the new indices and do either of the SELECTs, the db engine uses the index that I created last. Why?
Code to reproduce:
CREATE TABLE dbo.Tmp_bob
(
pkColA varchar(15) NOT NULL,
otherCol float(53) NULL,
pkColB varchar(3) NOT NULL,
otherCols datetime NULL
)
GO
ALTER TABLE dbo.Tmp_bob ADD CONSTRAINT
PK_Tmp_bob PRIMARY KEY CLUSTERED
(
pkColA,
pkColB
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
SELECT pkColA FROM Tmp_bob
-- It scans the clustered primay key.
CREATE NONCLUSTERED INDEX IX_Tmp_bob_pkColA ON Tmp_bob(pkColA) WITH PAD_INDEX, FILLFACTOR = 80
CREATE NONCLUSTERED INDEX IX_Tmp_bob_pkColB ON Tmp_bob(pkColB) WITH PAD_INDEX, FILLFACTOR = 80
SELECT pkColA FROM Tmp_bob
-- It scans IX_Tmp_bob_pkColB
DROP INDEX Tmp_bob.IX_Tmp_bob_pkColA
DROP INDEX Tmp_bob.IX_Tmp_bob_pkColB
CREATE NONCLUSTERED INDEX IX_Tmp_bob_pkColB ON Tmp_bob(pkColB) WITH PAD_INDEX, FILLFACTOR = 80
CREATE NONCLUSTERED INDEX IX_Tmp_bob_pkColA ON Tmp_bob(pkColA) WITH PAD_INDEX, FILLFACTOR = 80
SELECT pkColA FROM Tmp_bob
-- It scans IX_Tmp_bob_pkColA
Thanks.
June 25, 2009 at 2:25 am
Because the last nonclustered index that you added is covering (contains all the columns that the query needs) and is smaller than the cluster (because the cluster is the table). Hence scanning that NC index is faster and requires less IOs than scanning the cluster.
In fact, both of the nonclustered indexes are covering, since a nonclustered index includes the clustering key. The two indexes are, for that query, equivalent and hence SQL can use either.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 25, 2009 at 3:54 am
Yes you are right .A bit strange (to us that why the index that is created second is used)..but there might be some logic we dont know .
main thing is , will it perform the same way once some good amount of data is in inside the table .
I have not done any stuff to populate this table .
Might try , but very occupied as of now .
Regards
Abhay Chaudhary
Sr.DBA (MCITP/MCTS :SQL Server 2005/2008 ,OCP 9i)
June 25, 2009 at 5:09 am
It's also worth noting that if there is a choice of covering indexes that can be used to return the results, the one occupying the smallest number of pages is chosen.
You can test this on any table with a clustered primary key and a few other indexes of varying sizes thus:
SELECT
ColumnAtTopOfPrimaryKey
FROM
TheTable
I wouldn't recommend testing behaviour on empty tables either, because I think it can produce misleading results.
Tim
.
June 25, 2009 at 5:36 am
Tim Walker (6/25/2009)
I wouldn't recommend testing behaviour on empty tables either, because I think it can produce misleading results.
Indeed. Reading 0 pages is pretty quick no matter what index.
Load the table up with a million or so rows and, provided that the length of the strings in those varchars are close to the declared value, I suspect that the index with B as the key col will most likely be favoured. It's close and the difference is unlikely to appear on small-mid sized tables
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 25, 2009 at 8:50 am
I just tested with both ten thousand and a million rows. The results are different now - only IX_Tmp_bob_pkColB is used, independent of index creation order. If I DROP IX_Tmp_bob_pkColB then IX_Tmp_bob_pkColA is used.
Why would it use IX_Tmp_bob_pkColB if I haven't included any reference/data/information about pkColA in it? ("RTFM on covering indices!"?)
June 25, 2009 at 9:03 am
You have included it - you just didn't realise!
Where you have a unique clustered index or primary key an index uses this to point to the data. Your example makes it a bit tricky to see this because you are using your columns for both the PK and the example indexes.
A clearer example:
Table contains ColA, ColB, ColC, ColD, ColE
Clustered PK (or unique clustered index) on ColA, ColB
Secondary Index on ColC
What the secondary index actually holds is the relationship between ColC and ColA, ColB.
So it is a covering index!
Tim
.
June 25, 2009 at 9:05 am
Additionally I am guessing it uses the index based on the varchar(3) column because it occupies less pages than the varchar(15) one.
🙂
Tim
.
June 25, 2009 at 9:29 am
Thanks for the index relationship info Tim. I also thought that it would prefer to lookup on a smaller index.
June 25, 2009 at 9:38 am
Tim Walker (6/25/2009)
Additionally I am guessing it uses the index based on the varchar(3) column because it occupies less pages than the varchar(15) one.
Depending, of course, on the size of the string stored. If all the strings in the varchar(3) are exactly 3 chars long and all the strings in the varchar(15) are either 2 or 3 characters long, then the index on the varchar(15) will be smaller.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 25, 2009 at 9:42 am
GDI Lord (6/25/2009)
IWhy would it use IX_Tmp_bob_pkColB if I haven't included any reference/data/information about pkColA in it? ("RTFM on covering indices!"?)
As I mentioned earlier, the clustering key is always part of all nonclustered indexes, whether you specify them of not. It has to be, it's the 'address' of the row. Hence both pk_ColA and pk_ColB will be part of every single nonclustered index.
That, incidentally, is why one of the recommendations for a clustered index key is 'narrow'.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 25, 2009 at 9:48 am
GilaMonster (6/25/2009)
Tim Walker (6/25/2009)
Additionally I am guessing it uses the index based on the varchar(3) column because it occupies less pages than the varchar(15) one.Depending, of course, on the size of the string stored. If all the strings in the varchar(3) are exactly 3 chars long and all the strings in the varchar(15) are either 2 or 3 characters long, then the index on the varchar(15) will be smaller.
Absolutely right!
Hey Gail, do you know what would happen if one index was narrower but fragmented so actually occupied more pages than a wider one? Which would be chosen then?
Tim
.
June 25, 2009 at 10:21 am
Tim Walker (6/25/2009)
Hey Gail, do you know what would happen if one index was narrower but fragmented so actually occupied more pages than a wider one? Which would be chosen then?
Well, fragmented doesn't necessarily mean more pages, I can have an index that's 99% fragmented but all the pages are close to 100% full.
Assuming you mean one of the indexes has pages only half full. Haven't tested, don't know for sure, but I suspect it wouldn't factor into the decision. I don't believe that the optimiser knows how fragmented an index is at the time that it compiles the query. I suspect it will base that solely on the statistics of the index which includes the average length of the columns.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 25, 2009 at 10:49 am
GilaMonster (6/25/2009)
Tim Walker (6/25/2009)
Hey Gail, do you know what would happen if one index was narrower but fragmented so actually occupied more pages than a wider one? Which would be chosen then?Well, fragmented doesn't necessarily mean more pages, I can have an index that's 99% fragmented but all the pages are close to 100% full.
Assuming you mean one of the indexes has pages only half full. Haven't tested, don't know for sure, but I suspect it wouldn't factor into the decision. I don't believe that the optimizer knows how fragmented an index is at the time that it compiles the query. I suspect it will base that solely on the statistics of the index which includes the average length of the columns.
Sorry, very sloppy terminology from me. Yes I did mean one that was sparse, e.g. though the original fill factor, page splits, deletions, or whatever!
I thought about it a bit after I asked the question and I think you must be right. I might test it sometime to find out. Specifying the fill factor to make an index bigger should work I think. I'm sure the optimizer won't consider the original fill factor as it isn't maintained over time.
Thanks for the reply.
Tim
.
June 26, 2009 at 8:11 am
Thanks Gila!
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply