August 28, 2005 at 7:50 am
I have the following in my MS-SQL 2000 DB:
Table Name: PCAPostWord
Columns: ID, WordID, ThreadID, PostWordSortOrder (all of type "int")
ID/WordID is a clustered index; ThreadID is indexed, PostWordSortOrder is indexed.
The table has about 9,000,000 rows.
From SQL Query Analyzer, if I do:
SELECT PostWordSortOrder FROM PCAPostWord WHERE PostWordSortOrder = 2
it takes under a second to return 25,000 rows.
If I do:
SELECT WordID FROM PCAPostWord WHERE PostWordSortOrder = 2
it takes under a second to return 25,000 rows.
But if I do:
SELECT ThreadID FROM PCAPostWord WHERE PostWordSortOrder = 2
it takes about 30 seconds to return 25,000 rows
Question: Why would it take so much longer to SELECT the ThreadID than the other columns?
Thanks.
Robert
August 28, 2005 at 8:46 am
Because for each of the 25000 rows that have PostWordSortOrder = 2 a bookmark lookup is needed. The results for the first 2 queries can be created and returned by doing an index seek in the PostWordSortOrder index to find the matching rows. The data to output is then found right there, in the first case it is the same column as is indexed, in the second it is part of the clustered index key which always exists in the nonclustered index rows.
But for the third query you are referencing a column that is not found in the nonclustered index PostWordSortOrder. So for each of the 25000 hits a bookmakr lookup (in this case a clustered index seek) needs to be done, making the query much slower.
August 28, 2005 at 11:35 am
Chris,
Thanks very much for your quick response. After I saw your response, I looked at the execution plan, and indeed for the slow search a clustered index scan is being done, which accounts for the slowness.
So the next question is: What are the various alternatives for making the slow SELECT faster?
Thanks.
Robert
August 28, 2005 at 2:16 pm
Ok, then I was slightly incorrect in stating that 25000 bookmark lookups where done. Since so many lookups would be necessary SQL Server instead chose to do a clustered index scan directly. With the existing indexes that is probably the fastest plan for the existing data. If you need to make that query execute fast you could add a covering index on (PostWordSortOrder, ThreadID). But remember that too many indexes might have an overall negative impact on your system. It all depends on where the priorities are and how everything is used.
August 29, 2005 at 12:40 pm
So, If your query is run once a month, at 3 in the morning, maybe you can afford to have it take 30 seconds and avoid another index. But if it's critical to users, and they run it all day long, then maybe you need the index.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply