November 3, 2011 at 5:24 am
I have an execution plan that insists on a Clustered Index Scan, and I would like to understand why. This is possibly a question for Gail.
I have two tables taking part in an inner join.
CREATE TABLE ObjectTypes (ID int, TypeID int); -- 228,913,078 rows
CREATE TABLE Products (ID int, Description varchar(255)); -- 4,367,682 rows
Both of these tables have a clustered index on ID. ObjectTypes has a non-clustered index on TypeID.
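For reference, the supporting indexes look along these lines (index names here are placeholders, not the real ones):
CREATE CLUSTERED INDEX CIX_ObjectTypes_ID ON ObjectTypes (ID);
CREATE NONCLUSTERED INDEX IX_ObjectTypes_TypeID ON ObjectTypes (TypeID);
CREATE CLUSTERED INDEX CIX_Products_ID ON Products (ID);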
The following query causes a Clustered Index Scan on Products:
SELECT p.ID
,p.Description
,ot.TypeID
FROM ObjectTypes ot
INNER JOIN Products p
ON ot.ID = p.ID
AND ot.TypeID IN (1023, 1025, 1034, 1033, 1527)
This query returns 40,371 rows.
The execution plan performs a Clustered Index Scan on Products and passes 4,367,682 rows to a Merge Join operator, which outputs 40,371 rows. The cost of the scan is 22.87 and the cost of the join is 9.2.
If I run the query using WITH (FORCESEEK) I get a Clustered Index Seek on Products passing 40,371 rows to a Nested Loop operator. The problem is that the cost of the seek is 57.89 and the cost of the join is 0.1.
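That is, the hinted version is just the same query with the hint applied to Products:
SELECT p.ID
,p.Description
,ot.TypeID
FROM ObjectTypes ot
INNER JOIN Products p WITH (FORCESEEK)
ON ot.ID = p.ID
AND ot.TypeID IN (1023, 1025, 1034, 1033, 1527)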
I have updated all statistics using FULLSCAN.
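i.e. along the lines of:
UPDATE STATISTICS ObjectTypes WITH FULLSCAN;
UPDATE STATISTICS Products WITH FULLSCAN;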
I have read Gail's blog post entitled "Is a scan a bad thing?". Gail states:
The scan is there because of the number of seeks that would otherwise be required to execute this. A seek can only return a single value or a range of values. If a seek is required to evaluate a join, that seek has to run once for each row in the other table. This is the classic nested loop join.
Is it really more efficient to scan 4,367,682 rows and perform a merge join than it is to perform 40,371 seeks?
Is this as good as it gets, i.e. is there another way to write this?
November 3, 2011 at 6:33 am
If the TypeIDs in the first table were contiguous in the ID index on the second table, the scan would return just the range needed; but most likely it's not contiguous, so it has to scan the full clustered index.
Instead of the costs (which are often misleading), try turning on time and I/O stats and running both queries. Post the results here.
In case you don't know it off the top of your head, the command for that is "SET STATISTICS TIME, IO ON". Put that at the top of the script, right after "SET NOCOUNT ON", then both versions of the query (the standard one and the one with the seek being forced), then turn the stats off (same command, but "off" instead of "on").
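So the test script would look something like this (using the query from the first post; the second copy assumes the FORCESEEK hint goes on Products):
SET NOCOUNT ON;
SET STATISTICS TIME, IO ON;
-- standard version
SELECT p.ID, p.Description, ot.TypeID
FROM ObjectTypes ot
INNER JOIN Products p
ON ot.ID = p.ID
AND ot.TypeID IN (1023, 1025, 1034, 1033, 1527);
-- forced-seek version
SELECT p.ID, p.Description, ot.TypeID
FROM ObjectTypes ot
INNER JOIN Products p WITH (FORCESEEK)
ON ot.ID = p.ID
AND ot.TypeID IN (1023, 1025, 1034, 1033, 1527);
SET STATISTICS TIME, IO OFF;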
See what that gives you. The data from that is much more useful than execution plan costs.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
November 3, 2011 at 6:48 am
Artoo22 (11/3/2011)
Is it really more efficient to scan 4,367,682 rows and perform a merge join than it is to perform 40,371 seeks?
Probably yes.
40,371 seeks require at least 80,742 page reads (and that's assuming the index being seeked is only 2 levels deep). Reading 4,367,682 rows does not require 4 million-odd page reads; it's as many pages as are in the leaf level of the index, plus they will be sequential reads, whereas the seeks are random reads.
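If you want to check the actual depth and leaf page count, something along these lines (DETAILED mode, so don't run it casually against a huge production table) will show one row per level of the clustered index:
-- sketch: index depth and page count per level for the Products clustered index (index_id 1)
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.index_level,
       ips.index_depth,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Products'), 1, NULL, 'DETAILED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id = ips.index_id;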
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 3, 2011 at 6:53 am
This is the "normal" one using the CIX Scan
Table 'Products'. Scan count 1, logical reads 24452, physical reads 0, read-ahead reads 0
Table 'ObjectTypes'. Scan count 5, logical reads 134, physical reads 0, read-ahead reads 0
SQL Server Execution Times:
CPU time = 797 ms, elapsed time = 793 ms.
And this is using the FORCESEEK
Table 'Products'. Scan count 0, logical reads 123645, physical reads 0, read-ahead reads 0
Table 'ObjectTypes'. Scan count 5, logical reads 134, physical reads 0, read-ahead reads 0
SQL Server Execution Times:
CPU time = 188 ms, elapsed time = 181 ms.
November 3, 2011 at 6:55 am
120 000 reads for the seek vs 20 000 for the scan. Pretty clear why SQL chooses a scan here.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 3, 2011 at 6:58 am
November 3, 2011 at 7:02 am
Not offhand.
Second execution (so not incurring data caching), execution plan off when you pulled those IO and time stats?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 3, 2011 at 7:17 am
This sentence clarified a lot of things for me:
40,371 seeks require at least 80,742 page reads (and that's assuming the index being seeked is only 2 levels deep). Reading 4,367,682 rows does not require 4 million-odd page reads; it's as many pages as are in the leaf level of the index, plus they will be sequential reads, whereas the seeks are random reads.
I always thought that if it is a big table then an index seek would be faster. It is clear that for a smaller table it is possible that even a table scan is faster than an index seek, but in this case it is different. Probably this is the reason most of the answers relevant to performance tuning and indexing are "it depends" (i.e. table size, execution plan, existing indexes, etc.). Really interesting.
November 3, 2011 at 7:20 am
dva2007 (11/3/2011)
I always thought that if it is a big table then an index seek would be faster. It is clear that for a smaller table it is possible that even a table scan is faster than an index seek.
A table scan won't be faster than an index seek (except in a couple of odd edge cases); however, if SQL has a choice between a table scan and several thousand index seeks, then the scan can be faster.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 3, 2011 at 7:39 am
Point taken. Thanks, Gail.
November 3, 2011 at 8:34 am
GilaMonster (11/3/2011)
Not offhand. Second execution (so not incurring data caching), execution plan off when you pulled those IO and time stats?
Before running each one I issued DBCC FREEPROCCACHE & DBCC DROPCLEANBUFFERS.
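That is:
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;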
I ran each one twice in succession.
I was generating the execution plan.
I have run the same test with the execution plan off and I am getting the same results: WITH (FORCESEEK) is 4 times faster.
November 3, 2011 at 9:20 am
A scan will retrieve and touch all rows of the table, whether or not the retrieved data qualifies for the request. It will be very costly for SQL Server, since the cost is proportional to the total number of rows in the table. And remember, the SQL Server query processor is a cost-based optimizer.
This will only be OK if the table and the requested data rows are VERY small.
A seek, on the other hand, will retrieve only the qualifying data, and since the cost is proportional to the number of qualifying rows, it will be very cheap.
Therefore, my answer to you is NEVER, especially in today’s production environment, where SMALL is not in the dictionary.
November 3, 2011 at 9:23 am
Omar Ismail (11/3/2011)
Therefore, my answer to you is NEVER, especially in today’s production environment, where SMALL is not in the dictionary.
Unfortunately this is wrong. I can easily conjure an example on a really large set of tables where a scan is faster than multiple index seeks. Scans are not necessarily bad things to have.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 3, 2011 at 9:26 am
Artoo22 (11/3/2011)
Before running each one I issued DBCC FREEPROCCACHE & DBCC DROPCLEANBUFFERS. I ran each one twice in succession.
I was generating the execution plan.
I have run the same test with execution plan off and I am getting the same results. WITH FORCESEEK is 4 times faster.
In that case you may want to use the hint, but if you do, keep testing every month or so. As the data volume in the tables grows, there will come a point where the scan is faster as well as doing fewer reads.
The optimiser is conservative; it will often tip over to a scan before it's time-optimal to do so, so you're likely sitting in the space where the scan is slower while being estimated as cheaper.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 3, 2011 at 9:28 am
Might be useful to see the actual execution plans.