Why does SQL insist on a clustered index scan?

Question

Post reply

Why does SQL insist on a clustered index scan?

Sean Pearce

SSCoach

Points: 15999
More actions
June 21, 2011 at 7:02 am

#239635

Hi,
The following query is driving me nuts, it insists on doing a scan when it makes no sense.
SELECTl.ID
,m.ID
FROMTBL_LIST l -- 200,337,067 rows
LEFT OUTER JOINTBL_MASTER m -- 4,200,073 rows
ON l.MasterID = m.ID
WHEREl.DocID IN(SELECTID
FROM TBL_DOC -- 206,461,257 rows
WHERE FK IN (@P0, @P1, @P2, @P3, @P4))
Thing is, on the particular set of parameters I am using, there are no rows returned from TBL_MASTER, yet it insists on a scan.
SQL Server Execution Times:
CPU time = 797 ms, elapsed time = 794 ms.
Table 'TBL_MASTER'. Scan count 1, logical reads 23374
If, however, I use FORCESEEK it works as expected.
SQL Server Execution Times
CPU time = 16 ms, elapsed time = 5 ms.
Table 'TBL_MASTER'. Scan count 0, logical reads 103
I cannot change the SQL, it is embedded in a 3rd party app. There is a clustered index on TBL_MASTER (ID), TBL_LIST(ID) and on TBL_DOC (ID). Statistics are up-to-date. Why the insistence of a scan?
The SQL Guy @ blogspot[/url]
@SeanPearceSQL
About Me[/url]

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

spaghettidba SSC Guru Points: 105732 More actions · Answer 1

A scan + hash could look reasonable based on the row counts. Probably a multicolumn statistic could solve the issue.

Can you post the actual execution plan for the scan version?

-- Gianluca Sartori

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 2

Agree with posting the actual execution plan, but I post both (with and without the forceseek).

My guess is that the query optimizer believes a table scan to be the best solution for this query. Remember clustered index scan == table scan.

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 3

The cluster is on ID but you're searching on DOCID. Unless that's a typo, that would explain why you're getting the scan.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Sean Pearce SSCoach Points: 15999 More actions · Answer 4

Lynn Pettis (6/21/2011)
Remember clustered index scan == table scan.

I'm actutely aware of that, hence the concern over this behaviour.

Grant Fritchey (6/21/2011)
The cluster is on ID but you're searching on DOCID. Unless that's a typo, that would explain why you're getting the scan.

Not a typo, but I ommitted the fact that a nonclustered index on DOCID is performing a seek. This is not the table that is performing a table scan.

The SQL Guy @ blogspot[/url]

@SeanPearceSQL

About Me[/url]

Sean Pearce SSCoach Points: 15999 More actions · Answer 5

OK, so I found the problem last night. When I said the stats were up-to-date, I had only checked TBL_MASTER, the table performing the scan. However it turns out the stats were wrong on TBL_LIST and as such SQL determined a table scan had a lower cost than a seek. What I don't completely understand is that the stats were not old, just completely wrong. The STATS_DATE on all the stats for that table was Thursday evening, exactly the same time our problems started. Anyway, thanks for the replies, I'm going to start monitoring the scan counts on a daily basis.

The SQL Guy @ blogspot[/url]

@SeanPearceSQL

About Me[/url]

spaghettidba SSC Guru Points: 105732 More actions · Answer 6

Just guessing: maybe the stats got updated automatically with a lower sample? I ran into this problem at times and decided to turn off stats recompute. You will need an additional maintenance task if you decide to go down this road.

-- Gianluca Sartori

Sean Pearce SSCoach Points: 15999 More actions · Answer 7

Gianluca Sartori (6/22/2011)
Just guessing: maybe the stats got updated automatically with a lower sample? I ran into this problem at times and decided to turn off stats recompute. You will need an additional maintenance task if you decide to go down this road.

Yeah, quite possible I suppose. The DBA is currently working on a new maintenance job to do stats updates, I will recommend turning off auto update.

The SQL Guy @ blogspot[/url]

@SeanPearceSQL

About Me[/url]

spaghettidba SSC Guru Points: 105732 More actions · Answer 8

Sean Pearce (6/22/2011)
Gianluca Sartori (6/22/2011)
Just guessing: maybe the stats got updated automatically with a lower sample? I ran into this problem at times and decided to turn off stats recompute. You will need an additional maintenance task if you decide to go down this road.
Yeah, quite possible I suppose. The DBA is currently working on a new maintenance job to do stats updates, I will recommend turning off auto update.

I would be extremely careful on this. Generally speaking, statistics recompute is a good thing and it has to be disabled only when found to cause issues. I would recommend a thorough analysis before disabling it.

-- Gianluca Sartori

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 9

With the IN clause, the optimizer will use distribution type stats to determine estimated rows and if you have some values that are a large fraction of the total rows in the table you can wind up with an Estimated Rowcount that will make the mathematics work out such that a table scan is most efficient.

Note: you can force a plan for this particular query with a plan guide. However, if you DO have skew you will get screwed by horrible performance if it really does need to access a large number of rows. I also note that the IN clause probably varies in the number of parameters for the clause, and a plan guide will not help on each of them separately.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service