June 21, 2011 at 7:02 am
Hi,
The following query is driving me nuts, it insists on doing a scan when it makes no sense.
SELECTl.ID
,m.ID
FROMTBL_LIST l -- 200,337,067 rows
LEFT OUTER JOINTBL_MASTER m -- 4,200,073 rows
ON l.MasterID = m.ID
WHEREl.DocID IN(SELECTID
FROM TBL_DOC -- 206,461,257 rows
WHERE FK IN (@P0, @P1, @P2, @P3, @P4))
Thing is, on the particular set of parameters I am using, there are no rows returned from TBL_MASTER, yet it insists on a scan.
SQL Server Execution Times:
CPU time = 797 ms, elapsed time = 794 ms.
Table 'TBL_MASTER'. Scan count 1, logical reads 23374
If, however, I use FORCESEEK it works as expected.
SQL Server Execution Times
CPU time = 16 ms, elapsed time = 5 ms.
Table 'TBL_MASTER'. Scan count 0, logical reads 103
I cannot change the SQL, it is embedded in a 3rd party app. There is a clustered index on TBL_MASTER (ID), TBL_LIST(ID) and on TBL_DOC (ID). Statistics are up-to-date. Why the insistence of a scan?
June 21, 2011 at 7:07 am
A scan + hash could look reasonable based on the row counts. Probably a multicolumn statistic could solve the issue.
Can you post the actual execution plan for the scan version?
-- Gianluca Sartori
June 21, 2011 at 7:29 am
Agree with posting the actual execution plan, but I post both (with and without the forceseek).
My guess is that the query optimizer believes a table scan to be the best solution for this query. Remember clustered index scan == table scan.
June 21, 2011 at 8:47 am
The cluster is on ID but you're searching on DOCID. Unless that's a typo, that would explain why you're getting the scan.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 22, 2011 at 1:16 am
Lynn Pettis (6/21/2011)
Remember clustered index scan == table scan.
I'm actutely aware of that, hence the concern over this behaviour.
Grant Fritchey (6/21/2011)
The cluster is on ID but you're searching on DOCID. Unless that's a typo, that would explain why you're getting the scan.
Not a typo, but I ommitted the fact that a nonclustered index on DOCID is performing a seek. This is not the table that is performing a table scan.
June 22, 2011 at 1:20 am
OK, so I found the problem last night. When I said the stats were up-to-date, I had only checked TBL_MASTER, the table performing the scan. However it turns out the stats were wrong on TBL_LIST and as such SQL determined a table scan had a lower cost than a seek. What I don't completely understand is that the stats were not old, just completely wrong. The STATS_DATE on all the stats for that table was Thursday evening, exactly the same time our problems started. Anyway, thanks for the replies, I'm going to start monitoring the scan counts on a daily basis.
June 22, 2011 at 1:27 am
Just guessing: maybe the stats got updated automatically with a lower sample? I ran into this problem at times and decided to turn off stats recompute. You will need an additional maintenance task if you decide to go down this road.
-- Gianluca Sartori
June 22, 2011 at 1:48 am
Gianluca Sartori (6/22/2011)
Just guessing: maybe the stats got updated automatically with a lower sample? I ran into this problem at times and decided to turn off stats recompute. You will need an additional maintenance task if you decide to go down this road.
Yeah, quite possible I suppose. The DBA is currently working on a new maintenance job to do stats updates, I will recommend turning off auto update.
June 22, 2011 at 2:21 am
Sean Pearce (6/22/2011)
Gianluca Sartori (6/22/2011)
Just guessing: maybe the stats got updated automatically with a lower sample? I ran into this problem at times and decided to turn off stats recompute. You will need an additional maintenance task if you decide to go down this road.Yeah, quite possible I suppose. The DBA is currently working on a new maintenance job to do stats updates, I will recommend turning off auto update.
I would be extremely careful on this. Generally speaking, statistics recompute is a good thing and it has to be disabled only when found to cause issues. I would recommend a thorough analysis before disabling it.
-- Gianluca Sartori
June 23, 2011 at 9:55 am
With the IN clause, the optimizer will use distribution type stats to determine estimated rows and if you have some values that are a large fraction of the total rows in the table you can wind up with an Estimated Rowcount that will make the mathematics work out such that a table scan is most efficient.
Note: you can force a plan for this particular query with a plan guide. However, if you DO have skew you will get screwed by horrible performance if it really does need to access a large number of rows. I also note that the IN clause probably varies in the number of parameters for the clause, and a plan guide will not help on each of them separately.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply