Why does SQL insist on a clustered index scan?

  • Hi,

    The following query is driving me nuts, it insists on doing a scan when it makes no sense.

    SELECTl.ID

    ,m.ID

    FROMTBL_LIST l -- 200,337,067 rows

    LEFT OUTER JOINTBL_MASTER m -- 4,200,073 rows

    ON l.MasterID = m.ID

    WHEREl.DocID IN(SELECTID

    FROM TBL_DOC -- 206,461,257 rows

    WHERE FK IN (@P0, @P1, @P2, @P3, @P4))

    Thing is, on the particular set of parameters I am using, there are no rows returned from TBL_MASTER, yet it insists on a scan.

    SQL Server Execution Times:

    CPU time = 797 ms, elapsed time = 794 ms.

    Table 'TBL_MASTER'. Scan count 1, logical reads 23374

    If, however, I use FORCESEEK it works as expected.

    SQL Server Execution Times

    CPU time = 16 ms, elapsed time = 5 ms.

    Table 'TBL_MASTER'. Scan count 0, logical reads 103

    I cannot change the SQL, it is embedded in a 3rd party app. There is a clustered index on TBL_MASTER (ID), TBL_LIST(ID) and on TBL_DOC (ID). Statistics are up-to-date. Why the insistence of a scan?

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • A scan + hash could look reasonable based on the row counts. Probably a multicolumn statistic could solve the issue.

    Can you post the actual execution plan for the scan version?

    -- Gianluca Sartori

  • Agree with posting the actual execution plan, but I post both (with and without the forceseek).

    My guess is that the query optimizer believes a table scan to be the best solution for this query. Remember clustered index scan == table scan.

  • The cluster is on ID but you're searching on DOCID. Unless that's a typo, that would explain why you're getting the scan.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Lynn Pettis (6/21/2011)


    Remember clustered index scan == table scan.

    I'm actutely aware of that, hence the concern over this behaviour.

    Grant Fritchey (6/21/2011)


    The cluster is on ID but you're searching on DOCID. Unless that's a typo, that would explain why you're getting the scan.

    Not a typo, but I ommitted the fact that a nonclustered index on DOCID is performing a seek. This is not the table that is performing a table scan.

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • OK, so I found the problem last night. When I said the stats were up-to-date, I had only checked TBL_MASTER, the table performing the scan. However it turns out the stats were wrong on TBL_LIST and as such SQL determined a table scan had a lower cost than a seek. What I don't completely understand is that the stats were not old, just completely wrong. The STATS_DATE on all the stats for that table was Thursday evening, exactly the same time our problems started. Anyway, thanks for the replies, I'm going to start monitoring the scan counts on a daily basis.

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • Just guessing: maybe the stats got updated automatically with a lower sample? I ran into this problem at times and decided to turn off stats recompute. You will need an additional maintenance task if you decide to go down this road.

    -- Gianluca Sartori

  • Gianluca Sartori (6/22/2011)


    Just guessing: maybe the stats got updated automatically with a lower sample? I ran into this problem at times and decided to turn off stats recompute. You will need an additional maintenance task if you decide to go down this road.

    Yeah, quite possible I suppose. The DBA is currently working on a new maintenance job to do stats updates, I will recommend turning off auto update.

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • Sean Pearce (6/22/2011)


    Gianluca Sartori (6/22/2011)


    Just guessing: maybe the stats got updated automatically with a lower sample? I ran into this problem at times and decided to turn off stats recompute. You will need an additional maintenance task if you decide to go down this road.

    Yeah, quite possible I suppose. The DBA is currently working on a new maintenance job to do stats updates, I will recommend turning off auto update.

    I would be extremely careful on this. Generally speaking, statistics recompute is a good thing and it has to be disabled only when found to cause issues. I would recommend a thorough analysis before disabling it.

    -- Gianluca Sartori

  • With the IN clause, the optimizer will use distribution type stats to determine estimated rows and if you have some values that are a large fraction of the total rows in the table you can wind up with an Estimated Rowcount that will make the mathematics work out such that a table scan is most efficient.

    Note: you can force a plan for this particular query with a plan guide. However, if you DO have skew you will get screwed by horrible performance if it really does need to access a large number of rows. I also note that the IN clause probably varies in the number of parameters for the clause, and a plan guide will not help on each of them separately.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply