order of criteria in where clause changes query palan

  • Adi Cohn-120898 (11/4/2011)


    I think that this is a bug, but I might be wrong. Can anyone have an explanation for that?

    Sure. I think most people will say that the explanation I'm about to give is a bug, but if you worked on the query optimization team I'm pretty sure you'd call it a limitation 🙂

    The difference between the two plans is in the Clustered Index Seek. In one case, it seeks on DataCLInd >= @LastRunTime and then applies a residual predicate of DataCLInd >= dateadd(month,(-1),getutcdate()). In the second case, the seek and residual predicates are reversed. This is the crux of the matter.

    The seek on >= @LastRunTime will match all but a few hundred of the rows in the table, meaning that the second predicate is evaluated on 4 million rows or so, which is obviously very expensive (I'll call this plan A). When we seek instead on the GETUTCDATE expression, no rows qualify so there's nothing to apply the residual predicate to (plan B).

    Of course, both plans show the same number of estimated rows finally emitted by the Clustered Index Seek, the key point is the order the seek and residual predicates are applied within that operator. The SQL Server costing component takes this order into consideration when estimating the cost of the operator, so you see the plan A seek has an estimated operator cost of 6.2 whereas the plan B operator has a cost of 0.009 or so.

    Plan A has a high enough estimated cost to complete phase I optimization and then re-run phase I looking for parallel plans. By the by, it terminates after the second run of phase I because it does not meet the criteria to enter phase II query optimization. Plan B on the other hand has a very small estimated cost and optimization terminates early during phase I because the cost of the whole plan is by now only 0.025, so it makes no sense to spend any more time to find an optimization that improves on this.

    So, the limitation (or bug) comes down to how the optimizer chooses which predicate becomes the seek and which becomes a residual predicate. In this specific case, both predicates result in guessed cardinality estimates on the same column, and since both estimates are the same, the optimizer has no way to know which would be better as seek or residual. So, it does nothing, and simply follows the written order of the query. This is reasonable: people tend to write queries with the most restrictive predicate first, so in the absence of information to the contrary, the optimizer goes with that.

    As is so often the case, we get a better plan if we give the optimizer more information. One way to do this is to avoid the cardinality guessing, and we can do that by giving the optimizer real data to compare against the statistics. Adding OPTION (RECOMPILE) to the query allows SQL Server to use the runtime values of @LastRunTime and GETUTCDATE to build a good plan. This plan will not be cached, so it doesn't have to worry about producing a plan that will perform reasonably for other values. You do need a relatively modern version of SQL Server for OPTION (RECOMPILE) to work - at least 2008 SP1 CU5 or 2008 R2 CU1. For earlier versions, you can either construct dynamic SQL with literal values, use sp_executesql, or to recognise that if column1 > @value1 and column1 > @value2 then column1 must contain a value that is greater than the highest value of (@value1, @value2). It's easy to rewrite the query so that @LastRunTime and the GETUTCDATE expression are evaluated outside the query, and only the highest used as a predicate for the column.

  • Hi Paul & co,

    This is one of the places where I'd consider that breaking sargability can help, as a non-sargable predicate isn't going to be used in a Seek Predicate (at least, not until the QO sorted out invertibility), and why I think FORCESCAN (can't believe I've mentioned FORCESCAN in a coversation with a New Zealander and South African - that's just asking for trouble) should be a Predicate Hint, not an Index Hint.

    And yes - residualiciousness seems to be coming into play here. I've written posts on this, like:

    http://sqlblog.com/blogs/rob_farley/archive/2011/05/25/covering-index-because-it-s-covering-up-the-truth.aspx

    http://sqlblog.com/blogs/rob_farley/archive/2011/05/25/turning-a-scan-into-a-seek-with-a-pointless-predicate.aspx

    http://sqlblog.com/blogs/rob_farley/archive/2011/05/19/covering-schmuvvering-when-a-covering-index-is-actually-rubbish.aspx

    Hope these help,

    Rob

    Rob Farley
    LobsterPot Solutions & Adelaide SQL Server User Group
    Company: http://www.lobsterpot.com.au
    Blog: http://blogs.lobsterpot.com.au

  • Thank you for everyone that participated in this thread. I’ve learned a lot from it, (and also have more reading to do about the subject).

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 3 posts - 16 through 17 (of 17 total)

You must be logged in to reply to this topic. Login to reply