SQL Execution Plan Bug!

  • Dears,

    I noticed the following SQL behavior and want to share my concerns with you.

    I believe that this should be considered as a bug in setting the execution plan in SQL Server!

    CREATE TABLE [Table1] (

    [C1] [int] IDENTITY (1, 1) NOT NULL ,

    [C2] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [C3] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [C4] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED

    ( [C1] ) ON [PRIMARY]

    ) ON [PRIMARY]

    --GO

    INSERT INTO [Table1] ([C2],[C3],[C4])VALUES('a','b','c')

    INSERT INTO [Table1] ([C2],[C3],[C4])VALUES('d','e','f')

    INSERT INTO [Table1] ([C2],[C3],[C4])VALUES('g','h','i')

    -- APPLY THE BELOW WITH SHOW EXECUTION PLAN ON

    DECLARE @X INT

    SET @X = 2

    SELECT C1, C2, C3, C4

    FROM Table1

    WHERE C1 = @X OR @X = -1

    -- CLUSTERED INDEX SCAN!!!! 85.46%

    SELECT C1, C2, C3, C4

    FROM Table1

    WHERE C1 = @X --OR @X = -1

    -- CLUSTERED INDEX SEEK 14.54%

    What do you think?

    I know that I can solve this issue by implementing dynamic queries, but all my stored procedures are built on this principle which is making them generic.

    Most of my WHERE conditions look like:

    WHERE (Table.Col1 = @Col1 OR @Col1 = -1)

    AND (Table.Col2 = @Col2 OR @Col2 = -1)

    AND (Table.Col3 = @Col3 OR @Col3 = -1)

    AND (Table.Col4 = @Col4 OR @Col4 = -1)

    AND (Table.Col5 = @Col5 OR @Col5 = -1)

    And so when I call this SP I can do the filter in the way I like, I can send -1 for the column I don’t like to apply the filter on.

    Unfortunately, I noticed that the performance is so bad! And discovered that Indexes Scan is being done always!

  • I wouldn't call this a bug, SQL is evaluating @X = -1 for each row so it is using a scan. It can't use the primary key to find the rows since "or @X = -1" doesn't have anything to do with the primary key. Generally, using dynamic SQL is frowned upon if you can avoid it. But it is a business decision you have to make. If the proc is used a lot and performance is a big issue, you might have to go that route.

    The only other solutions I can think of would be so extremely tedious that they wouldn't be worth it. For instance, creating a separate stored proc for each possible combination of parameters. For 5 columns that would be 120 combinations if I did the math right.

    Aunt Kathi Data Platform MVP
    Author of Expert T-SQL Window Functions
    Simple-Talk Editor

  • In effect your query is one of the type "give me every record or give me some records".  However the Plan must deal with both situations....and a "give me every record" query invokes a SCAN....be that INDEX scan or TABLE SCAN.

    Your 2 examples are not equivalent....and thus the results are not equivalent either. 

    Reexamining the need to have such flexable code may be the route to go.

  • Break the SQL statement into two statements with a UNION operator.  This should allow for an index seek on ends of the sql statement.

  • Unless I misunderstand your code, @x is a value determined *before* the select and presumably does not change during the process!!!!, so it would make much more sense to evaluate @x in logic, and conditionally call the select in an 'if' statement.

    ...

    -- FORTRAN manual for Xerox Computers --

  • its a bad behaviour! scanning the table searching nothing! do you agree?

  • Two examples are 100% equivilant in case you send value other than -1. see in my example i am sending 2 as a values and the returned dataset is only 1 row! scan should not occure.

  • i agree with you. but what if you have a query where you have 15 filtering condtions! you will have too many if conds in your code!

    if you have condtions: c1, c2, c3 then the if will look like:

    if c1 is true and c2 is false and c3 is false

    then ...

    if c1 is true and c2 is true and c3 is false

    then ...

    if c1 is true and c2 is true and c3 is true

    then ...

    if c1 is false and c2 is true and c3 is false

    then ...

     

    etc ...

     

  • SQL Server could perform the clustered index scan for the @x = -1 and then perform an index seek for C1 = @X, and then perform a union on the result sets for both queries.  But it does not.  It takes the most efficient route, of only performing a single clustered index scan, which can evaluate and satisfy both conditions.

    You have to understand that SQL Server cannot ignore a clause just because you know it means nothing.  Every clause is a filtering condition (good or bad) so it has to be evaluated.

    If you want to eliminate the scans, remove the need to evaluate open conditions.

    And to reiterate what the others have said, your queries are not equivalent.  If you reread your queries,  1 has two filter conditions.  The other has one filter condition.  just because the result set is the same it doesn't mean the queries generating them are.

  • People!

     

    SELECT     SID

    FROM         Sales

    WHERE     (SID = 15) or (1=2)

    Clustered Index Seek!

     

    declare @x int

    set @x = 1

    SELECT     SID

    FROM         Sales

    WHERE     (SID = 15) or (@x=2)

    Clustered Index Scan!

     

    Is that fear???

  • rata,

    no it is not fear, fear is a feeling of anxiety, apprehenion or agitation, usually due to the presence or imminence of danger or the imagined thought of.

    There are several things that need to be understood to resolve your problem.

    1st, the Query Optimizer produces the Execution Plans

    2nd, the Query Optimizer does not resolve @variables, the Relational Engine does, but not until after the Query Optimizer has been run (the Optimizer is but one step inside the Relational Engine).

    3rd, the Query Optimizer determines the BEST execution plan for ALL possible combinations of execution

    4th, the Query Optimizer runs PRIOR to the execution of the statements.

    EDIT: Changed 4th item reference from the "Relational Engine" to "execution of the statements".  Added bracketed stuff to 2nd item.

    With your query examples, this means that when the Query Optimizer is determining the best plan, with the C1 = @X, C1 is a known value, @X is an unknown value (remembering that the Optimizer doesn't know what the value of @X will be but it does know the possible range of values due to the type), and can only be one instance of C1 (due to the =), thus an Index Seek is the best plan.  With C1 = @X OR @X = -1 there are several possible plans, but the most efficient is a Clustered Index Scan, why?, because @X is an unknown value.  @X is an integer so falls somewhere in the range of -2,147,483,648 through 2,147,483,647, the Optimizer knows this, and can also see that the clause COULD return all rows from the table IF (@X = -1) = True.  If you write the query

    SELECT C1, C2, C3, C4

    FROM Table1

    WHERE @X = -1

    you will get an execution plan of a Clustered Index Scan, due to the fact that the query will either return all rows, or no rows.  Combined this with C1 = @X by OR and the best plan to support both filter requirements is a Clustered Index Scan.  Simply put, the Optimizer will return a Clustered Index Scan as the best choice, because the Optimizer does not know the value of @X

    With your second set of examples

    SID = 15 OR 1 = 2

    If you read the execution plan that is used for this query, you would see that the Optimizer has evaluated the literal 1=2 and determined that is has no impact (result is false and OR false means the other side of the OR determines result), and as such is DROPPED from the Query and Filter, thus you get a Clustered Index Seek.  If you try running this with SID = 2 OR 2 = 2, then you will get a Clustered Index Scan, because 2=2 is true and anything OR true, will return all possible results.

    The second item is a Clustered Index Scan purely for the reasons outlined above.

  • Where the H3ll did you learn that?

  • fear? looked it up in the dictionary.

    seriously, its how a static compiler works.

  • Any book I can get to learn more about that?

  • Dear Grasshopper,

    Appreciating all what you mentioned, it’s all correct, BUT...

    My friend: The part “OR @x = -1” is being evaluated for each row in the table! The optimizer should be smart enough to evaluate this part one time only as it will never change between a row and another... this is simply my point.

    if you take the part “Table1.Col1 = 5” for example, the result of this condition differs from a row to another,, the rows which have Col1 = 5 will affect the condition to be TRUE, while other values will affect the condition to be FALSE, so for each row in he table the condition should be evaluated or in other words:

    This condition is DEPENDENT ,, it depends on each row in the table

    This is not valid for the my condition @x = -1 it will be either true or false for all rows.

    This condition is INDEPENDENT ,, it does not depend on any row in the table.

     

    I swear I am saying the truth, regardless how Microsoft implemented the optimizer, but it should be smart enough to evaluate the condition for each row or evaluate it one time only.

     

    Regards,

     

     

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply