order of criteria in where clause changes query palan

  • Inna is a colleague of mine that came across an interesting problem. She had a query that had a wrong query plan. She modified the order of the conditions in the where clause, and got a better query plan and the query ran much faster. She even wrote a small script that reproduces this behavior. You’ll might need to run the script 2 or 3 times because it is based on random data, but it does show different query plans for 2 queries that are logically the same. I think that this is a bug, but I might be wrong. Can anyone have an explanation for that?

    Use tempdb

    GO

    CREATE TABLE Inna (ID int identity(1,1), DataCLInd datetime constraint DF_Inna_DataCLInd default (getdate()),

    SecField INT, Third INT,

    constraint PK_Inna_DataCLInd_SecField_Third Primary Key Clustered (DataCLInd,SecField,Third)

    )

    insert into Inna (DataCLInd,

    SecField,

    Third)

    Values ('20110101' , 2,23)

    DECLARE @ROWCOUNT int

    select @ROWCOUNT = 1

    --Inserting data. Since we are inserting random data, we will have few errors, but at the end we'll have

    --more then 4000000 records and very fast

    While @ROWCOUNT < 2000000

    BEGIN

    insert into Inna (DataCLInd,

    SecField,

    Third)

    SELECT dateadd(MINUTE,+datepart(MILLISECOND,GETDATE())*datepart(MINUTE,GETDATE()),DataCLInd),

    ID * 1.5,

    ID * 4.5

    FROM Inna

    SELECT @ROWCOUNT = @@ROWCOUNT

    END

    go

    --First query

    declare @LastRunTime datetime

    select @LastRunTime = '2011-01-16 00:00:52.870'

    ;WITH RankedRows AS (

    SELECT ID,DataCLInd,SecField,Third,

    RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum

    FROM Inna

    WHERE DataCLInd >= @LastRunTime AND DataCLInd >= DATEADD(MONTH,-1,getutcdate()))

    SELECT ID,DataCLInd,SecField,Third

    FROM RankedRows

    WHERE RowNum = 1

    go

    --Second query

    declare @LastRunTime datetime

    select @LastRunTime = '2011-01-16 00:00:52.870'

    ;WITH RankedRows AS (

    SELECT ID,DataCLInd,SecField,Third,

    RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum

    FROM Inna

    WHERE DataCLInd >= DATEADD(MONTH,-1,getutcdate()) AND DataCLInd >= @LastRunTime)

    SELECT ID,DataCLInd,SecField,Third

    FROM RankedRows

    WHERE RowNum = 1

    go

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Save the two different plans.

    Swap the two queries around in the script, run until you get two different plans again.

    Compare the four plans.

    The results suggest that plan generation is dependant upon the resources available at the time the query is run - how could it be otherwise?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • A couple of Grant Fitychey's presentations he did on execution plans might be relevant;

    as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"

    if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.

    for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (11/4/2011)


    A couple of Grant Fitychey's presentations he did on execution plans might be relevant;

    as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"

    if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.

    for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.

    In both cases the optimize level was full.

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Adi Cohn-120898 (11/4/2011)


    Lowell (11/4/2011)


    A couple of Grant Fitychey's presentations he did on execution plans might be relevant;

    as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"

    if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.

    for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.

    In both cases the optimize level was full.

    Adi

    One parallel, one not parallel (serial?) ?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Adi Cohn-120898 (11/4/2011)


    Lowell (11/4/2011)


    A couple of Grant Fitychey's presentations he did on execution plans might be relevant;

    as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"

    if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.

    for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.

    In both cases the optimize level was full.

    Lowell's not talking about optimisation level (which can be full or trivial). He's talking about the 'reason for early abort'

    Can you post both queries and both plans?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Those are the 2 queries:

    --First query

    declare @LastRunTime datetime

    select @LastRunTime = '2011-01-16 00:00:52.870'

    ;WITH RankedRows AS (

    SELECT ID,DataCLInd,SecField,Third,

    RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum

    FROM Inna

    WHERE DataCLInd >= @LastRunTime AND DataCLInd >= DATEADD(MONTH,-1,getutcdate()))

    SELECT ID,DataCLInd,SecField,Third

    FROM RankedRows

    WHERE RowNum = 1

    go

    --Second query

    declare @LastRunTime datetime

    select @LastRunTime = '2011-01-16 00:00:52.870'

    ;WITH RankedRows AS (

    SELECT ID,DataCLInd,SecField,Third,

    RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum

    FROM Inna

    WHERE DataCLInd >= DATEADD(MONTH,-1,getutcdate()) AND DataCLInd >= @LastRunTime)

    SELECT ID,DataCLInd,SecField,Third

    FROM RankedRows

    WHERE RowNum = 1

    go

    In my first message there is also a script to create the table and insert datat into the tables. The attached files contain the query plan for each one of the queries.

    Thank you for your help.

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • i had to use winmerge to be sure, the two files you posted are identical; could you have copied the same plan as two different names?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (11/4/2011)


    i had to use winmerge to be sure, the two files you posted are identical; could you have copied the same plan as two different names?

    You are correct that they are the same. Each time that I right clicked on one of the query plans, it saved both. You can open just one of the files and see both query plans.

    Sorry for my mistake.

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?

    i thought it would say something else...i've got to look at my notes on this.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (11/4/2011)


    well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?

    The other did find a plan, it just had to go through all stages of optimisation before it found one that it considered good enough, whereas the first one aborted with a good enough plan without having to go through all the stages.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (11/4/2011)


    Lowell (11/4/2011)


    well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?

    The other did find a plan, it just had to go through all stages of optimisation before it found one that it considered good enough, whereas the first one aborted with a good enough plan without having to go through all the stages.

    Gail, I'm sure I've read somewhere that the process of plan checking is susceptible to system pressures, that is, fewer plans can be checked within the allocated time - but I haven't found a reference for this anywhere. Do you know if this is indeed the case?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Lowell (11/4/2011)


    well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?

    i thought it would say something else...i've got to look at my notes on this.

    I see. It is interesting how come with one query the optimizer found a good enough plan and with the other one it didn't.

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • ChrisM@Work (11/4/2011)


    GilaMonster (11/4/2011)


    Lowell (11/4/2011)


    well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?

    The other did find a plan, it just had to go through all stages of optimisation before it found one that it considered good enough, whereas the first one aborted with a good enough plan without having to go through all the stages.

    Gail, I'm sure I've read somewhere that the process of plan checking is susceptible to system pressures, that is, fewer plans can be checked within the allocated time - but I haven't found a reference for this anywhere. Do you know if this is indeed the case?

    I'm running the queries on my laptop. I'm the only one that works with it and there are no jobs. I still get different queries for each procedure. I don't think that it has anything to do with resources

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Adi Cohn-120898 (11/4/2011)


    Inna is a colleague of mine that came across an interesting problem. She had a query that had a wrong query plan. She modified the order of the conditions in the where clause, and got a better query plan and the query ran much faster. She even wrote a small script that reproduces this behavior. You’ll might need to run the script 2 or 3 times because it is based on random data, but it does show different query plans for 2 queries that are logically the same. I think that this is a bug, but I might be wrong. Can anyone have an explanation for that?

    Use tempdb

    GO

    CREATE TABLE Inna (ID int identity(1,1), DataCLInd datetime constraint DF_Inna_DataCLInd default (getdate()),

    SecField INT, Third INT,

    constraint PK_Inna_DataCLInd_SecField_Third Primary Key Clustered (DataCLInd,SecField,Third)

    )

    insert into Inna (DataCLInd,

    SecField,

    Third)

    Values ('20110101' , 2,23)

    DECLARE @ROWCOUNT int

    select @ROWCOUNT = 1

    --Inserting data. Since we are inserting random data, we will have few errors, but at the end we'll have

    --more then 4000000 records and very fast

    While @ROWCOUNT < 2000000

    BEGIN

    insert into Inna (DataCLInd,

    SecField,

    Third)

    SELECT dateadd(MINUTE,+datepart(MILLISECOND,GETDATE())*datepart(MINUTE,GETDATE()),DataCLInd),

    ID * 1.5,

    ID * 4.5

    FROM Inna

    SELECT @ROWCOUNT = @@ROWCOUNT

    END

    go

    --First query

    declare @LastRunTime datetime

    select @LastRunTime = '2011-01-16 00:00:52.870'

    ;WITH RankedRows AS (

    SELECT ID,DataCLInd,SecField,Third,

    RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum

    FROM Inna

    WHERE DataCLInd >= @LastRunTime AND DataCLInd >= DATEADD(MONTH,-1,getutcdate()))

    SELECT ID,DataCLInd,SecField,Third

    FROM RankedRows

    WHERE RowNum = 1

    go

    --Second query

    declare @LastRunTime datetime

    select @LastRunTime = '2011-01-16 00:00:52.870'

    ;WITH RankedRows AS (

    SELECT ID,DataCLInd,SecField,Third,

    RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum

    FROM Inna

    WHERE DataCLInd >= DATEADD(MONTH,-1,getutcdate()) AND DataCLInd >= @LastRunTime)

    SELECT ID,DataCLInd,SecField,Third

    FROM RankedRows

    WHERE RowNum = 1

    go

    Adi

    Adi,

    Different lesson and just a suggestion to speed up writing and executing test table construction... try the following. The ISNULLs are there to make NOT NULL columns. I think you'll find it much faster than your loop. Easier to control the absolute size and range of the data, as well.

    USE TempDB

    ;

    SELECT TOP 2000000

    ID = IDENTITY(INT,1,1),

    DataCLInd = ISNULL(RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2011','2012')+CAST('2011' AS DATETIME),0),

    SecField = ISNULL(ABS(CHECKSUM(NEWID()))%1000000,0),

    Third = ISNULL(ABS(CHECKSUM(NEWID()))%1000000,0)

    INTO dbo.Inna

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    ALTER TABLE dbo.Inna

    ADD CONSTRAINT PK_Inna_DataCLInd_SecField_Third PRIMARY KEY CLUSTERED (DataCLInd,SecField,Third)

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply