Need help with 2 query plans

  • I need some help understanding a difference between the query plans of two queries. Please take a look at the two queries below (they run against the AdventureWorks database):

    if not exists (select * from sys.indexes where name = 'IX_SalesSalesOrderHeader_DueDate')

    CREATE INDEX IX_SalesSalesOrderHeader_DueDate ON Sales.SalesOrderHeader (DueDate)

    go

    --First query. Uses the index as expected and is very fast

    declare @dt datetime

    set @dt = dateadd(day,datediff(day,0,getdate()),0)

    select max(DueDate) from [Sales].[SalesOrderHeader] where DueDate < @dt

    --Second query. Goes through all the records and runs slow

    select max(DueDate) from [Sales].[SalesOrderHeader] where DueDate < dateadd(day,datediff(day,0,getdate()),0)

    For some reason there is a difference between the query plans. The first query does a backward index seek, so it reads just the one record it needs and takes only 2 logical reads. The second query's plan also shows an index seek, but it behaves like an index scan: it returns all the records to the next operator and takes 73 logical reads. According to the plan's properties it is doing a forward seek. Can anyone explain why the plan says index seek when it behaves like an index scan, and why there is a difference between the two plans?
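
    For anyone who wants to reproduce the comparison, a minimal harness might look like this (a sketch; it assumes AdventureWorks and the index created above):

    -- A sketch: reproduce the logical-read comparison with STATISTICS IO.
    set statistics io on;
    go
    declare @dt datetime;
    set @dt = dateadd(day, datediff(day, 0, getdate()), 0);
    select max(DueDate) from Sales.SalesOrderHeader where DueDate < @dt;   -- 2 logical reads here
    go
    select max(DueDate) from Sales.SalesOrderHeader
    where DueDate < dateadd(day, datediff(day, 0, getdate()), 0);          -- 73 logical reads here
    go
    set statistics io off;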

    Thank you for your help

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting answers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • While I'm not 100% certain, my gut says that declaring the variable allows the optimizer to see that the comparison value is a constant, whereas the expression appearing directly in the query may not be treated as deterministic. If anyone with the necessary knowledge can confirm (or deny - I'm not proud), please let me know.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Actually, with a variable the server doesn't know what value will be used. This is one of the differences between a variable and a parameter. The server can do parameter sniffing when it creates a query plan for a procedure, but when we are working with variables it has no idea of the value until runtime (which of course is after the query plan was created).
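
    If you need the optimizer to see the variable's actual value, OPTION (RECOMPILE) forces the statement to be compiled at runtime, after the assignment has happened. A sketch (the hint is the only change from the first query):

    declare @dt datetime;
    set @dt = dateadd(day, datediff(day, 0, getdate()), 0);
    -- With the hint, the plan is compiled after @dt already has its value,
    -- so the optimizer can use that value against the statistics histogram.
    select max(DueDate)
    from Sales.SalesOrderHeader
    where DueDate < @dt
    option (recompile);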

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting answers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Adi Cohn-120898 (12/18/2013)


    Actually, with a variable the server doesn't know what value will be used. This is one of the differences between a variable and a parameter. The server can do parameter sniffing when it creates a query plan for a procedure, but when we are working with variables it has no idea of the value until runtime (which of course is after the query plan was created).

    Adi

    While true, there is a concept of deterministic vs. non-deterministic, and the optimizer definitely plans around that concept. If it can determine that a variable or expression IS deterministic, it can plan the query one way, and if it determines that it IS NOT deterministic, it may plan the query a different way. I was primarily looking to confirm whether anyone knows for sure that this user's query is experiencing that difference. After all, his slow-down occurs with the expression in the query, not with the variable.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I don't think that it has anything to do with determinism. Both queries compare against a single value (once as a variable and once as an expression). I know that there is a difference between using constants, using parameters, or using some other method that gets its value at runtime. In my example both queries got the value to be checked at runtime. By the way, below is an example that uses a variable and the RAND() function, and both statements get the same query plan. This is the behavior that I expected from the original queries I posted.

    select * from Sales.SalesOrderHeader where CustomerID = cast(rand()*(30118 - 11000) + 11000 as int)

    go

    declare @CustomerID int

    set @CustomerID = cast(rand()*(30118 - 11000) + 11000 as int)

    select * from Sales.SalesOrderHeader where CustomerID = @CustomerID

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting answers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • I'm not so sure... As the variable is a constant, it is evaluated only once, and the value remains constant for every record in the query. However, when the expression appears in the query, it has to be evaluated for every record, because the GETDATE() function is NON-DETERMINISTIC:

    http://technet.microsoft.com/en-us/library/aa214775(v=sql.80).aspx

    While you and I know that the date value that gets generated will remain constant for a short-duration query run at any time not near midnight, SQL Server may not be able to make that same assumption. I'm pretty sure that's why functions are classified as either deterministic or not.

    It's also why I never include GETDATE() directly in my queries. Doing so has almost always caused me grief, and keeping it out also keeps my queries from breaking if they run really close to midnight.
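
    The pattern is simple enough - capture the value once, up front, and refer only to the variable afterward. A sketch:

    -- A sketch: evaluate the non-deterministic function once, then reuse the result,
    -- so every reference sees the same instant even if the batch straddles midnight.
    declare @today datetime;
    set @today = dateadd(day, datediff(day, 0, getdate()), 0);  -- midnight this morning

    select max(DueDate) from Sales.SalesOrderHeader where DueDate < @today;
    select count(*) from Sales.SalesOrderHeader where DueDate >= @today;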

    Is there anyone who can provide more detail? This is my understanding, but I have yet to get good confirmation on exactly how this works.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Unfortunately, I just tested it on my machine and got identical execution plans for the two queries, which is what I expected. I'm going to go and try a couple of different versions of SQL Server. Maybe it's a difference in the optimizer.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • OK. That was just 2014 acting cute. I did get two different plans in 2012.

    To understand the differences, look to the properties. I have both plans using an Index Seek, but the differences are in the Predicates. The faster plan looks like this:

    Seek Keys[1]: End: [AdventureWorks2012].[Sales].[SalesOrderHeader].DueDate < Scalar Operator([@dt])

    The slower plan like this:

    Seek Keys[1]: End: [AdventureWorks2012].[Sales].[SalesOrderHeader].DueDate < Scalar Operator(dateadd(day,datediff(day,'1900-01-01 00:00:00.000',getdate()),'1900-01-01 00:00:00.000'))

    It has to be down to the estimates and the functions. The first one reverses the direction of the seek and pulls back the one row. The second does a forward seek, calculating like mad, and walks through 31k rows. But the biggest point is that while both estimate one row from the same index, each one has a different estimated cost. So the estimation engine sees a difference.

    Since we're not dealing with a recompile issue, there's no variable sniffing going on, so it has to be the ability of the optimizer to accurately estimate what's going to happen.

    Now, the fun part: I'm pretty sure it comes down to the new cardinality estimator in 2014, which is able to figure out that the plans can be identical.
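
    If anyone wants to check the seek direction and the estimates for themselves without the graphical plan, the showplan XML carries both. A sketch (ScanDirection and the subtree cost values are part of the showplan schema):

    -- A sketch: capture the actual plans as XML and inspect them directly.
    set statistics xml on;
    go
    declare @dt datetime;
    set @dt = dateadd(day, datediff(day, 0, getdate()), 0);
    select max(DueDate) from Sales.SalesOrderHeader where DueDate < @dt;
    go
    select max(DueDate) from Sales.SalesOrderHeader
    where DueDate < dateadd(day, datediff(day, 0, getdate()), 0);
    go
    set statistics xml off;
    -- In the returned XML, compare IndexScan/@ScanDirection (BACKWARD vs. FORWARD)
    -- and the EstimatedTotalSubtreeCost values on the two plans.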

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • @Grant,

    That's really interesting behavior, and quite an improvement in the optimizer. Although I wonder, now that SQL 2014 will let you "get away with" writing your queries that way, how many folks will blindly do other things and rely on the optimizer being smart enough to "fix it"... Might make for an interesting debate... Either that or a total disaster... And I'm quite sure I don't want the latter...

    However, debate or not, this does suggest that some of the other optimizer behaviors folks have learned to work around should be tested against SQL 2014 and the results published. I'd be especially interested in seeing what happens to the performance of Jeff Moden's CTE string splitter in SQL 2014 - whether the performance curves keep the same relative positions with respect to the number of elements and their average size, and whether Jeff might now be able to "get away with" a CASE statement where he knew he'd have a problem.

    Jeff ?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson (12/18/2013)


    That's really interesting behavior, and quite an improvement in the optimizer. Although I wonder, now that SQL 2014 will let you "get away with" writing your queries that way, how many folks will blindly do other things and rely on the optimizer being smart enough to "fix it"... Might make for an interesting debate... Either that or a total disaster... And I'm quite sure I don't want the latter...

    Heh... from what I see of many people's code, they've been doing that all along. 😛

    However, debate or not, this does suggest that some of the other optimizer behaviors folks have learned to work around should be tested against SQL 2014 and the results published. I'd be especially interested in seeing what happens to the performance of Jeff Moden's CTE string splitter in SQL 2014 - whether the performance curves keep the same relative positions with respect to the number of elements and their average size, and whether Jeff might now be able to "get away with" a CASE statement where he knew he'd have a problem.

    Jeff ?

    Wayne Sheffield is actually working on an article on this subject as we speak. I'm not sure if he's going to publish it here or on his blog.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Grant Fritchey (12/18/2013)


    OK. That was just 2014 acting cute. I did get two different plans in 2012.

    To understand the differences, look to the properties. I have both plans using an Index Seek, but the differences are in the Predicates. The faster plan looks like this:

    Seek Keys[1]: End: [AdventureWorks2012].[Sales].[SalesOrderHeader].DueDate < Scalar Operator([@dt])

    The slower plan like this:

    Seek Keys[1]: End: [AdventureWorks2012].[Sales].[SalesOrderHeader].DueDate < Scalar Operator(dateadd(day,datediff(day,'1900-01-01 00:00:00.000',getdate()),'1900-01-01 00:00:00.000'))

    It has to be down to the estimates and the functions. The first one reverses the direction of the seek and pulls back the one row. The second does a forward seek, calculating like mad, and walks through 31k rows. But the biggest point is that while both estimate one row from the same index, each one has a different estimated cost. So the estimation engine sees a difference.

    Since we're not dealing with a recompile issue, there's no variable sniffing going on, so it has to be the ability of the optimizer to accurately estimate what's going to happen.

    Now, the fun part: I'm pretty sure it comes down to the new cardinality estimator in 2014, which is able to figure out that the plans can be identical.

    I have to admit that I expected to get the same query plans in SQL Server 2012 as well (I still think that would be the right behavior, but I know now that it won't happen). Another thing that I don't understand: why does the plan say that it is doing an index seek when it seems to be doing an index scan? Can anyone explain that?

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting answers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Read the description of a seek. It's basically the same as a scan, except that it can start and stop partway through the index. In this case the optimizer thinks it can seek a limited range, but it ends up seeking across the entire range.
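
    A quick way to see it - both of the following typically compile to an Index Seek on the DueDate index, but the second one's seek range covers the whole index, so it reads every leaf page just as a scan would (a sketch, assuming the index from the first post exists):

    set statistics io on;
    -- Narrow range: the seek touches only the qualifying pages.
    select count(*) from Sales.SalesOrderHeader where DueDate >= '20080601';
    -- Open-ended range below the earliest DueDate: still shows as a seek,
    -- but it walks the entire leaf level of the index.
    select count(*) from Sales.SalesOrderHeader where DueDate >= '19000101';
    set statistics io off;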

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • When I looked at this (in SQL Server 2012), I was interested to see that the faster plan (using the variable in the WHERE clause) included a TOP operator between the index seek and the stream aggregate that was absent in the other plan (using the expression in the WHERE clause), so I tried this:

    declare @dt datetime = dateadd(day,datediff(day,0,getdate()),0)

    select top(1) DueDate from Sales.SalesOrderHeader where DueDate < @dt ORDER BY DueDate DESC

    select top(1) DueDate from Sales.SalesOrderHeader where DueDate < dateadd(day,datediff(day,0,getdate()),0) ORDER BY DueDate DESC

    The seek predicates differed just as they did in the original example - the query with the variable used

    Seek Keys[1]: End: [AdventureWorks2012].[Sales].[SalesOrderHeader].DueDate < Scalar Operator([@dt])

    while the query with the expression used

    Seek Keys[1]: End: [AdventureWorks2012].[Sales].[SalesOrderHeader].DueDate < Scalar Operator(dateadd(day,datediff(day,'1900-01-01 00:00:00.000',getdate()),'1900-01-01 00:00:00.000'))

    In both cases, though, I got the same execution plan - a BACKWARD range scan that yielded one row to a TOP operator between it and the SELECT operator.

    One other interesting thing I saw - the original example queries each showed Optimization Level as FULL, while both of my TOP(1) . . . ORDER BY queries showed Optimization Level as TRIVIAL.
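
    If you want to check the optimization level for several queries at once, it's exposed as the StatementOptmLevel attribute in the cached plan XML. A sketch (run it after executing the test queries):

    -- A sketch: pull the optimization level out of the cached plans.
    with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    select st.text,
           qp.query_plan.value('(//StmtSimple/@StatementOptmLevel)[1]', 'varchar(25)') as optimization_level
    from sys.dm_exec_cached_plans cp
    cross apply sys.dm_exec_sql_text(cp.plan_handle) st
    cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
    where st.text like '%SalesOrderHeader%'
      and st.text not like '%dm_exec_cached_plans%';  -- exclude this query itself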

    Jason Wolfkill

  • wolfkillj (12/19/2013)


    When I looked at this (in SQL Server 2012), I was interested to see that the faster plan (using the variable in the WHERE clause) included a TOP operator between the index seek and the stream aggregate that was absent in the other plan (using the expression in the WHERE clause), so I tried this:

    declare @dt datetime = dateadd(day,datediff(day,0,getdate()),0)

    select top(1) DueDate from Sales.SalesOrderHeader where DueDate < @dt ORDER BY DueDate DESC

    select top(1) DueDate from Sales.SalesOrderHeader where DueDate < dateadd(day,datediff(day,0,getdate()),0) ORDER BY DueDate DESC

    The seek predicates differed just as they did in the original example - the query with the variable used

    Seek Keys[1]: End: [AdventureWorks2012].[Sales].[SalesOrderHeader].DueDate < Scalar Operator([@dt])

    while the query with the expression used

    Seek Keys[1]: End: [AdventureWorks2012].[Sales].[SalesOrderHeader].DueDate < Scalar Operator(dateadd(day,datediff(day,'1900-01-01 00:00:00.000',getdate()),'1900-01-01 00:00:00.000'))

    In both cases, though, I got the same execution plan - a BACKWARD range scan that yielded one row to a TOP operator between it and the SELECT operator.

    One other interesting thing I saw - the original example queries each showed Optimization Level as FULL, while both of my TOP(1) . . . ORDER BY queries showed Optimization Level as TRIVIAL.

    One more thing - considering that the TOP(1) . . . ORDER BY construct I tested above doesn't allow for grouping, I tried this:

    declare @dt datetime = dateadd(day,datediff(day,0,getdate()),0)

    select CustomerID, max(DueDate) from [Sales].[SalesOrderHeader] where DueDate < @dt group by CustomerID

    select CustomerID, max(DueDate) from [Sales].[SalesOrderHeader] where DueDate < dateadd(day,datediff(day,0,getdate()),0) group by CustomerID

    The differences between the plan for the query using a variable and the plan for the query using the expression were pretty significant, as the attached execution plans show. SQL Server got the memory grant badly wrong for the query using the expression (allocating about one-fourth of what the query using the variable got), and it had a sort that spilled to tempdb. The IO workload was also starkly different:

    Query using variable:

    (19119 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'SalesOrderHeader'. Scan count 2, logical reads 130, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    ---------------------------

    Query using expression:

    (19119 row(s) affected)

    Table 'SalesOrderHeader'. Scan count 1, logical reads 94468, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)
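
    For the memory grant side of it, sys.dm_exec_query_memory_grants shows what each running query asked for and received. A sketch (start the slow query in one window and run this from another while it executes):

    -- A sketch: compare requested vs. granted vs. ideal memory for running queries.
    select session_id,
           requested_memory_kb,
           granted_memory_kb,
           used_memory_kb,
           ideal_memory_kb   -- what the optimizer estimated the query needed
    from sys.dm_exec_query_memory_grants;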

    Jason Wolfkill

  • Wayne Sheffield is actually working on an article on this subject as we speak. I'm not sure if he's going to publish it here or on his blog.

    I'll be looking forward to that... Let me know when it gets published...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)
