why is one query better performing than the other

  • let me describe the scenario.

    There is an order, orderline and OrderLineStatus tables.

    There is one order, it has many order lines, and each order lines can have many OrderLineStatus updates.

    I want to find all order lines and their most recent stauts update for a set number of orders.

    First i ran this query. I added a where clause in the sub query as i thought it would help with performance, but when i run the query without the where clause in the sub query, it actually runs quicker. any ideas why this is?

    I thought adding the where clause in the sub query would limit the rows returned by that query, instead of it reading the entire contents of the tables it contains, as its not a co-related sub query:

    when the two queries are run in the same batch, this one takes 59% of the batch

    select order.Order, orderline.product, orderline.ShippedDate,OrderLineStatus.productStatus

    from order join orderline on Order.orderid = orderLine.orderid

    join

    (

    select ols.lineitemid,ols.max(statusUpdateDate) as'maxStatusUpdateDate'

    from OrderLineStatus ols join orderline olon ols.orderlineid = ol.orderlineid

    join order o on o.orderid = ol.orderid

    where o.orderId in

    (

    00010,9845989,2342321,234345

    )

    group by OrderLineStatus.lineitemid

    ) as mostRecentStatus

    join orderlinestatus ols on ols.orderlineid = mostrecentStatus.orderlineid

    and ols.statusupdateDate = mostrecentstatus.maxStatusUpdateDate

    where order.orderid in

    (

    00010,9845989,2342321,234345

    )

    this one, with the sub query where clause commented out only takes 41% of the batch.

    select order.Order, orderline.product,OrderLineStatus.productStatus

    from order join orderline on Order.orderid = orderLine.orderid

    join

    (

    select ols.lineitemid,ols.max(statusUpdateDate) as'maxStatusUpdateDate'

    from OrderLineStatus ols join orderline olon ols.orderlineid = ol.orderlineid

    join order o on o.orderid = ol.orderid

    --where o.orderId in

    --(

    --00010,9845989,2342321,234345

    --)

    group by OrderLineStatus.lineitemid

    ) as mostRecentStatus

    join orderlinestatus ols on ols.orderlineid = mostrecentStatus.orderlineid

    and ols.statusupdateDate = mostrecentstatus.maxStatusUpdateDate

    where order.orderid in

    (

    00010,9845989,2342321,234345

    )

    any explination as to why this might be? if i run the sub query on its own, without a where clause it takes ages, as its not limited to a smaller data set, but when its used as a sub query it is quicker. what is sql server doing in the backend? is it automatically compiling this as a co-related subquery?

  • The question is moot really. Since neither the derived table mostRecentStatus

    nor the table orderlinestatus are joined to orders or orderlines, you have a cross join.

    Try this instead:

    SELECT

    o.[Order],

    ol.product,

    ol.ShippedDate,

    mostRecentStatus.maxStatusUpdateDate

    FROM [order] o

    INNER JOIN orderline ol ON o.orderid = ol.orderid

    INNER JOIN (

    SELECT

    ols.orderlineid,

    MAX(ols.statusUpdateDate) AS [maxStatusUpdateDate]

    FROM OrderLineStatus ols

    GROUP BY ols.orderlineid

    ) AS mostRecentStatus ON mostRecentStatus.orderlineid = ol.orderlineid

    WHERE [order].orderid IN

    ('00010','9845989','2342321','234345')

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • if its a cross join, wouldnt it give a combination of all the various results, and take a while ( considering the real tables in my system have millions of rows).

    the query runs about 2 seconds currently.

  • Look at it again. A cross join doesn't have to generate millions and millions of rows, and since you've got a filter on the derived table, this one won't, it will generate (the number of rows returned by the derived table) x (number of filtered rows from the main query).

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Could you please post up the execution plans from both your queries?

    Offhand, I'll speculate that by moving the WHERE clause and using it to filter the Order table in the FROM clause, you are dramatically reducing the number of rows which have to be joined to the derived table.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • i cant im afraid. the code is production code and i cant put it online. the script i gave in the initial post is a simplified version of the actual production code but with all col and table names changed. i cant do that in a query plan.

  • Too bad, there's no way to know exactly what is going on without referring to the execution plans.

    Have a good weekend.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply