September 10, 2010 at 3:35 am
let me describe the scenario.
There is an order, orderline and OrderLineStatus tables.
There is one order, it has many order lines, and each order lines can have many OrderLineStatus updates.
I want to find all order lines and their most recent stauts update for a set number of orders.
First i ran this query. I added a where clause in the sub query as i thought it would help with performance, but when i run the query without the where clause in the sub query, it actually runs quicker. any ideas why this is?
I thought adding the where clause in the sub query would limit the rows returned by that query, instead of it reading the entire contents of the tables it contains, as its not a co-related sub query:
when the two queries are run in the same batch, this one takes 59% of the batch
select order.Order, orderline.product, orderline.ShippedDate,OrderLineStatus.productStatus
from order join orderline on Order.orderid = orderLine.orderid
join
(
select ols.lineitemid,ols.max(statusUpdateDate) as'maxStatusUpdateDate'
from OrderLineStatus ols join orderline olon ols.orderlineid = ol.orderlineid
join order o on o.orderid = ol.orderid
where o.orderId in
(
00010,9845989,2342321,234345
)
group by OrderLineStatus.lineitemid
) as mostRecentStatus
join orderlinestatus ols on ols.orderlineid = mostrecentStatus.orderlineid
and ols.statusupdateDate = mostrecentstatus.maxStatusUpdateDate
where order.orderid in
(
00010,9845989,2342321,234345
)
this one, with the sub query where clause commented out only takes 41% of the batch.
select order.Order, orderline.product,OrderLineStatus.productStatus
from order join orderline on Order.orderid = orderLine.orderid
join
(
select ols.lineitemid,ols.max(statusUpdateDate) as'maxStatusUpdateDate'
from OrderLineStatus ols join orderline olon ols.orderlineid = ol.orderlineid
join order o on o.orderid = ol.orderid
--where o.orderId in
--(
--00010,9845989,2342321,234345
--)
group by OrderLineStatus.lineitemid
) as mostRecentStatus
join orderlinestatus ols on ols.orderlineid = mostrecentStatus.orderlineid
and ols.statusupdateDate = mostrecentstatus.maxStatusUpdateDate
where order.orderid in
(
00010,9845989,2342321,234345
)
any explination as to why this might be? if i run the sub query on its own, without a where clause it takes ages, as its not limited to a smaller data set, but when its used as a sub query it is quicker. what is sql server doing in the backend? is it automatically compiling this as a co-related subquery?
September 10, 2010 at 3:49 am
The question is moot really. Since neither the derived table mostRecentStatus
nor the table orderlinestatus are joined to orders or orderlines, you have a cross join.
Try this instead:
SELECT
o.[Order],
ol.product,
ol.ShippedDate,
mostRecentStatus.maxStatusUpdateDate
FROM [order] o
INNER JOIN orderline ol ON o.orderid = ol.orderid
INNER JOIN (
SELECT
ols.orderlineid,
MAX(ols.statusUpdateDate) AS [maxStatusUpdateDate]
FROM OrderLineStatus ols
GROUP BY ols.orderlineid
) AS mostRecentStatus ON mostRecentStatus.orderlineid = ol.orderlineid
WHERE [order].orderid IN
('00010','9845989','2342321','234345')
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 10, 2010 at 4:25 am
if its a cross join, wouldnt it give a combination of all the various results, and take a while ( considering the real tables in my system have millions of rows).
the query runs about 2 seconds currently.
September 10, 2010 at 4:31 am
Look at it again. A cross join doesn't have to generate millions and millions of rows, and since you've got a filter on the derived table, this one won't, it will generate (the number of rows returned by the derived table) x (number of filtered rows from the main query).
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 12, 2010 at 1:03 pm
Could you please post up the execution plans from both your queries?
Offhand, I'll speculate that by moving the WHERE clause and using it to filter the Order table in the FROM clause, you are dramatically reducing the number of rows which have to be joined to the derived table.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 12, 2010 at 1:07 pm
i cant im afraid. the code is production code and i cant put it online. the script i gave in the initial post is a simplified version of the actual production code but with all col and table names changed. i cant do that in a query plan.
September 12, 2010 at 1:08 pm
Too bad, there's no way to know exactly what is going on without referring to the execution plans.
Have a good weekend.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply