October 26, 2011 at 11:58 pm
The database I'm working with is an order database. It stores order status in a history table whose structure is indicated in the CREATE TABLE statement below:
CREATE TABLE [dbo].[OrderStatusLog](
[OrderStatusID] [int] IDENTITY(1,1) NOT NULL,
[OrderID] [int] NOT NULL,
[StatusID] [int] NOT NULL,
[EmployeeID] [int] NOT NULL,
[StatusDate] [datetime] NOT NULL,
CONSTRAINT [PK_OrderStatusLog] PRIMARY KEY CLUSTERED);
The current order status for a given order is determined by retrieving the last StatusID value for that OrderID sorting by StatusDate ascending. To query all orders with their current status, I've been using the following derived table / self join combo platter:
SELECT O.OrderID, SL.StatusID, SL.StatusDate
FROM Orders O
INNER JOIN (SELECT OrderID, MAX(OrderStatusID) AS OrderStatusID
FROM OrderStatusLog
GROUP BY OrderID) L ON O.OrderID = L.OrderID
INNER JOIN OrderStatusLog SL ON L.OrderStatusID = SL.OrderStatusID;
A problem with this is that I frequently augment this query using additional inner joins to deliver a more "informative" result set (combining with customer data, shipping data, etc.). Query performance can get a little weak when I do so.
So, is there a more efficient query to retrieve this result set? Moving the order status to the Orders table with an AFTER UPDATE trigger may not be an option, but I'm curious about input on that as a possibility as well.
Thanks in advance.
October 27, 2011 at 12:08 am
I should add that, because OrderStatusID is the clustered primary key, which is why I opted to use it for my sort (MAX(OrderStatusID)) and join.
October 27, 2011 at 4:49 am
This is probably more efficient:
SELECT
O.OrderID,
l.StatusID,
l.StatusDate
FROM Orders O
INNER JOIN (
SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY OrderID ORDER BY StatusDate)
FROM OrderStatusLog
) l ON l.OrderID = o.OrderID AND l.rn = 1
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 27, 2011 at 7:09 am
Depending on your indices and your data, the following may be more efficient.
SELECT
o.OrderID,
l.StatusID,
l.StatusDate
FROM Orders AS o
OUTER APPLY (
SELECT TOP (1) StatusID, StatusDate
FROM OrderStatusLog
WHERE l.OrderID = o.OrderID
ORDER BY StatusDate DESC
) l
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 27, 2011 at 9:31 am
Based on current indexing, execution time on all three is very close. The two suggestions show a higher cost, but I can see where some index tuning may make them perform better. I will play with those and see what comes of it.
I did push the StatusID back to the Orders table in a test environment and saw a 400% improvement in query performance, not to mention the code simplification. I doubt anyone will be surprised by that.
Thanks for your help.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply