Retrieving status from history

  • The database I'm working with is an order database. It stores order status in a history table whose structure is indicated in the CREATE TABLE statement below:

    CREATE TABLE [dbo].[OrderStatusLog](

    [OrderStatusID] [int] IDENTITY(1,1) NOT NULL,

    [OrderID] [int] NOT NULL,

    [StatusID] [int] NOT NULL,

    [EmployeeID] [int] NOT NULL,

    [StatusDate] [datetime] NOT NULL,

    CONSTRAINT [PK_OrderStatusLog] PRIMARY KEY CLUSTERED);

    The current order status for a given order is determined by retrieving the last StatusID value for that OrderID sorting by StatusDate ascending. To query all orders with their current status, I've been using the following derived table / self join combo platter:

    SELECT O.OrderID, SL.StatusID, SL.StatusDate

    FROM Orders O

    INNER JOIN (SELECT OrderID, MAX(OrderStatusID) AS OrderStatusID

    FROM OrderStatusLog

    GROUP BY OrderID) L ON O.OrderID = L.OrderID

    INNER JOIN OrderStatusLog SL ON L.OrderStatusID = SL.OrderStatusID;

    A problem with this is that I frequently augment this query using additional inner joins to deliver a more "informative" result set (combining with customer data, shipping data, etc.). Query performance can get a little weak when I do so.

    So, is there a more efficient query to retrieve this result set? Moving the order status to the Orders table with an AFTER UPDATE trigger may not be an option, but I'm curious about input on that as a possibility as well.

    Thanks in advance.

  • I should add that, because OrderStatusID is the clustered primary key, which is why I opted to use it for my sort (MAX(OrderStatusID)) and join.

  • This is probably more efficient:

    SELECT

    O.OrderID,

    l.StatusID,

    l.StatusDate

    FROM Orders O

    INNER JOIN (

    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY OrderID ORDER BY StatusDate)

    FROM OrderStatusLog

    ) l ON l.OrderID = o.OrderID AND l.rn = 1

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Depending on your indices and your data, the following may be more efficient.

    SELECT

    o.OrderID,

    l.StatusID,

    l.StatusDate

    FROM Orders AS o

    OUTER APPLY (

    SELECT TOP (1) StatusID, StatusDate

    FROM OrderStatusLog

    WHERE l.OrderID = o.OrderID

    ORDER BY StatusDate DESC

    ) l

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Based on current indexing, execution time on all three is very close. The two suggestions show a higher cost, but I can see where some index tuning may make them perform better. I will play with those and see what comes of it.

    I did push the StatusID back to the Orders table in a test environment and saw a 400% improvement in query performance, not to mention the code simplification. I doubt anyone will be surprised by that.

    Thanks for your help.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply