The Problem
Someone asked me for some help the other day because they were about to write a cursor, and they remembered what I said, “If you ever need to write a cursor you should call me. There is always a better way.” So my colleague took me up on my offer.
This is what they wanted to do. They had a query that returned a data set. The data listed
- A unique ID
- OrderIDs
- Codes
- The date the order was placed.
They wanted to know when the most resent order’s code was different from the one previous to it. In the sample data below, only ORDER4 meets that criteria. Meaning it is the only order whose most recent date (2015-10-03) has a different code (DS) than the one before it (AG) from 2015-10-02. That was a mouthful.
The question is how do you report that the row, with ID 9, has a different Code than the row with ID 7, which came in the day before?
Enter the Windowing Functions
There is actually a very graceful and efficient way to do this. I originally started with a Common Table Expression (CTE) and later realized the better way to handle this. If you’re on a version of SQL Server that supports them (SQL Server 2012+), you can easily do this via windowing functions. In this article we will be using the LEAD() windowing function introduced in 2012.
Setup
Start by creating your query. Below I created a CTE that would give me the sample data above. In the real world just start with your query.
WITH cteSAMPLEDATA AS (--Create Sample data SELECT ID, ORDER_ID, CODE, ORDER_DATE FROM (VALUES (1, 'ORDER1', 'DS', N'20151001'), (2, 'ORDER2', 'DS', N'20151001'), (5, 'ORDER2', 'DS', N'20151002'), (3, 'ORDER3', 'DS', N'20151001'), (6, 'ORDER3', 'AG', N'20151002'), (8, 'ORDER3', 'AG', N'20151003'), (4, 'ORDER4', 'DS', N'20151001'), (7, 'ORDER4', 'AG', N'20151002'), (9, 'ORDER4', 'DS', N'20151003') ) AS MYTABLE(ID, ORDER_ID, CODE, ORDER_DATE) ) SELECT H1.ID, H1.ORDER_ID, H1.CODE, H1.ORDER_DATE FROM cteSAMPLEDATA H1 ORDER BY ORDER_ID, ORDER_DATE DESC
Step 1
For this problem, we are interested in:
- The most recent order for every group of ORDER_IDs.
- The order that came before it (previous order).
- If the CODE on the on these 2 records is different.
Sorting the record set on ORDER_ID, ORDER_DATE, DESCENDING, will get us part of the way there, but is there a way (In SQL) to compare the values in one row, to another row? There is, but let’s get there one step at a time.
Enter the Windowing Functions. Now this is an entire topic unto itself. Books have been written about them. What this will allow you to do, is take a recordset and partition it into sub sections; or Windows. In this case we want to partition our recordset by ORDER_ID, essentially grouping all rows with the same ORDER_ID together. Creating a partition/window over each group. Then for each of the groups (partitions/windows) we want the MAX(ORDER_DATE). This is your code (I have excluded the CTE for readability and simplicity).
SELECT H1.ID, H1.ORDER_ID, H1.ORDER_DATE, MAX(H1.ORDER_DATE) --Get the latest date for each ORDER OVER (PARTITION BY ORDER_ID) AS MAX_DATE, CODE FROM cteSAMPLEDATA H1
The new column (MAX_DATE) I have added has told SQL Server to partition the data on ORDER_ID, and for each partition give me the latest (MAX) date. I have highlighted each “partition” in a different color so you can see what is going on.
Great! Now we know whenever ORDER_DATE = MAX_DATE we are on the most recent record.
Step 2
Just like MAX() can get us the latest record for a date column, there are other functions we can use to access data in a partition. The LEAD() windowing function can retrieve the value of the next row in a record set. Since our rows as sorted by ORDER_DATE descending, that would be what we want. Here is the Microsoft Description (note this is for SQL Server 2012+ and Azure):
“Accesses data from a subsequent row in the same result set without the use of a self-join in SQL Server 2016. LEAD provides access to a row at a given physical offset that follows the current row. Use this analytic function in a SELECT statement to compare values in the current row with values in a following row.”
This sounds a lot like what we want to do. There are some things to note about this windowing function. This is touted as an analytics function for working with dates and numbers (items like calculating the increase/decrease in sales period over period). But there are no rules that say the values have to be a number or date.
LEAD() also has a partner function called LAG() that does the opposite. Depending on how you are ordering your recordset you may need to use one or the other. I point that out because I have chosen to sort descending. Had I sorted ascending I would use LAG() to get the row before. Having both functions at your disposal allows you more freedom on how you order the data.
LEAD() “provides access to a row at a given physical offset that follows the current row”. In our problem, we are only interested in the one order that came in before. But we could go three orders back if we liked.
If no value is found, an optional 3rd parameter can return a default value. In our case we will just take the NULL. So, please don’t wrap this in an ISNULL().
Using one our new tools (LEAD() or LAG()) I would like to add a column that contains the next orders code. This would be the next next row in the recordset, if we sort our partitions in descending order on date. Once I have that I can compare it to the current orders code and see if the codes are different. Let’s take a look at how we add a new column (PREVIOUS_CODE) to our query.
SELECT H1.ID, H1.ORDER_ID, H1.ORDER_DATE, MAX(H1.ORDER_DATE) --Get the latest date for each ORDER OVER (PARTITION BY ORDER_ID) AS MAX_DATE, H1.CODE, LEAD(H1.CODE,1) --Get the code 1 row after the current row. OVER (PARTITION BY ORDER_ID ORDER BY ORDER_DATE DESC) AS PREVIOUS_CODE FROM cteSAMPLEDATA H1
As you can see. We now have a column that lists the previous orders code, and when there is no previous order, we get a null.
Also note that LEAD (and LAG) requires a PARTITION and an ORDER BY. This makes sense. The PARTITION creates the windows, and without order there isn’t much point in knowing what comes before, or after, the current row because it could be anything. Why? Because SQL Server does not guarantee the order data is returned unless you explicitly use an ORDER BY.
Step 3
Now we need to filter out all the unwanted records. You can’t use a HAVING or WHERE to test if the PREVIOUS_CODE is different from CODE. Below I have tried the HAVING and you see the error that I get.
The reason for that is out of the scope of this article, however it has to do with when the windowing is performed by the Query Processor.
So since that is a dud we have to come up with a way to find WHERE ORDER_DATE = MAX_DATE AND CODE != PREVIOUS_CODE. The easiest way to do that is to put the query in another CTE. Once you have wrapped up the new query we can test it directly using a simple select. Below I have put our query into a CTE called cteGET_MAX_DATE_AND_CODE. After that I select the data from it using previously mentioned WHERE clause.
WITH cteSAMPLEDATA AS ( SELECT ID, ORDER_ID, CODE, ORDER_DATE FROM (VALUES (1, 'ORDER1', 'DS', N'20151001'), (2, 'ORDER2', 'DS', N'20151001'), (5, 'ORDER2', 'DS', N'20151002'), (3, 'ORDER3', 'DS', N'20151001'), (6, 'ORDER3', 'AG', N'20151002'), (8, 'ORDER3', 'AG', N'20151003'), (4, 'ORDER4', 'DS', N'20151001'), (7, 'ORDER4', 'AG', N'20151002'), (9, 'ORDER4', 'DS', N'20151003') ) AS MYTABLE(ID, ORDER_ID, CODE, ORDER_DATE) ), cteGET_MAX_DATE_AND_CODE AS ( SELECT H1.ID, H1.ORDER_ID, H1.CODE, H1.ORDER_DATE, MAX(H1.ORDER_DATE) --Get the latest date for each ORDER OVER (PARTITION BY ORDER_ID) AS MAX_DATE, LEAD(H1.CODE,1) --Get the code 1 row before the current row. OVER (PARTITION BY ORDER_ID ORDER BY ORDER_DATE DESC) AS PREVIOUS_CODE FROM cteSAMPLEDATA H1 ) SELECT ID, ORDER_ID, CODE, PREVIOUS_CODE, ORDER_DATE, MAX_DATE FROM cteGET_MAX_DATE_AND_CODE WHERE ORDER_DATE = MAX_DATE AND CODE != PREVIOUS_CODE AND PREVIOUS_CODE IS NOT NULL;
So now we know when an order comes in and the code has changed since the last time that order came in.
Conclusion
Frequently, I see my colleagues look at windowing functions as analytic functions, for use on numbers and dates. But if you don’t limit yourself to just numbers, you would be surprised what you can accomplish. In addition Windowing Functions along with CTEs have greatly reduced the need for cursors. To repeat something written here in an article entitled Tally OH! An Improved SQL 8K "CSV Splitter" Function, "Before you can think outside the box, you must first realize... YOU'RE IN A BOX"!.