Two of the coolest additions to T-SQL in SQL 2005 are Common Table Expressions (CTEs), and the Row_number() function. In this article we're going to see how you can use these two features together to provide an elegant solution to an age old problem.
First an extremely quick look at each of them.
Common Table Expressions
A CTE is a temporary result set, which is valid for the scope of a single Select Update Insert or Delete statement. I've heard it said that this very restricted scope seriously limits their usefulness, and that it's only function is to improve the readability of your SQL. (Writing recursive queries is the obvious area where CTEs come in to play - but I won't be covering this in this article.)
I hope to show in this article that the CTEs usefulness is not purely cosmetic.
One key point that is often overlooked is that the resultset can be referenced multiple times in the S/U/I/D statement. We'll see this fact being put to use later on.
Row_number()
The row_number() function has been on most SQL Developers Christmas list for many years now. It provides a sequential row number for each record in a result set. An ordering must be specified (to give a basis to the numbering), and optionally a partition may be specified to split the result set numbering. We'll look more closely at what this means with our example.
The Price History example
Our retail client is storing a history of price changes in a table PriceHistory. For each item (in the Items table) we can see the initial price, and an additional record for each subsequent price change. The table creation script is available below in the Resources section.
Data from the PriceHistory table
The client would like to see a report showing the Item name, the old price, the new price and the date range for which the new price was applied. This type of question has often given headaches to sql developers, as it usually involved messy subqueries using the max / min functions. Some DBAs prefer to store both the start date and end date in the table, but this too causes problems, as it is difficult to keep subsequent records in synch, especially when a modification occurs.
Item | Old Price | Range Price | Start Date | End Date |
---|---|---|---|---|
vacuum cleaner | 250.00 | 2004-03-01 | 2005-06-15 | |
vacuum cleaner | 250.00 | 219.99 | 2005-06-15 | 2007-01-03 |
vacuum cleaner | 219.99 | 189.99 | 2007-01-03 | 2007-02-03 |
vacuum cleaner | 189.99 | 200.00 | 2007-02-03 | |
washing machine | 650.00 | 2006-07-12 | 2007-01-03 | |
washing machine | 650.00 | 550.00 | 2007-01-03 | |
toothbrush | 1.99 | 2005-01-01 | 2006-01-01 | |
toothbrush | 1.99 | 1.79 | 2006-01-01 | 2007-01-01 |
toothbrush | 1.79 | 1.59 | 2007-01-01 | 2008-01-01 |
toothbrush | 1.59 | 1.49 | 2008-01-01 |
The format we're aiming for
If we examine one line from the target report, we can see that it includes information from 3 consecutive rows in the table. The key to solving this problem lies in linking the current row to the previous row and to the next row.
vacuum cleaner | 219.99 | 189.99 | 2007-01-03 | 2007-02-03 |
ItemId | PriceStartDate | Price |
---|---|---|
1 | 2004-03-01 | 250.00 |
1 | 2005-06-15 | 219.99 |
1 | 2007-01-03 | 189.99 |
1 | 2007-02-03 | 200.00 |
2 | 2006-07-12 | 650.00 |
2 | 2007-01-03 | 550.00 |
3 | 2005-01-01 | 1.99 |
3 | 2006-01-01 | 1.79 |
3 | 2007-01-01 | 1.59 |
3 | 2008-01-01 | 1.49 |
The first step is to construct a CTE which numbers the rows of the data table. (Note that the semi-colon is required when the CTE is not the first statement in a batch.)
;WITH PriceCompare AS ( SELECT i.Item, ph.ItemId, ph.PriceStartDate, ph.Price, ROW_NUMBER() OVER (Partition BY ph.ItemId ORDER BY PriceStartDate) AS rownum FROM Items i INNER JOIN PriceHistory ph ON i.ItemId = ph.ItemId) SELECT * FROM PriceCompare
In the CTE, I add a new column, rownum, whose value comes from the row_number() function. You can see that I've partitioned by ItemId which means that numbering recommences with each new item. I've ordered on the PriceStartDate column, to tell the CTE how it should apply the numbering i.e. the earliest PriceStartDate for an item will have number 1 etc..
Item | ItemId | PriceStartDate | Price | rownum |
---|---|---|---|---|
vacuum cleaner | 1 | 2004-03-01 | 250.00 | 1 |
vacuum cleaner | 1 | 2005-06-15 | 219.99 | 2 |
vacuum cleaner | 1 | 2007-01-03 | 189.99 | 3 |
vacuum cleaner | 1 | 2007-02-03 | 200.00 | 4 |
washing machine | 2 | 2006-07-12 | 650.00 | 1 |
washing machine | 2 | 2007-01-03 | 550.00 | 2 |
toothbrush | 3 | 2005-01-01 | 1.99 | 1 |
toothbrush | 3 | 2006-01-01 | 1.79 | 2 |
toothbrush | 3 | 2007-01-01 | 1.59 | 3 |
toothbrush | 3 | 2008-01-01 | 1.49 | 4 |
Now, I'll enhance the SELECT statement which follows the CTE.
SELECT currow.Item, prevrow.Price AS OldPrice, currow.Price AS RangePrice, currow.PriceStartDate AS StartDate, nextrow.PriceStartDate AS EndDate FROM PriceCompare currow LEFT JOIN PriceCompare nextrow ON currow.rownum = nextrow.rownum - 1 AND currow.ItemId = nextrow.ItemId LEFT JOIN PriceCompare prevrow ON currow.rownum = prevrow.rownum + 1 AND currow.ItemId = prevrow.ItemId
Note that I use Left Joins as for the first row for an item there is no previous row, just as the last row has no next row.
I use the aliases before each field to denote which row the data should come from. Thus PriceStartDate from the current row is the Start Date while the same field from the next row is the End Date.
Running the query now, gives the resultset required for the Price History report. An additional bonus is that you can wrap up the whole CTE in a view!
CREATE VIEW [dbo].[PriceCompare] AS WITH PriceCompare AS ( SELECT i.Item, ph.ItemId, ph.PriceStartDate, ph.Price, ROW_NUMBER() OVER (Partition BY ph.ItemId ORDER BY PriceStartDate) AS rownum FROM Items i INNER JOIN PriceHistory ph ON i.ItemId = ph.ItemId ) SELECT currow.Item, prevrow.Price AS OldPrice, currow.Price AS RangePrice, currow.PriceStartDate AS StartDate, nextrow.PriceStartDate AS EndDate FROM PriceCompare currow LEFT JOIN PriceCompare nextrow ON currow.rownum = nextrow.rownum - 1 AND currow.ItemId = nextrow.ItemId LEFT JOIN PriceCompare prevrow ON currow.rownum = prevrow.rownum + 1 AND currow.ItemId = prevrow.ItemId
I hope to have shown with this article why you should be using CTEs in your applications, and not just as a way to tidy up your SQL.