Introduction
The introduction of the ROW_NUMBER() function in SQL 2005 offers a number of significant benefits in terms of functionality and performance, which is often overlooked.
As well as standard row numbering and paging duties (which I will cover briefly in this article), the “Partitioning” clause provides the ability to produce efficient version based filtered queries. Prior to the arrival of the ROW_NUMBER() function, these queries used to have to be satisfied through less intuitive and less efficient correlated sub-queries.
All in all, the ROW_NUMBER() function should form part of every developer's toolkit, and should spring to mind naturally when faced with certain coding requirements.
In this article, I'll start by showing the basics of the ROW_NUMBER() function and how it can be employed to fulfill “page at a time” results to a client. I'll then go on to explore the real power of the function, using its PARTITION BY clause.
Row Numbering Basics
The ROW_NUMBER() function in its simplest form, provides the ability to add a “RowNumber “column to a tabular result set. Although this may seem trivial to desktop database systems such as Access, in the strongly relational world of SQL Server, adding a sequential row identifier number column to a result set has, until SQL 2005 been a challenge.
It's worth pointing out that this capability is not the same as an IDENTITY column, which is persisted in the table and fixed at row insert time (although several solutions used this approach with the use of a temporary table prior to the ROW_NUMBER() functions existence).
Rather, the ROW_NUMBER() function constructs the column on the fly as part of the query. We will see the benefits of this later on.
Let's build a simple table of Book data to use as an example:
IF OBJECT_ID('dbo.Books')IS NOT NULL DROP TABLE dbo.Books GO CREATE TABLE dbo.Books ( BookId BIGINT NOT NULL PRIMARY KEY IDENTITY(1,1), BookTitle VARCHAR(100) NOT NULL, BookEdition SMALLINT NOT NULL, BookPublishDate SMALLDATETIME NOT NULL, BookAuthor VARCHAR(100) NOT NULL ) CREATE UNIQUE INDEX uq_BookTitleEdition ON dbo.Books(BookTitle, BookEdition) INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('Eating Cheese Without Getting Killed', 1, '01 jan 2009', 'M.Mouse') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('Eating Cheese Without Getting Killed', 2, '01 jan 2010', 'M.Mouse') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('Why I Split With Minnie', 1, '01 feb 2010', 'M.Mouse') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('Why Pluto Wrecked My Life', 1, '02 mar 2010', 'M.Mouse') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('My Drug Addiction and How I Rebuilt My Life', 1, '01 apr 2010', 'M.Mouse') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('1001 Great Carrot Recipes', 1, '01 jan 2009', 'B.Bunny') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('1001 Great Carrot Recipes', 2, '01 jan 2010', 'B.Bunny') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('Counter Surveillance in Rural Locations', 1, '01 feb 2010', 'B.Bunny') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('Counter Surveillance in Rural Locations', 2, '02 mar 2010', 'B.Bunny') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('My Celebrity Tips for a Shiny Coat and Fluffy Tail', 1, '01 apr 2010', 'B.Bunny') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('Elementary Mathematics, and Numbers Higher Than Infinite', 1,'01 jan 2007', 'B.LightYear') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('Elementary Mathematics, and Numbers Higher Than Infinite', 2,'01 jan 2008', 'B.LightYear') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('Elementary Mathematics, and Numbers Higher Than Infinite', 3,'01 jan 2009', 'B.LightYear') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('How to Spot and Avoid CowBoys', 2, '02 mar 2010', 'B.LightYear') INSERT dbo.Books(BookTitle, BookEdition, BookPublishDate, BookAuthor) VALUES ('Exploring The MetaPhysical - Am I Real or Just a Toy?', 1, '01 apr 2010', 'B.LightYear')
Note: This is a simplified table design for instructional purposes. In reality, the BookAuthor field is likely to be a foreign key to another table.
Now we have our data, we can use the ROW_NUMBER() function to instruct SQL Server to return a result set in a certain order, adding a row number column in flight as it is produced.
The basic syntax of the ROW_NUMBER() function requires an ORDER BY clause which defines the order in which the row number column values are produced.
For example, let's see all the Books in BookAuthor, BookTitle, and BookEdition order:
SELECT ROW_NUMBER()OVER(ORDER BY BookAuthor, BookTitle, BookEdition)AS 'RowNumber', BookAuthor, BookTitle, BookEdition FROM dbo.Books ORDER BY BookAuthor, BookTitle, BookEdition
Notice that, should you wish, the ORDER BY clause for the main SELECT query can differ from the one contained as part of the ROW_NUMBER() function.Generally speaking, it's more intuitive and common to have the two ORDER BY clauses aligned.
I don't want to spend too much more time covering this basic syntax, as the additional partitioning functionality supplied by the ROW_NUMBER() function is more interesting, but it is easy to see using this approach makes it straight-forward to build paging style functionality into reports.
Let's create a stored procedure that shows this:
IF OBJECT_ID('dbo.usp_ShowBooks') IS NOT NULL DROP PROCEDURE dbo.usp_ShowBooks GO CREATE PROCEDURE dbo.usp_ShowBooks @PageNumber INT, @PageSize INT AS BEGIN ;WITH BookCTE (RowNumber, BookAuthor, BookTitle, BookEdition) AS ( SELECT ROW_NUMBER()OVER (ORDER BY BookAuthor, BookTitle, BookEdition), BookAuthor, BookTitle, BookEdition FROM dbo.Books ) SELECT TOP (@PageSize) BookAuthor, BookTitle, BookEdition FROM BookCTE WHERE RowNumber>((@PageNumber-1)*@PageSize) ORDER BY BookAuthor, BookTitle, BookEdition END
One limitation of the column produced by the ROW_NUMBER() function is that it can't be used directly in the WHERE clause. It's simple enough to overcome this though, and here we use a CTE (Common Table Expression) as a convenient way of referencing the RowNumber column in the subsequent WHERE clause. We could have created an intermediary temporary table to do the same thing, but a CTE is nice and neat and self-contained.
We can now use this stored procedure to pass recordsets to our front end to produce pages of data. For example, assuming the front end is expecting 5 rows of data at a time:
EXEC dbo.usp_ShowBooks @PageNumber=1, @PageSize=5 EXEC dbo.usp_ShowBooks @PageNumber=2, @PageSize=5 EXEC dbo.usp_ShowBooks @PageNumber=3, @PageSize=5
Going Further with Row Numbering and Partitioning
The partitioning capabilities of the ROW_NUMBER() function really start to show off its possibilities. This clause defines “partitions” that govern when the row number values are “reset” and start from 1 again.
For example, let's say we wanted to show all books by Author, with their books numbered in order of creation date:
SELECT ROW_NUMBER()OVER(PARTITION BY BookAuthor ORDER BY BookPublishDate) AS ' RowNumber', BookAuthor, BookPublishDate, BookTitle, BookEdition FROM dbo.Books ORDER BY BookAuthor, BookPublishDate
By “partitioning” by BookAuthor, the RowNumber values are reset back to 1 for each change in Author. The ORDER BY clause of the function ensures that the first book published by each author gets assigned RowNumber 1.
This was a hypothetical example to show the syntax and results.
Now for a more useful example - we want to know the latest edition of each book:
;WITHBookCTE (RowNumber, BookTitle, BookEdition, BookPublishDate, BookAuthor) AS ( SELECT ROW_NUMBER()OVER (PARTITION BY BookTitle ORDER BY BookEdition DESC), BookTitle, BookEdition, BookPublishDate, BookAuthor FROM dbo.Books ) SELECT BookTitle, BookEdition, BookPublishDate, BookAuthor FROM BookCTE WHERE RowNumber=1 ORDER BY BookTitle
This time, we partition by BookTitle, and order by BookEdition DESCENDING, so that the latest edition is always returned as RowNumber=1.
We then wrap this in a CTE as before and filter WHERE RowNumber=1.
This is logically simple to conceive, and also more efficient than a correlated sub-query equivalent. Here is the SET STATISTICS IO output for the query above.
Table 'Books'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Here is an equivalent correlated sub-query solution without using ROW_NUMBER():
SELECT BookTitle, BookEdition, BookPublishDate, BookAuthor FROM dbo.Books b WHERE BookEdition=(SELECT MAX(BookEdition)FROM dbo.Books WHERE BookTitle=b.BookTitle) Table 'Books'. Scan count 1, logical reads 42, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
2 logical reads versus 42. This is on a table with only 15 rows!
Before we move onto some other real world examples where I have recruited the ROW_NUMBER() function, let's return the latest book by each author, because it's so easy:
;WITH BookCTE (RowNumber, BookTitle, BookEdition, BookPublishDate, BookAuthor) AS ( SELECT ROW_NUMBER()OVER (PARTITION BY BookAuthor ORDER BY BookPublishDate DESC), BookTitle, BookEdition, BookPublishDate, BookAuthor FROM dbo.Books ) SELECT BookTitle, BookEdition, BookPublishDate, BookAuthor FROM BookCTE WHERE RowNumber=1 ORDER BY BookTitle
Real World Examples
Deals
Let's say we have a dbo.Deal table, holding financial Deal attributes. The primary key is DealID, and the logical (natural) identifier for each Deal is DealCode. There are multiple records for each DealCode, identified by the DealVersion column, added as the Deal attributes may evolve over time. In addition, multiple DealCodes can exist across different SourceIDs, identifying the source system where the Deal was originally created.
A small sample of the Deal table looks like this, with some other interesting attributes shown:
There's currently over 13 million records in the Deal table, so we need to be as efficient as possible when querying it (as always).
We also have a temporary table we've pre-built (called #PNLData) containing valuation data that includes the DealCode and SourceId. Now we want to “look up” the latest versions of these Deals to return the latest Deal information attributes alongside our valuation data:
;WITH PNLWithDealInfo(RowNumber, DealCode, DealVersion, SourceId, EffectiveDate, MaturityDate, LocalValue, LocalCurrencyCode) AS ( SELECT ROW_NUMBER() OVER(PARTITION BY d.DealCode, d.SourceId ORDER BY d.DealVersion DESC), d.DealCode, d.DealVersion, d.SourceId, d.EffectiveDate, d.MaturityDate, t.LocalValue, t.LocalCurrencyCode FROM dbo.Deal d INNER JOIN #PNLData t ON d.DealCode=t.DealCode AND d.SourceId=t.SourceId ) SELECT DealCode, DealVersion, SourceId, EffectiveDate, MaturityDate, LocalValue, LocalCurrencyCode FROM PNLWithDealInfo WHERE RowNumber=1
(results abridged)
Hedges
Although not required to follow the example, for background information, a “Hedge” is defined in simple terms as a collection of assets and risks with similar but offsetting characteristics. They are used, among other things, to offset market risk and provide accounting benefits.
Let's say we have a Hedge state table (dbo.hfv_hedge_rel_state) that stores historical Hedge information against monthly reporting dates.
Each Hedge (designated by a hfv_hedge_rel_id) record either continues as status “Active” in subsequent months, or is “Dedesignated” and no subsequent entries for this Hedge are added.
For example, the following records are present for Hedge 5815:
We can construct the following useful view to return the latest Hedge status information for each Hedge using the ROW_NUMBER() function and CTE construction we used earlier:
CREATE VIEW dbo.v_hfv_latest_hedge_status AS WITH LatestHedgeStatus (RowNumber, hfv_hedge_rel_state_id, hfv_hedge_rel_id, reporting_date, effective_percent, hedge_rel_status_code) AS ( SELECT ROW_NUMBER() OVER(PARTITION BY hfv_hedge_rel_id ORDER BY reporting_date DESC), hfv_hedge_rel_state_id, hfv_hedge_rel_id, reporting_date, effective_percent, hedge_rel_status_code FROM dbo.hfv_hedge_rel_state ) SELECT hfv_hedge_rel_state_id, hfv_hedge_rel_id, reporting_date, effective_percent, hedge_rel_status_code FROM LatestHedgeStatus WHERE RowNumber=1 GO
Now, we simply use the view to query for the Hedge we were looking at earlier:
SELECT hfv_hedge_rel_id, reporting_date, effective_percent, hedge_rel_status_code FROM dbo.v_hfv_latest_hedge_status WHERE hfv_hedge_rel_id=5815
Conclusion
The ROW_NUMBER() function offers us an intuitive and efficient method to return subsets of versioned data, “partitioned” to suit our requirements. The ROW_NUMBER() based solutions are easier to read, intuitive to write, and more efficient than correlated sub-query equivalents.
If you find yourself writing a correlated sub-query involving MAX or MIN, you should take a look at ROW_NUMBER() first.