A while ago Robert Cary posted an article on SQL Server Central entitled 2005 Paging – The Holy Grail which is, as the title would suggest about paging in SQL Server. This article provoked some really interesting chat around the subject and is well worth a read.
This is now a lot easier in SQL Server 2012 with the introduction of the OFFSET extension to the ORDER BY clause, but what is the most optimal method is you are not using 2012 ?
Well, whilst playing around by the OFFSET portion of my “What’s new in SQL Server 2012 – TSQL” presentation, I hit on a different method that I’ve not seen published before.
Now whilst finding which rows are on which page is a problem, it is only part of a much wider problem, that being that cost of the lookups to find other related data. For example: You have a list of people which you are paging through in the order of LastName, but you also wish to display FirstName. That is not in your index and so a key lookup occurs, OK I could INCLUDE it in the index but im just simplifying the problem.
So, to demonstrate this I need to create an index on Person.Person in AdventureWorks.
Create index idxLastName on Person.Person(LastName)
The query for the “holy grail” method would look something like this :
with ctePaging as ( Select LastName,FirstName, row_number() over (order by LastName,BusinessEntityID)-1 as RowN from Person.Person ) Select * from ctePaging where RowN between 20 and 39 order by RowN;
The issue here is that SQL Server has initiated an index scan (against a different index than the one we created) and had to process all the rows in the table and then sort them.
We only want 20 rows returned so this is quite a lot of wasted effort on the engine's part.
OFFSET has been introduced in 2012 and running the equivalent query of :
Select LastName,FirstName,BusinessEntityID from Person.Person order by LastName,BusinessEntityID offset 20 rows fetch next 20 rows only;
Gives us the query plan of :
Even this is non-optimal though, as the key lookup has occurred 40 times , even though we only needed the data (in this case FirstName) for 20 rows.
This can be resolved by doing the key lookup yourself.
with cteKeySeek as ( Select LastName,BusinessEntityID from Person.Person order by LastName,BusinessEntityID offset 20 rows fetch next 20 rows only ) Select cteKeySeek.LastName, FirstName, cteKeySeek.BusinessEntityID from cteKeySeek inner join Person.Person on cteKeySeek.BusinessEntityID = Person.BusinessEntityID order by cteKeySeek.LastName,FirstName,cteKeySeek.BusinessEntityID;
Even though its longer, wordier and involves a join , it is more efficient as the join has replaced the key lookup and it is now only occurring on the 20 rows of data that we need
Quite neat hey ? When using OFFSET it is important to remember that no magic is happening, SQL Server still has to ‘count’ and scan through the rows that are not to be processed before it can decide which ones it does need.
A comparable query for previous versions and taking the lead from the holy grail method would be :
with cteKeySeek as ( Select BusinessEntityID,LastName, row_number() over (order by LastName,BusinessEntityID)-1 as RowN from Person.Person ) Select cteKeySeek.LastName,FirstName,cteKeySeek.BusinessEntityID ,RowN from cteKeySeek inner loop join Person.Person on cteKeySeek.BusinessEntityID = Person.BusinessEntityID where RowN >= 20 and rown<=39 order by LastName,BusinessEntityID;
Which does similarly filter the rows before doing the index lookup
It does however still involve a scan of 19,972 rows of which 19,932 are irrelevant to our final result set. You may of noticed in the OFFSET versions that the TOP operator is used to filter the data and ‘stop’ the scan once it has reached the last row that we are interested in. What if we could do something similar.
What about this ?:
with cteKeySeek as ( Select BusinessEntityID,LastName, row_number() over (order by LastName,BusinessEntityID)-1 as RowN from Person.Person ) Select top(20) cteKeySeek.LastName,FirstName,cteKeySeek.BusinessEntityID ,RowN from cteKeySeek inner loop join Person.Person on cteKeySeek.BusinessEntityID = Person.BusinessEntityID where RowN >= 20 and RowN<=39 order by LastName,BusinessEntityID;
That does have the rather interesting effect of doing exactly that:
So, this is looking (at least in-terms of rowcounts) very similar to the OFFSET functionality. If we look for a page of data further on (rows 200 to 219) and look at an profiler trace we can see how the three type of query compare.
So as you can see over a medium size (ish) dataset the fake and real offset are comparable in terms of IO.
Hope this helps someone, who needs to do paging