SQL Server 2005 Paging – The Holy Grail

Question

SQL Server 2005 Paging – The Holy Grail

Viewing 15 posts - 31 through 45 (of 64 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004498 More actions · Answer 1

C. Westra (3/12/2009)
I'm afraid differences in resources are partly induced by Sql Server caching (intermediate) results between the first and latter queries.
If you use DBCC DROPCLEANBUFFERS between the queries, caches are emptied and differences in time and physical reads may better reflect actual use of resources.

Heh... or do what I did... just run the code more than once so you can see that even in the face of caching, some solutions just shouldn't be used. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004498 More actions · Answer 2

Jeff Moden (3/12/2009)
C. Westra (3/12/2009)
I'm afraid differences in resources are partly induced by Sql Server caching (intermediate) results between the first and latter queries.
If you use DBCC DROPCLEANBUFFERS between the queries, caches are emptied and differences in time and physical reads may better reflect actual use of resources.
Heh... or do what I did... just run the code more than once so you can see that even in the face of caching, some solutions just shouldn't be used. 😉

... and, just to drive the point home, here's the code with both cache clearing commands in it just before each test run AND I put the "Holy Grail" code as the last code to give it the absolute best chance AND I listed the results from the 1st run after I rebuild the table. Caching has nothing to do with how bad the "Holy Grail" method takes a beating...

--===== Define the starting row and page size

DECLARE @StartRow INT ; SET @StartRow = 900000

DECLARE @PageSize INT ; SET @PageSize = 50

PRINT '--============================================================================='

PRINT '-- The "No RBAR/No Join" method'

PRINT '--============================================================================='

--===== Clear the guns

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS

--===== Turn on the timers

SET STATISTICS IO ON

SET STATISTICS TIME ON

--===== The "No RBAR/No Join" method

;WITH

cteCols AS

(

SELECT NULL AS SomeInt, NULL AS SomeLetters2, 0 AS Seq, Rows AS TotRows

FROM sys.Partitions

WHERE Object_ID = OBJECT_ID('dbo.JBMTest')

AND Index_ID = 1

UNION ALL --------------------------------------------------------------------

SELECT SomeInt, SomeLetters2,

ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,

NULL AS TotRows

FROM dbo.JBMTest

)

SELECT Seq, SomeInt, SomeLetters2, TotRows

FROM cteCols

WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1

OR Seq = 0

ORDER BY Seq

--===== Turn off the timers

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

PRINT '--============================================================================='

PRINT '-- A different No Join method'

PRINT '--============================================================================='

--===== Clear the guns

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS

--===== Turn on the timers

SET STATISTICS IO ON

SET STATISTICS TIME ON

--===== A different No Join method

;WITH

cteCols AS

(

SELECT SomeInt, SomeLetters2,

ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,

NULL AS TotRows

FROM dbo.JBMTest

)

SELECT Seq, SomeInt, SomeLetters2, (SELECT Rows

FROM sys.Partitions

WHERE Object_ID = OBJECT_ID('dbo.JBMTest')

AND Index_ID = 1) AS TotRows

FROM cteCols

WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1

OR Seq = 0

ORDER BY Seq

--===== Turn off the timers

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

PRINT '--============================================================================='

PRINT '-- Peso''s Embedded "2 Bite" method'

PRINT '--============================================================================='

--===== Clear the guns

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS

--===== Turn on the timers

SET STATISTICS IO ON

SET STATISTICS TIME ON

--===== Embedded "2 Bite" method

;WITH

cteCols AS

(

SELECT SomeInt, SomeLetters2,

ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,

NULL AS TotRows

FROM dbo.JBMTest

)

SELECT Seq, SomeInt, SomeLetters2, (SELECT COUNT(*) FROM dbo.JBMTest) AS TotRows

FROM cteCols

WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1

OR Seq = 0

ORDER BY Seq

--===== Turn off the timers

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

PRINT '--============================================================================='

PRINT '-- The "Holy Grail" method'

PRINT '--============================================================================='

--===== Clear the guns

DBCC FREEPROCCACHE

DBCC DROPCLEANBUFFERS

--===== Turn on the timers

SET STATISTICS IO ON

SET STATISTICS TIME ON

--===== The "Holy Grail" method of getting a page of info

;WITH

cteCols AS

(

SELECT SomeInt, SomeLetters2,

ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,

ROW_NUMBER() OVER(ORDER BY SomeInt DESC, SomeLetters2 DESC) AS TotRows

FROM dbo.JBMTest

)

SELECT Seq, SomeInt, SomeLetters2, TotRows + Seq - 1 AS TotRows

FROM cteCols

WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1

ORDER BY Seq

--===== Turn off the timers

SET STATISTICS TIME OFF

SET STATISTICS IO OFF

Here're the results...

--=============================================================================

-- The "No RBAR/No Join" method

--=============================================================================

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(51 row(s) affected)

Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'sysrowsets'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 1391 ms, elapsed time = 1653 ms.

--=============================================================================

-- A different No Join method

--=============================================================================

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(50 row(s) affected)

Table 'sysrowsets'. Scan count 50, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 1203 ms, elapsed time = 1371 ms.

--=============================================================================

-- Peso's Embedded "2 Bite" method

--=============================================================================

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(50 row(s) affected)

Table 'JBMTest'. Scan count 2, logical reads 3970, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 1453 ms, elapsed time = 2253 ms.

--=============================================================================

-- The "Holy Grail" method

--=============================================================================

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(50 row(s) affected)

Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 7859 ms, elapsed time = 12073 ms.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

R Michael SSCarpal Tunnel Points: 4211 More actions · Answer 3

Thank you, everyone, for the eye-opening research. You guys have contributed some caveats to consider when trying to identify the best solution.

Steve's lead-in was a bit misleading in that I do not consider the 'holy grail' method to be "the best" in all cases (although it remains the best in cases that I've had to work with). I was defining the 'holy grail' as being able to get the count and the page without adding a bunch of I/O overhead.

Every time I've needed to implement server-side paging, the code would be paging the results of a query, sometimes simple, sometimes complex. As a result Jeff's 'no join' methods are out the window, which is a shame because that's a very cool trick! In the edge cases, these queries would page tens of thousands of records (nowhere near the 1-2 million record test cases we've seen on this thread, although I should have done tests like that for this article)

I started looking for a better approach to getting a count because the underlying query that had to be returned to the user was very complex and expensive. Doing the 'embedded two-bite' method or others add considerable cost and time to execute the query.

For my ends, the 'holy grail' approach is absolutely the fastest and most efficient technique and I am convinced that there are other people out there who can benefit from this technique (which is why I wrote the article) Will this approach be the best in all cases? No

There is definitely a lot of good feedback and information in this thread. I'll update the article with the information you shared.

SQL guy and Houston Magician

Jeff Moden SSC Guru Points: 1004498 More actions · Answer 4

For my ends, the 'holy grail' approach is absolutely the fastest and most efficient technique

Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

R Michael SSCarpal Tunnel Points: 4211 More actions · Answer 5

Jeff Moden (3/13/2009)
Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. 😉

LOL Thankfully the queries being paged aren't quite as large as those you used to give my technique a beating. 😉

This is the right tool for me only because the paged result sets are fairly small, and the cost of the query that builds the resultset is what causes my customers to wait.

Your comments caused me to take another look at my procs, and the worst case scenario is elapsed time overhead in the tens of milliseconds. I, and my customers, can live with that.

I've submitted a minor revision to the article, suggesting that this approach is only beneficial in a narrow set of circumstances.

SQL guy and Houston Magician

aaron.bertrand Ten Centuries Points: 1260 More actions · Answer 6

Just for the benefit of the folks talking about grabbing the count of all rows in the table separately, note that in a lot of cases we can't look at sys.indexes or the more reliable sys.dm_db_partition_stats because our queries have WHERE clauses or are JOINed to other tables.

Jeff Moden SSC Guru Points: 1004498 More actions · Answer 7

Robert Cary (3/14/2009)
Jeff Moden (3/13/2009)
Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. 😉
LOL Thankfully the queries being paged aren't quite as large as those you used to give my technique a beating. 😉
This is the right tool for me only because the paged result sets are fairly small, and the cost of the query that builds the resultset is what causes my customers to wait.
Your comments caused me to take another look at my procs, and the worst case scenario is elapsed time overhead in the tens of milliseconds. I, and my customers, can live with that.
I've submitted a minor revision to the article, suggesting that this approach is only beneficial in a narrow set of circumstances.

Heh... understood... but the problem is that other folks, who may not have such limited numbers of rows, are reading your good article and I want them to know that your method is rather slow in the face of scalability.

To prevent problems like this, I will always write code against a million row test table. It's a good habit to get into so that you're not called up at midnight on the last day of each month by your rather ticked of boss wondering why your code has, yet again, caused him to miss the SLA deadlines for month end runs.

Never justify performance challenged code with low row counts. It's an accident waiting to happen.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

alan.kelly.london Valued Member Points: 69 More actions · Answer 8

The original article was very well written and it's sooo nice to statistics to back up the arguments presented.

However, having read the forums here I have to agree with Peso that overall time is perhaps more important than just I/O reads, certainly in terms of customer experience anyway!

I've tried Peso's method on several tables of real data with rows ranging from a few hundred to over 500,000 and in every single case Peso's method is quicker by a factor of at least 10.

Thanks for the post, it has made a massive difference to my paging search times.

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 9

Thank you for the feedback!

N 56°04'39.16"
E 12°55'05.25"

sicsemperspam SSC Enthusiast Points: 110 More actions · Answer 10

Hello all,

After benefitting from the code examples and explanations above, I thought I'd make a contribution.

Below is a generic stored procedure that will retrieve the page of rows based on the parameters fed to it. It determines whether the table called for is in fact a table or a view. If it's a table, it uses the sys-tables to get the total row count (thanks to a post above this one!) and if it's a view, it uses the original strategy of calculating the inverse row # for a given row. I didn't compute the total rows because I thought whatever was calling it would only have to make the computation once rather than once-per-row if I had done it here.

Before the Efficiency Police hall me away, this was slapped together as a proof of concept and is not (in any way) presented as the best way of doing this. And there is a rule. Your ORDER-BY clause must include ASC where appropriate. This made inverting the order-by merely ugly, down from absolutely hideous.

[Code]

set ANSI_NULLS ON

set QUOTED_IDENTIFIER ON

go

ALTER PROCEDURE [dbo].[GetPage]

-- Add the parameters for the stored procedure here

(

@TableName varchar(1000)

,@Fields varchar(1000)

,@OrderBy varchar(1000)

,@StartPosition int

,@PageLen int

)

AS

Begin

DECLARE

@strSQL varchar(max)

,@RowCount int

,@RightNow DATETIME

,@TableType varchar(20)

,@RevOrderBy varchar(1000)

Set @RightNow=GETDATE()

Set @PageLen = @PageLen - 1

if not EXISTS (select * from INFORMATION_SCHEMA.tables where table_name = @TableName)

begin

Select 'Table/View not found' as Error

return (-1)

end

select @TableType = TABLE_TYPE from INFORMATION_SCHEMA.tables where table_name = @TableName

if @TableType = 'BASE TABLE'

begin

SELECT @RowCount=P.Rows

FROM sys.partitions P

INNER JOIN sys.indexes I ON (P.object_id = I.object_id) AND (P.index_id = I.index_id)

INNER JOIN sys.objects O ON (P.object_id = O.object_id)

WHERE (I.type IN(0,1)) AND (O.name = PARSENAME(@TableName, 1) AND

O.[schema_id] = IsNull(SCHEMA_ID(PARSENAME(@TableName, 2)), O.[schema_id]))

ORDER BY O.name

set @strSQL =' SELECT * ' +

', ' + convert(varchar(10),@RowCount) + ' as [RowCount] ' +

' FROM (SELECT ' + @Fields + ', ROW_NUMBER() OVER(ORDER BY ' + @OrderBy + ') AS RowNumber ' +

' FROM [' + @TableName + ']) as [' + @TableName + '_A] ' +

' WHERE RowNumber BETWEEN ' + convert(varchar(10),@StartPosition) +

' AND ' + convert(varchar(10),(@StartPosition + @PageLen)) + ' ORDER BY RowNumber'

end

else

begin

set @RevOrderBy = upper(@OrderBy) + ','

set @RevOrderBy = replace(replace(replace(@RevOrderBy ,' ASC,',' ~A,'),' DESC,',' ASC,'),' ~A',' DESC')

set @RevOrderBy = left(@RevOrderBy ,len(@RevOrderBy)-1)

set @strSQL =' SELECT *' +

' FROM (' +

'SELECT ' + @Fields + ', ROW_NUMBER() OVER(ORDER BY ' + @OrderBy + ') AS RowNumber, ' +

' ROW_NUMBER() OVER(ORDER BY ' + @RevOrderBy + ') AS RowNumberInverse ' +

' FROM [' + @TableName + ']) as [' + @TableName + '_A] ' +

' WHERE RowNumber BETWEEN ' + convert(varchar(10),@StartPosition) +

' AND ' + convert(varchar(10),(@StartPosition + @PageLen)) + ' ORDER BY RowNumber'

end

--SELECT SQL=@strSQL

EXEC(@strSQL)

End

[/code]

Enjoy!

fred_ch9400_cn SSC Rookie Points: 49 More actions · Answer 11

werner.broser said

Nice method, but works only for unique fields!

-- I agree.

When the order-expression does not bear an unique value, "The Holy Grail" can make mistakes:

DECLARE @t1 TABLE ([x] INT, [y] INT)

INSERT INTO @t1 ([x], [y])

SELECT 1, 1

UNION ALL

SELECT 1, 2

SELECT *

FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY [x] ASC)

+ ROW_NUMBER() OVER (ORDER BY [x] DESC)

- 1 AS [Sum]

FROM @t1) t

The query result gives

x y Sum

1 1 1

1 2 3

which obviously cannot satisfy us.

For rows over which order-expression gives the same value, their relative order will not change regardless of "ASC" or "DESC" option, but only following the order how they are physiclely stored. The previous example proves this.

We should know this limit and do not apply "The Holy Grail" in the wrong place.

peter.krickner SSC Enthusiast Points: 144 More actions · Answer 12

Thank you for this. I've used it now, but haven't studied reads and performance gains (or losses) yet. Realized though that the column(s) used for sorting has to be unique in order for the total row count to be accurate. If there are several equal records in the sorted column, the total row count will not be accurate. You have to make sure that the sorting is 100% opposite to the initial sorting. Just a simple addition to the solution after trying it out. Thank you again!

/ Peter.

peter.krickner SSC Enthusiast Points: 144 More actions · Answer 13

Sorry for stating an already noticed "feature". Only read the earliest posts...

th-spam SSC Journeyman Points: 78 More actions · Answer 14

Really nice solution for the total row count and paging problem; and you can't miss the analogy to the classical Gauss argument for the sum of the N first integers 1+2+..+N= (N+1)*N / 2

1 + 2 + ... N

N + (N-1) ... 1

🙂

ningaraju.n SSCrazy Points: 2322 More actions · Answer 15

Hi Friend,

Can you get me the other alternative not using trigger