March 12, 2009 at 9:10 pm
C. Westra (3/12/2009)
I'm afraid differences in resources are partly induced by Sql Server caching (intermediate) results between the first and latter queries.If you use DBCC DROPCLEANBUFFERS between the queries, caches are emptied and differences in time and physical reads may better reflect actual use of resources.
Heh... or do what I did... just run the code more than once so you can see that even in the face of caching, some solutions just shouldn't be used. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
March 12, 2009 at 9:26 pm
Jeff Moden (3/12/2009)
C. Westra (3/12/2009)
I'm afraid differences in resources are partly induced by Sql Server caching (intermediate) results between the first and latter queries.If you use DBCC DROPCLEANBUFFERS between the queries, caches are emptied and differences in time and physical reads may better reflect actual use of resources.
Heh... or do what I did... just run the code more than once so you can see that even in the face of caching, some solutions just shouldn't be used. 😉
... and, just to drive the point home, here's the code with both cache clearing commands in it just before each test run AND I put the "Holy Grail" code as the last code to give it the absolute best chance AND I listed the results from the 1st run after I rebuild the table. Caching has nothing to do with how bad the "Holy Grail" method takes a beating...
--===== Define the starting row and page size
DECLARE @StartRow INT ; SET @StartRow = 900000
DECLARE @PageSize INT ; SET @PageSize = 50
PRINT '--============================================================================='
PRINT '-- The "No RBAR/No Join" method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON
--===== The "No RBAR/No Join" method
;WITH
cteCols AS
(
SELECT NULL AS SomeInt, NULL AS SomeLetters2, 0 AS Seq, Rows AS TotRows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1
UNION ALL --------------------------------------------------------------------
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq
--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT '--============================================================================='
PRINT '-- A different No Join method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON
--===== A different No Join method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT Rows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq
--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT '--============================================================================='
PRINT '-- Peso''s Embedded "2 Bite" method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON
--===== Embedded "2 Bite" method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT COUNT(*) FROM dbo.JBMTest) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq
--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT '--============================================================================='
PRINT '-- The "Holy Grail" method'
PRINT '--============================================================================='
--===== Clear the guns
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON
--===== The "Holy Grail" method of getting a page of info
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
ROW_NUMBER() OVER(ORDER BY SomeInt DESC, SomeLetters2 DESC) AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows + Seq - 1 AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
ORDER BY Seq
--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
Here're the results...
--=============================================================================
-- The "No RBAR/No Join" method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(51 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1391 ms, elapsed time = 1653 ms.
--=============================================================================
-- A different No Join method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(50 row(s) affected)
Table 'sysrowsets'. Scan count 50, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1203 ms, elapsed time = 1371 ms.
--=============================================================================
-- Peso's Embedded "2 Bite" method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(50 row(s) affected)
Table 'JBMTest'. Scan count 2, logical reads 3970, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1453 ms, elapsed time = 2253 ms.
--=============================================================================
-- The "Holy Grail" method
--=============================================================================
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
(50 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 7859 ms, elapsed time = 12073 ms.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 13, 2009 at 11:41 am
Thank you, everyone, for the eye-opening research. You guys have contributed some caveats to consider when trying to identify the best solution.
Steve's lead-in was a bit misleading in that I do not consider the 'holy grail' method to be "the best" in all cases (although it remains the best in cases that I've had to work with). I was defining the 'holy grail' as being able to get the count and the page without adding a bunch of I/O overhead.
Every time I've needed to implement server-side paging, the code would be paging the results of a query, sometimes simple, sometimes complex. As a result Jeff's 'no join' methods are out the window, which is a shame because that's a very cool trick! In the edge cases, these queries would page tens of thousands of records (nowhere near the 1-2 million record test cases we've seen on this thread, although I should have done tests like that for this article)
I started looking for a better approach to getting a count because the underlying query that had to be returned to the user was very complex and expensive. Doing the 'embedded two-bite' method or others add considerable cost and time to execute the query.
For my ends, the 'holy grail' approach is absolutely the fastest and most efficient technique and I am convinced that there are other people out there who can benefit from this technique (which is why I wrote the article) Will this approach be the best in all cases? No
There is definitely a lot of good feedback and information in this thread. I'll update the article with the information you shared.
SQL guy and Houston Magician
March 13, 2009 at 10:38 pm
For my ends, the 'holy grail' approach is absolutely the fastest and most efficient technique
Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
March 14, 2009 at 1:16 am
Jeff Moden (3/13/2009)
Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. 😉
LOL Thankfully the queries being paged aren't quite as large as those you used to give my technique a beating. 😉
This is the right tool for me only because the paged result sets are fairly small, and the cost of the query that builds the resultset is what causes my customers to wait.
Your comments caused me to take another look at my procs, and the worst case scenario is elapsed time overhead in the tens of milliseconds. I, and my customers, can live with that.
I've submitted a minor revision to the article, suggesting that this approach is only beneficial in a narrow set of circumstances.
SQL guy and Houston Magician
March 20, 2009 at 1:08 pm
Just for the benefit of the folks talking about grabbing the count of all rows in the table separately, note that in a lot of cases we can't look at sys.indexes or the more reliable sys.dm_db_partition_stats because our queries have WHERE clauses or are JOINed to other tables.
March 20, 2009 at 4:00 pm
Robert Cary (3/14/2009)
Jeff Moden (3/13/2009)
Heh... sure... 7 to 8 seconds compared to just over a second. I'm sure your customers will think the same thing while they're waiting. 😉LOL Thankfully the queries being paged aren't quite as large as those you used to give my technique a beating. 😉
This is the right tool for me only because the paged result sets are fairly small, and the cost of the query that builds the resultset is what causes my customers to wait.
Your comments caused me to take another look at my procs, and the worst case scenario is elapsed time overhead in the tens of milliseconds. I, and my customers, can live with that.
I've submitted a minor revision to the article, suggesting that this approach is only beneficial in a narrow set of circumstances.
Heh... understood... but the problem is that other folks, who may not have such limited numbers of rows, are reading your good article and I want them to know that your method is rather slow in the face of scalability.
To prevent problems like this, I will always write code against a million row test table. It's a good habit to get into so that you're not called up at midnight on the last day of each month by your rather ticked of boss wondering why your code has, yet again, caused him to miss the SLA deadlines for month end runs.
Never justify performance challenged code with low row counts. It's an accident waiting to happen.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 23, 2009 at 4:42 am
The original article was very well written and it's sooo nice to statistics to back up the arguments presented.
However, having read the forums here I have to agree with Peso that overall time is perhaps more important than just I/O reads, certainly in terms of customer experience anyway!
I've tried Peso's method on several tables of real data with rows ranging from a few hundred to over 500,000 and in every single case Peso's method is quicker by a factor of at least 10.
Thanks for the post, it has made a massive difference to my paging search times.
April 23, 2009 at 5:00 am
Thank you for the feedback!
N 56°04'39.16"
E 12°55'05.25"
April 30, 2009 at 4:40 pm
Hello all,
After benefitting from the code examples and explanations above, I thought I'd make a contribution.
Below is a generic stored procedure that will retrieve the page of rows based on the parameters fed to it. It determines whether the table called for is in fact a table or a view. If it's a table, it uses the sys-tables to get the total row count (thanks to a post above this one!) and if it's a view, it uses the original strategy of calculating the inverse row # for a given row. I didn't compute the total rows because I thought whatever was calling it would only have to make the computation once rather than once-per-row if I had done it here.
Before the Efficiency Police hall me away, this was slapped together as a proof of concept and is not (in any way) presented as the best way of doing this. And there is a rule. Your ORDER-BY clause must include ASC where appropriate. This made inverting the order-by merely ugly, down from absolutely hideous.
[Code]
set ANSI_NULLS ON
set QUOTED_IDENTIFIER ON
go
ALTER PROCEDURE [dbo].[GetPage]
-- Add the parameters for the stored procedure here
(
@TableName varchar(1000)
,@Fields varchar(1000)
,@OrderBy varchar(1000)
,@StartPosition int
,@PageLen int
)
AS
Begin
DECLARE
@strSQL varchar(max)
,@RowCount int
,@RightNow DATETIME
,@TableType varchar(20)
,@RevOrderBy varchar(1000)
Set @RightNow=GETDATE()
Set @PageLen = @PageLen - 1
if not EXISTS (select * from INFORMATION_SCHEMA.tables where table_name = @TableName)
begin
Select 'Table/View not found' as Error
return (-1)
end
select @TableType = TABLE_TYPE from INFORMATION_SCHEMA.tables where table_name = @TableName
if @TableType = 'BASE TABLE'
begin
SELECT @RowCount=P.Rows
FROM sys.partitions P
INNER JOIN sys.indexes I ON (P.object_id = I.object_id) AND (P.index_id = I.index_id)
INNER JOIN sys.objects O ON (P.object_id = O.object_id)
WHERE (I.type IN(0,1)) AND (O.name = PARSENAME(@TableName, 1) AND
O.[schema_id] = IsNull(SCHEMA_ID(PARSENAME(@TableName, 2)), O.[schema_id]))
ORDER BY O.name
set @strSQL =' SELECT * ' +
', ' + convert(varchar(10),@RowCount) + ' as [RowCount] ' +
' FROM (SELECT ' + @Fields + ', ROW_NUMBER() OVER(ORDER BY ' + @OrderBy + ') AS RowNumber ' +
' FROM [' + @TableName + ']) as [' + @TableName + '_A] ' +
' WHERE RowNumber BETWEEN ' + convert(varchar(10),@StartPosition) +
' AND ' + convert(varchar(10),(@StartPosition + @PageLen)) + ' ORDER BY RowNumber'
end
else
begin
set @RevOrderBy = upper(@OrderBy) + ','
set @RevOrderBy = replace(replace(replace(@RevOrderBy ,' ASC,',' ~A,'),' DESC,',' ASC,'),' ~A',' DESC')
set @RevOrderBy = left(@RevOrderBy ,len(@RevOrderBy)-1)
set @strSQL =' SELECT *' +
' FROM (' +
'SELECT ' + @Fields + ', ROW_NUMBER() OVER(ORDER BY ' + @OrderBy + ') AS RowNumber, ' +
' ROW_NUMBER() OVER(ORDER BY ' + @RevOrderBy + ') AS RowNumberInverse ' +
' FROM [' + @TableName + ']) as [' + @TableName + '_A] ' +
' WHERE RowNumber BETWEEN ' + convert(varchar(10),@StartPosition) +
' AND ' + convert(varchar(10),(@StartPosition + @PageLen)) + ' ORDER BY RowNumber'
end
--SELECT SQL=@strSQL
EXEC(@strSQL)
End
[/code]
Enjoy!
June 18, 2009 at 9:23 am
werner.broser said
Nice method, but works only for unique fields!
-- I agree.
When the order-expression does not bear an unique value, "The Holy Grail" can make mistakes:
DECLARE @t1 TABLE ([x] INT, [y] INT)
INSERT INTO @t1 ([x], [y])
SELECT 1, 1
UNION ALL
SELECT 1, 2
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY [x] ASC)
+ ROW_NUMBER() OVER (ORDER BY [x] DESC)
- 1 AS [Sum]
FROM @t1) t
The query result gives
x y Sum
1 1 1
1 2 3
which obviously cannot satisfy us.
For rows over which order-expression gives the same value, their relative order will not change regardless of "ASC" or "DESC" option, but only following the order how they are physiclely stored. The previous example proves this.
We should know this limit and do not apply "The Holy Grail" in the wrong place.
October 14, 2009 at 8:33 am
Thank you for this. I've used it now, but haven't studied reads and performance gains (or losses) yet. Realized though that the column(s) used for sorting has to be unique in order for the total row count to be accurate. If there are several equal records in the sorted column, the total row count will not be accurate. You have to make sure that the sorting is 100% opposite to the initial sorting. Just a simple addition to the solution after trying it out. Thank you again!
/ Peter.
October 14, 2009 at 8:37 am
Sorry for stating an already noticed "feature". Only read the earliest posts...
October 20, 2009 at 3:15 pm
Really nice solution for the total row count and paging problem; and you can't miss the analogy to the classical Gauss argument for the sum of the N first integers 1+2+..+N= (N+1)*N / 2
1 + 2 + ... N
N + (N-1) ... 1
🙂
February 11, 2010 at 5:33 am
Hi Friend,
Can you get me the other alternative not using trigger
Viewing 15 posts - 31 through 45 (of 64 total)
You must be logged in to reply to this topic. Login to reply