March 11, 2009 at 8:46 am
Thanks for the article. I'm working on a query that returns a list of users, but I've never been able to do it with ROW_NUMBER() because of the ORDER BY. Here's what I have that works - the SQL 2000 way. Is there a better way of writing this?
ALTER PROCEDURE [dbo].[procSelectUsers] (
@p_vchLetterFilter VARCHAR(1), @p_vchOrderBy VARCHAR(20), @p_vchSortDirection VARCHAR(20), @p_vchQuickSearch VARCHAR(50), @p_intPage INT, @p_intPageSize INT
)
AS
BEGIN
DECLARE @TempTable TABLE(RowID INT IDENTITY, UserID int)
INSERT INTO @TempTable(UserID)
SELECT intUserID
FROM tblUsers U, tblSecurityRoles SR, tblWorkgroups W
WHERE U.sysDateDeleted = 0 AND SR.intSecurityRoleID = U.intSecurityRoleID
AND W.intWorkgroupID = U.intWorkgroupID
AND vchLastName LIKE @p_vchLetterFilter + '%'
AND vchLastName LIKE '%' + @p_vchQuickSearch + '%'
ORDER BY
CASE
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchFirstName ASC, vchLastName ASC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchFirstName DESC, vchLastName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchUserName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchUserName ASC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchRoleTitle DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchRoleTitle ASC, vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchDescription DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchDescription ASC, vchLastName ASC, vchFirstName ASC))
END
SELECT UserID, vchFirstName, vchLastName, vchUserName, vchPassword,
vchEmail, IsNull(U.intSecurityRoleID,0) as intSecurityRoleID,
IsNull(U.intWorkgroupID,0) as intWorkgroupID, vchInitials, vchBadgeNumber,
SR.vchRoleTitle AS vchSecurityRole, W.vchDescription AS vchWorkgroupDesc
FROM @TempTable T, tblUsers U, tblSecurityRoles SR, tblWorkgroups W
WHERE U.sysDateDeleted = 0 AND SR.intSecurityRoleID = U.intSecurityRoleID
AND T.UserID = U.intUserID
AND W.intWorkgroupID = U.intWorkgroupID
AND vchLastName LIKE @p_vchLetterFilter + '%'
AND vchLastName LIKE '%' + @p_vchQuickSearch + '%'
AND T.RowID BETWEEN ((@p_intPage - 1) * @p_intPageSize + 1) AND (@p_intPage * @p_intPageSize)
ORDER BY T.RowID -- keep the page in ranked order; without this the final row order is not guaranteed
END /* ALTER PROCEDURE procSelectUsers */
This is what I had before, but look at how it printed out the results - so it wouldn't work the way I needed:
ALTER PROCEDURE [dbo].[procSelectUsers] (
@p_vchLetterFilter VARCHAR(1), @p_vchOrderBy VARCHAR(20), @p_vchSortDirection VARCHAR(20), @p_vchQuickSearch VARCHAR(50), @p_intPage INT, @p_intPageSize INT
)
AS
BEGIN
DECLARE @startRowIndex INT
SET @startRowIndex = (@p_intPage * @p_intPageSize) + 1;
WITH Users as (
SELECT ROW_NUMBER() OVER (ORDER BY intUserID ASC) as Row,
intUserID, vchFirstName, vchLastName, vchUserName, vchPassword,
vchEmail, IsNull(U.intSecurityRoleID,0) as intSecurityRoleID,
IsNull(U.intWorkgroupID,0) as intWorkgroupID, vchInitials, vchBadgeNumber,
SR.vchRoleTitle AS vchSecurityRole, W.vchDescription AS vchWorkgroupDesc
FROM tblUsers U, tblSecurityRoles SR, tblWorkgroups W
WHERE U.sysDateDeleted = 0 AND SR.intSecurityRoleID = U.intSecurityRoleID
AND W.intWorkgroupID = U.intWorkgroupID
AND vchLastName LIKE @p_vchLetterFilter + '%'
AND vchLastName LIKE '%' + @p_vchQuickSearch + '%'
)
SELECT
intUserID, vchFirstName, vchLastName, vchUserName, vchPassword,
vchEmail, intSecurityRoleID, intWorkgroupID, vchInitials, vchBadgeNumber,
vchSecurityRole, vchWorkgroupDesc
FROM Users
WHERE Row BETWEEN @startRowIndex AND @startRowIndex + @p_intPageSize - 1
ORDER BY
CASE
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'last' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchFirstName ASC, vchLastName ASC))
WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchFirstName DESC, vchLastName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchUserName DESC))
WHEN @p_vchOrderBy = 'username' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchUserName ASC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchSecurityRole DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'security' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchSecurityRole ASC, vchLastName ASC, vchFirstName ASC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'ASC' THEN (RANK() OVER (ORDER BY vchWorkgroupDesc DESC, vchLastName DESC, vchFirstName DESC))
WHEN @p_vchOrderBy = 'workgroup' AND @p_vchSortDirection = 'DESC' THEN (RANK() OVER (ORDER BY vchWorkgroupDesc ASC, vchLastName ASC, vchFirstName ASC))
END
END /* ALTER PROCEDURE procSelectUsers */
EXEC [procSelectUsers] '', '', 'ASC', '', 0, 6
A1
A2
B1
B2
C1
C2
EXEC [procSelectUsers] '', '', 'ASC', '', 1, 6
A3
A4
B3
B4
C3
C4
Results show an example of the output - not the actual output
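For reference, the shape I was trying to get to (sketch only - not tested) folds the CASE expressions into the ROW_NUMBER() OVER clause itself, one CASE per sort key and direction, so the row numbers follow the requested order before the paging filter is applied:
DECLARE @startRowIndex INT
SET @startRowIndex = (@p_intPage * @p_intPageSize) + 1;
WITH Users AS (
    SELECT
        -- a CASE that returns NULL for non-matching branches leaves that sort key inert
        ROW_NUMBER() OVER (ORDER BY
            CASE WHEN @p_vchOrderBy = 'last'  AND @p_vchSortDirection = 'ASC'  THEN vchLastName  END ASC,
            CASE WHEN @p_vchOrderBy = 'last'  AND @p_vchSortDirection = 'DESC' THEN vchLastName  END DESC,
            CASE WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'ASC'  THEN vchFirstName END ASC,
            CASE WHEN @p_vchOrderBy = 'first' AND @p_vchSortDirection = 'DESC' THEN vchFirstName END DESC
            -- ...remaining sort keys ('username', 'security', 'workgroup') follow the same pattern...
        ) AS Row,
        intUserID, vchFirstName, vchLastName
    FROM tblUsers U
    WHERE U.sysDateDeleted = 0
)
SELECT intUserID, vchFirstName, vchLastName
FROM Users
WHERE Row BETWEEN @startRowIndex AND @startRowIndex + @p_intPageSize - 1
ORDER BY Row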
March 11, 2009 at 8:53 am
I just read today that ADO.NET Data Services 1.5 CTP (coming out in a few months) will have built-in support for server-side data paging.
http://blogs.msdn.com/astoriateam/archive/2009/03/01/announcing-ado-net-data-services-v1-5-ctp1.aspx
March 11, 2009 at 11:08 am
I agree with Peso. I tried the holy grail method and it took my procedure from 7 seconds (using the COUNT(*) method) to 12 seconds to execute. Those 5 extra seconds would unfortunately leave unhappy customers - nice idea, though!
March 11, 2009 at 11:57 am
hhcosmin, I agree with what you say: the caching policy needs to be defined and suitable. It can be dynamic these days and linked to the data, but a lot of the time, for what I do, I can cache a set of IDs for the duration, so it becomes very cheap.
One thing that frustrates me about some sites is where the data does change; paging just becomes messy and isn't always the best way to present it. For example, page 1 becomes page 2 because the dataset has changed, so when I click next page I get half of what I was just looking at on page 1 and the first half of what was page 2, etc. Or you go back a page and what was page 2 isn't page 2 anymore.
I'm not quite sure I understand how the last page becomes very expensive using this approach, as once the initial query has been performed everything else is PK based. Unless you are talking about having to read from the beginning of, say, 1,000,000 PKs to the end as opposed to the current page? Which is true - that would be a nightmare scenario!
So what you say about very large datasets is true, and I would typically limit the result set to 1,000 or at the extreme 5,000 records in the SQL command. If the user wants to go beyond this it does make things complicated, but I can't remember the last time I went beyond page three of any search.
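Nothing fancy is needed for that cap - just a TOP on the ID query that feeds the cache, e.g. (sketch, column names borrowed from the earlier post):
SELECT TOP 1000 intUserID -- arbitrary cap; pick whatever the UI can sensibly page through
FROM tblUsers
WHERE vchLastName LIKE @p_vchQuickSearch + '%'
ORDER BY vchLastName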
Typically, in my experience, if a user gets more than a handful of pages they are likely to want to refine their search, or maybe sort it to get the records they want onto the top pages. Enticing a user down this road is pretty easy: "you got over 1,000 records... try refining your search below!" - then give them a bunch of useful refinements.
Obviously every solution has its pros and cons and I wouldn’t present any one as the holy grail as every application has its own requirements.
March 11, 2009 at 12:45 pm
Another alternative that I currently use (I haven't measured/compared it) is a COMPUTE clause (if the client can cope with multiple result sets).
select a, b, c
from tableX
where a = 1
compute count(a)
The best part is that it works in .NET 1.1 and SQL 2000 (most of the systems I support) 🙂
Re-checking: the COMPUTE I used was for obtaining the total of the (paginated) result set. For pagination itself it's not so interesting, because the number of rows retrieved is controlled by the client (the count should be for the whole table/view).
Select top 200 a, b, c
from tableX
where a = 1
compute sum(c)
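On SQL 2005 and up, a single-result-set alternative to COMPUTE is COUNT(*) OVER(). The window function is evaluated before TOP is applied, so it reports the full qualifying count on every row (sketch):
SELECT TOP 200 a, b, c,
       COUNT(*) OVER () AS TotalRows -- counts every row matching the WHERE, not just the 200 returned
FROM tableX
WHERE a = 1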
March 11, 2009 at 2:27 pm
These are all well and good, but I rarely need such results in a T-SQL window. Where I need them is in an application where I want to paginate using ASP.NET etc.
In those cases, the best I can come up with is two calls. To be clear - I use dynamic SQL to build all queries in my list windows. There I need the total count and the page of data required.
March 11, 2009 at 3:16 pm
jcraddock (3/11/2009)
These are all well and good, but I rarely need such results in a T-SQL window. Where I need them is in an application where I want to paginate using ASP.NET etc. In those cases, the best I can come up with is two calls. To be clear - I use dynamic SQL to build all queries in my list windows. There I need the total count and the page of data required.
This is exactly the scenario I was imagining when putting this solution together, with the idea being that you can return both the total count and page n in a single call/read.
There have been some very interesting comments and ideas in this thread. As far as this idea goes, it works well in some cases and not so well in others (as highlighted by other posters).
I built a paging proc based on this principle and, after extensive testing, found it to be the best approach. I saw only a few ms of CPU and elapsed-time overhead. I would be very interested to know the size of some of the datasets Peso and others are using; this would be very useful in determining the best solution for a given problem.
Thank you, everyone, for your comments and insight!
SQL guy and Houston Magician
March 11, 2009 at 3:29 pm
Actually, rereading the thread, I see that Peso did post the size of his dataset: 2,000,000 rows.
It would be interesting to determine a relative cut-off point where the ROW_NUMBER trick is the most efficient and where the two-bite approach becomes best.
For my purposes, the biggest cost was the query itself. The underlying tables hold hundreds of millions of records, but the data is pretty well filtered down by the time we get to the record set to page. To avoid running that query twice was a big win.
I deliberately avoided including time statistics in my comparisons because they can be very subjective (unless they demonstrate a clear performance difference, as a few posters noted).
SQL guy and Houston Magician
March 11, 2009 at 9:23 pm
Um, how are you going to declare the variables and such inside one statement from an ASP.Net page/class/etc? This looks like you have to use a proc, and procs are out with the queries I generate.
March 11, 2009 at 10:32 pm
jcraddock (3/11/2009)
Um, how are you going to declare the variables and such inside one statement from an ASP.Net page/class/etc? This looks like you have to use a proc, and procs are out with the queries I generate.
I'm not sure what you're saying here. We use stored procedures that generate dynamic queries to retrieve datasets like this all the time. Our data layer takes a datarow containing the values for whatever parameters the procedures need and iterates through it to pass those parameters into the stored proc. The proc then makes sure any SQL injection characters etc. are stripped, generates the dynamic SELECT query with paging based on whatever values were passed in, and returns a dataset with the requested page of data and a full count of the records. Why would procs be out in what you're doing?
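A stripped-down sketch of that kind of proc (the names here are made up; QUOTENAME guards the identifier and sp_executesql keeps the values parameterized, which does most of the injection-proofing on its own):
CREATE PROCEDURE dbo.procPagedList -- hypothetical name
    @OrderBy SYSNAME, -- column to sort by
    @Page INT,
    @PageSize INT
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX)
    SET @sql = N'
    WITH Paged AS (
        SELECT *,
               ROW_NUMBER() OVER (ORDER BY ' + QUOTENAME(@OrderBy) + N') AS Row,
               COUNT(*) OVER () AS TotalRows -- full count rides along with the page
        FROM dbo.tblUsers
    )
    SELECT * FROM Paged
    WHERE Row BETWEEN (@Page - 1) * @PageSize + 1 AND @Page * @PageSize'
    EXEC sp_executesql @sql,
        N'@Page INT, @PageSize INT',
        @Page = @Page, @PageSize = @PageSize
END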
March 12, 2009 at 12:18 am
First of all... very well written article. Nice to see examples like that with some performance stats. Unfortunately, the most important performance measure for the user (duration) wasn't captured, and the test data was kinda shy on row count.
It turns out that the "2 Bite" method is very fast provided that, as someone else already pointed out, you use the system tables to give you a leg up. And if you have a clustered index on the table to keep it from being just a heap, the first "Bite" requires no joins... that means you can have the best of both worlds: performance and fewer reads.
Just to give everyone some common data to test on, here's a million rows of data to play with... only takes about 41 seconds including the index builds...
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
--===== Create an index for the lookups we expect
CREATE INDEX IX_JBMTest_SomeInt_SomeLetters2
ON dbo.JBMTest (SomeInt,SomeLetters2)
... and here's some code that tests two newer methods, the original "Holy Grail", and Peso's method. If someone else wants to test another method, please add it to this code and run the whole thing, because machine speeds vary and it would be nice to have it all together to compare...
--===== Define the starting row and page size
DECLARE @StartRow INT ; SET @StartRow = 900000
DECLARE @PageSize INT ; SET @PageSize = 50
PRINT '--============================================================================='
PRINT '-- The "Holy Grail" method'
PRINT '--============================================================================='
--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON
--===== The "Holy Grail" method of getting a page of info
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
ROW_NUMBER() OVER(ORDER BY SomeInt DESC, SomeLetters2 DESC) AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows + Seq - 1 AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
ORDER BY Seq
--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT '--============================================================================='
PRINT '-- The "No RBAR/No Join" method'
PRINT '--============================================================================='
--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON
--===== The "No RBAR/No Join" method
;WITH
cteCols AS
(
SELECT NULL AS SomeInt, NULL AS SomeLetters2, 0 AS Seq, Rows AS TotRows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1
UNION ALL --------------------------------------------------------------------
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq
--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT '--============================================================================='
PRINT '-- A different No Join method'
PRINT '--============================================================================='
--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON
--===== A different No Join method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT Rows
FROM sys.Partitions
WHERE Object_ID = OBJECT_ID('dbo.JBMTest')
AND Index_ID = 1) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq
--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT '--============================================================================='
PRINT '-- Peso''s Embedded "2 Bite" method'
PRINT '--============================================================================='
--===== Turn on the timers
SET STATISTICS IO ON
SET STATISTICS TIME ON
--===== Embedded "2 Bite" method
;WITH
cteCols AS
(
SELECT SomeInt, SomeLetters2,
ROW_NUMBER() OVER(ORDER BY SomeInt, SomeLetters2) AS Seq,
NULL AS TotRows
FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, (SELECT COUNT(*) FROM dbo.JBMTest) AS TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
OR Seq = 0
ORDER BY Seq
--===== Turn off the timers
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
Here's how that played out on my humble single-CPU 1.8GHz/1GB RAM desktop running 2K5 SP2...
--=============================================================================
-- The "Holy Grail" method
--=============================================================================
(50 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 7594 ms, elapsed time = 9261 ms.
--=============================================================================
-- The "No RBAR/No Join" method
--=============================================================================
(51 row(s) affected)
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysrowsets'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1265 ms, elapsed time = 1314 ms.
--=============================================================================
-- A different No Join method
--=============================================================================
(50 row(s) affected)
Table 'sysrowsets'. Scan count 50, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'JBMTest'. Scan count 1, logical reads 1985, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1157 ms, elapsed time = 1383 ms.
--=============================================================================
-- Peso's Embedded "2 Bite" method
--=============================================================================
(50 row(s) affected)
Table 'JBMTest'. Scan count 2, logical reads 3970, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1406 ms, elapsed time = 1466 ms.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 12, 2009 at 2:44 am
Nice article, but I miss a solution for handling custom sorting when using paging functionality.
In practice (let's say an ASP.NET application with a GridView component) you need to sort dynamically by a column other than the ID, so you have to take care of this before you do the paging.
Look here: http://aspnet.4guysfromrolla.com/articles/032206-1.aspx#
If you combine the custom sort capability with the discussed total-row retrieval, you can get the most out of this...
Cheers
March 12, 2009 at 4:41 am
I'm afraid the differences in resources are partly induced by SQL Server caching (intermediate) results between the first and later queries.
If you use DBCC DROPCLEANBUFFERS between the queries, the caches are emptied, and the differences in time and physical reads may better reflect actual use of resources.
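For example, between each timed batch:
CHECKPOINT -- flush dirty pages first so the buffer drop can take full effect
DBCC DROPCLEANBUFFERS -- cold-start the buffer pool
DBCC FREEPROCCACHE -- optionally clear compiled plans as well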
March 12, 2009 at 8:44 am
jackie.williams (3/11/2009)
jcraddock (3/11/2009)
Um, how are you going to declare the variables and such inside one statement from an ASP.Net page/class/etc? This looks like you have to use a proc, and procs are out with the queries I generate.I'm not sure what you're saying here. We use stored procedures that generate dynamic queries to retrieve datasets similar to this all the time. Our datalayer takes a datarow containing the values for whatever parameters are necessary for the procedures and iterates through that to pass the necessary parameters into our stored proc, then the stored proc makes sure any sql injection chars etc are stripped, generates the dynamic select query with paging etc. based on whatever values have been passed into it and returns a dataset with the requested page of data and a full count of the records. Why would procs be out in what you're doing?
Well, all my applications are actually one application. All table information, validation rules, permissions, roles, etc. are stored in metadata. I generate the queries for list windows dynamically from the ASP.NET page with full filtering and sorting - one list page works for every single table in all systems, and similarly one edit page works for every table in every system. To do what you propose, I would have to move my OBJECT class into SQL, as it contains all the metadata. I have no interest in running .NET from my SQL Server, for several reasons - number one being a total rewrite of the architecture. I was just hoping there was a way to do it directly from the ASP code instead of relying on a procedure. At the top of this window you can see the record count, and then the page is displayed. I was hoping for a way (without rearchitecting my class onto SQL Server and taking the performance hit for turning that on) to get both numbers at once.
March 12, 2009 at 9:57 am
Thomas Bühler (3/12/2009)
Nice article, but I miss a solution for handling custom sorting when using paging functionality.
Yeah, I deliberately left that bit out because it had been covered by so many other articles. The 'real world' problem I chose this approach for does implement custom sorting by using conditional logic in the OVER(ORDER BY).
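Combined with the total-row retrieval discussed above, it comes out roughly like this (a sketch against Jeff's test table; COUNT(*) OVER() stands in for the reverse-ROW_NUMBER trick, which gets awkward once the sort is conditional):
DECLARE @OrderBy VARCHAR(20); SET @OrderBy = 'int' -- hypothetical sort selector
DECLARE @StartRow INT; SET @StartRow = 900000
DECLARE @PageSize INT; SET @PageSize = 50
;WITH cteCols AS
(
    SELECT SomeInt, SomeLetters2,
           ROW_NUMBER() OVER (ORDER BY
               CASE WHEN @OrderBy = 'int'     THEN SomeInt      END,
               CASE WHEN @OrderBy = 'letters' THEN SomeLetters2 END) AS Seq,
           COUNT(*) OVER () AS TotRows
    FROM dbo.JBMTest
)
SELECT Seq, SomeInt, SomeLetters2, TotRows
FROM cteCols
WHERE Seq BETWEEN @StartRow AND @StartRow + @PageSize - 1
ORDER BY Seq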
SQL guy and Houston Magician