April 8, 2008 at 3:25 am
@ Jeff:
Don't you just need RowNum on the middle select (as opposed to *)?:
...
SELECT t.*
FROM dbo.Test t,
(
SELECT TOP (@PageSize) RowNum
FROM dbo.Test WITH (NOLOCK)
WHERE RowNum NOT IN (SELECT TOP (@Skip) RowNum
FROM dbo.Test
ORDER BY RowNum)
ORDER BY CustID
) d
WHERE t.RowNum = d.RowNum
It's a shame that this is the best solution for paging in T-SQL (and it is) - I wish there were a linear-performing algorithm. The larger @Skip gets, the more performance suffers...
Just curious - does anyone know if either Oracle or MySQL have a better solution?
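(For the record, both at least have built-in offset syntax - MySQL's LIMIT and Oracle's ROWNUM trick - though as far as I can tell both still scan past the skipped rows, so I doubt either is truly linear. Rough, untested sketches:

-- MySQL: skip 4000 rows, return 100 (still reads past the skipped rows)
SELECT * FROM Test ORDER BY RowNum LIMIT 4000, 100;

-- Oracle (no LIMIT): nested ROWNUM filter, same caveat
SELECT *
  FROM (SELECT t.*, ROWNUM rn
          FROM (SELECT * FROM Test ORDER BY RowNum) t
         WHERE ROWNUM <= 4100)
 WHERE rn > 4000;

Happy to be corrected by anyone who actually works on those platforms.)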
Regards,
Jacob
April 8, 2008 at 6:30 am
Sure, you could sub RowNum for *... not sure it matters for performance here as it might in other places. * allows it to pick the best index it can here.
Yep... as the pages progress, it takes longer... still pretty happy with getting the last 100-row page in a 1.2-million-row table in about 485 milliseconds. The only way I can think of to get a truly linear response is to keep a copy of the table with a properly numbered column. Using the Running Count method in the following URL, it would only take 7 seconds or so...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
But, that doesn't solve the problem of which order you want to see things in. Could do that with several abbreviated sister tables... They could be created using SELECT/INTO and IDENTITY in a matter of seconds.
Still, it's not a real time update and unless you use "twin sister" tables and repoint either a view or a synonym, users will see the delay.
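Something like this, for example (table and synonym names are made up, and note that the ORDER BY only influences, not guarantees, the order IDENTITY numbers get assigned in, so treat it as a sketch):

--===== Build a renumbered "sister" copy in the order users want to page
SELECT IDENTITY(INT,1,1) AS RowNum, CustID, EmailAddress, FirstName, LastName
  INTO dbo.Test_ByCustID_B
  FROM dbo.Test
 ORDER BY CustID

--===== Repoint the synonym so users never see the rebuild
-- (existence check omitted for brevity - DROP fails if the synonym isn't there yet)
DROP SYNONYM dbo.Test_ByCustID
CREATE SYNONYM dbo.Test_ByCustID FOR dbo.Test_ByCustID_B

Next rebuild goes into an "_A" twin and the synonym flips back, so readers always hit a complete table.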
--Jeff Moden
Change is inevitable... Change for the better is not.
April 8, 2008 at 9:19 am
Jeff Moden (4/8/2008)
Sure, you could sub RowNum for *... not sure it matters for performance here as it might in other places. * allows it to pick the best index it can here.
Yep... as the pages progress, it takes longer... still pretty happy with getting the last 100 line page in a 1.2 million row table in about 485 milliseconds. The only way I can think of to get a true linear reaction is to keep a copy of the table with a properly numbered column. Using the Running Count method in the following URL, it would only take 7 seconds or so...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
But, that doesn't solve the problem of which order you want to see things in. Could do that with several abbreviated sister tables... They could be created using SELECT/INTO and IDENTITY in a matter of seconds.
Still, it's not a real time update and unless you use "twin sister" tables and repoint either a view or a synonym, users will see the delay.
Keep in mind that if you have a "properly numbered table" (meaning - no gaps) - your query becomes quite a bit easier:
SELECT t.*
FROM dbo.Test t,
(
SELECT TOP (@PageSize) RowNum
FROM dbo.Test WITH (NOLOCK)
WHERE RowNum > @Skip
ORDER BY CustID
) d
WHERE t.RowNum = d.RowNum
and THAT should be damn near linear....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 8, 2008 at 7:59 pm
@ Jeff: Is this
...still pretty happy with getting the last 100 line page in a 1.2 million row table in about 485 milliseconds.
a real-world figure? I put together a quick-and-dirty test on one of our less capable boxes based on your example code. DDL:
CREATE TABLE [dbo].[Test](
[RowNum] [int] IDENTITY(1,1) NOT NULL,
[CustID] [int] NOT NULL,
[EmailAddress] [varchar](255) NOT NULL,
[FirstName] [varchar](50) NOT NULL,
[LastName] [varchar](50) NOT NULL,
CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
(
[RowNum] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
go
CREATE NONCLUSTERED INDEX IX_Test_CustID ON dbo.Test
(
CustID
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
go
Test code:
set statistics time on
--===== Declare the local variables
DECLARE @PageSize INT --How many rows to appear per page
DECLARE @PageNum INT --What page number to appear
DECLARE @Skip INT --Working variable for where to start for page
--===== Set the local variables for pagesize and page
-- PageSize and PageNum would be parameters in a stored proc
SET @PageSize = 100
SET @PageNum = 40
SET @Skip = @PageSize*@PageNum
SELECT t.*
FROM dbo.Test t,
(
SELECT TOP (@PageSize) RowNum
FROM dbo.Test WITH (NOLOCK)
WHERE RowNum NOT IN (SELECT TOP (@Skip) RowNum
FROM dbo.Test
ORDER BY RowNum)
ORDER BY CustID
) d
WHERE t.RowNum = d.RowNum
set statistics time off
and populated it with just under 3 million sanitized rows from one of our prod DBs. Ran your test query against it a few times whilst varying @PageNum. Totally non-scientific results (no buffer flushes, manual runs eyeballing STATISTICS TIME results, 5 runs per test):
@PageSize = 100, @PageNum = 40: avg 1.2 sec
@PageSize = 100, @PageNum = 400: avg 41.4 sec
@PageSize = 100, @PageNum = 4000: 6123 sec (I only ran this one once 😛 )
Granted this is an underspecced server with 2GB RAM, but still...
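(If anyone wants to reproduce this without prod data, a cross-join filler along these lines should get you a few million junk rows quickly - all the names and the domain are obviously made up:

--===== Stuff ~3 million rows of made-up data into dbo.Test
INSERT INTO dbo.Test (CustID, EmailAddress, FirstName, LastName)
SELECT TOP (3000000)
       ABS(CHECKSUM(NEWID())) % 100000 + 1,
       CHAR(65 + ABS(CHECKSUM(NEWID())) % 26) + '@example.com',
       'First' + CAST(ABS(CHECKSUM(NEWID())) % 1000 AS VARCHAR(10)),
       'Last'  + CAST(ABS(CHECKSUM(NEWID())) % 1000 AS VARCHAR(10))
  FROM master.sys.all_columns ac1
 CROSS JOIN master.sys.all_columns ac2

The cross join of two system views easily covers the row count, and CHECKSUM(NEWID()) is a cheap way to get pseudo-random values.)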
Regards,
Jacob
April 8, 2008 at 8:09 pm
@ Matt:
and THAT should be damn near linear....
Yep - ran the same test as above in my reply to Jeff, same data, same server, except using your test code. Avg execution time even at @PageNum = 8000 was < 45 ms. Certainly linear, and at such small response times it may as well be constant time.
Depends on the actual task at hand but sometimes it might be better to materialise the inner select query with a contiguous RowNum sequence as Jeff suggests (especially with his quick Running Count method) and use your approach to page through it. I guess it depends on the expected ratio of initial searches to paging requests.
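i.e. something along these lines (hypothetical - the search predicate and a session-scoped temp table stand in for whatever the app actually does; @PageSize/@Skip as declared earlier in the thread):

--===== Materialise the search result once, renumbered contiguously...
SELECT IDENTITY(INT,1,1) AS RowNum, CustID, EmailAddress
  INTO #SearchResult
  FROM dbo.Test
 WHERE LastName LIKE 'Smi%'   -- whatever the initial search was
 ORDER BY CustID

--===== ...then every page is a cheap seek on the gap-free RowNum
SELECT TOP (@PageSize) *
  FROM #SearchResult
 WHERE RowNum > @Skip
 ORDER BY RowNum

You pay the materialisation cost once up front, then every page request is near constant time.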
Regards,
Jacob
April 8, 2008 at 8:57 pm
Wow... lemme say that again... WOW! SQL Server 2005 sucks for this! Yeah... it was a real-world example... In the real world, we have SQL Server 2000 and we had to use dynamic SQL because there is no programmable TOP in 2k (as all of you know). Our server is the Enterprise Edition running on 16 3.2 GHz CPUs, with enough memory to choke a horse and some awesome hard disks.
I ran the TOP() code on my (humble) desktop machine at the 4000th 100-row page and never did let it finish. I ran dynamic SQL using just plain ol' EXEC, concatenating in the values for page size, etc., and BLAM!!! It runs in only half a second on my desktop... talk about parameter sniffing... the 9000th page takes 1.2 seconds. Of course, my desktop box is no match for the server at work.
Also, I made a couple of mistakes modifying the code for the forum... I left CustID in the ORDER BY instead of using RowNum... And, I used SELECT * instead of RowNum.
Here's the code I just used...
set statistics time on
--===== Declare the local variables
DECLARE @PageSize INT --How many rows to appear per page
DECLARE @PageNum INT --What page number to appear
DECLARE @Skip INT --Working variable for where to start for page
DECLARE @SQL VARCHAR(8000)
--===== Set the local variables for pagesize and page
-- PageSize and PageNum would be parameters in a stored proc
SET @PageSize = 100
SET @PageNum = 9000
SET @Skip = @PageSize*@PageNum
--SELECT t.RowNum
-- FROM dbo.Test t,
-- (
-- SELECT TOP (@PageSize) RowNum
-- FROM dbo.Test WITH (NOLOCK)
-- WHERE RowNum NOT IN (SELECT TOP (@Skip) RowNum
-- FROM dbo.Test
-- ORDER BY RowNum)
-- ORDER BY RowNum
-- ) d
-- WHERE t.RowNum = d.RowNum
SET @SQL = '
SELECT t.RowNum
FROM dbo.Test t,
(
SELECT TOP '+STR(@PageSize)+' RowNum
FROM dbo.Test WITH (NOLOCK)
WHERE RowNum NOT IN (SELECT TOP '+STR(@Skip)+' RowNum
FROM dbo.Test
ORDER BY RowNum)
ORDER BY RowNum
) d
WHERE t.RowNum = d.RowNum'
EXEC (@SQL)
set statistics time off
And, get this... on my box, running the same dynamic code on 2k instead of 2k5 only took about half the time... what did they do to the bloody optimizer in 2k5? And, yes, I'm running sp2 on 2k5 although I don't have anything beyond that (was holding out for sp3).
--Jeff Moden
Change is inevitable... Change for the better is not.
April 8, 2008 at 9:29 pm
I can verify your results: changing to the dynamic SQL version runs at 1.2 sec at @PageNum = 8000 on my test data/server.
Now to speculate on why...
I can't compare the query plans directly atm as the server is now busy, but looking at the dynamic version it's similar to the other plan - with one major difference: both want to do a Left Anti Semi Join, but the normal version uses a Nested Loops operator for this whereas the dynamic version uses a Merge Join.
I wonder if it's the variable-based TOP implementation?
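(When the server frees up, the two plans can be captured side by side with something like this, run around each version of the query:

SET STATISTICS XML ON   -- returns the actual execution plan alongside the results
-- ...run the static (variable TOP) version, then the dynamic version...
SET STATISTICS XML OFF

That should make the Nested Loops vs Merge Join difference, and the row estimates feeding it, easy to compare.)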
Regards,
Jacob
April 8, 2008 at 9:51 pm
Jacob Luebbers (4/8/2008)
I can verify your results: changing to the dynamic SQL version runs at 1.2 sec at @PageNum = 8000 on my test data/server.
Now to speculate on why...
I can't compare the query plans directly atm as the server is now busy, but looking at the dynamic version it's similar to the other plan - with one major difference: both want to do a Left Anti Semi Join, but the normal version uses a Nested Loops operator for this whereas the dynamic version uses a Merge Join.
I wonder if it's the variable-based TOP implementation?
Regards,
Jacob
Jacob - you hit it on the nose. Forcing the join hint makes the difference.
I was getting there from the other side, in that the variable-based TOP(@variable) in the subquery was throwing things off, horribly.
Interestingly enough - with the hint in place - 2005 actually beats the dynamic SQL. Not much consolation, I know...
set statistics time on
--===== Declare the local variables
DECLARE @PageSize INT --How many rows to appear per page
DECLARE @PageNum INT --What page number to appear
DECLARE @Skip INT --Working variable for where to start for page
DECLARE @sql VARCHAR(8000)
--===== Set the local variables for pagesize and page
-- PageSize and PageNum would be parameters in a stored proc
SET @PageSize = 100
SET @PageNum = 9000
SET @Skip = 900000
SELECT t.RowNum
FROM dbo.Test t,
(
SELECT TOP (@PageSize) test.RowNum
FROM dbo.Test WITH (NOLOCK)
left outer merge join (SELECT TOP(@skip) RowNum
FROM dbo.Test WITH (NOLOCK)
ORDER BY RowNum) r
on test.rownum=r.rownum
where r.rownum is null
ORDER BY test.RowNum
) d
where t.RowNum = d.RowNum
set statistics time off
print replicate('=',50)
set statistics time on
SET @sql = '
SELECT t.RowNum
FROM dbo.Test t,
(
SELECT TOP '+STR(@PageSize)+' RowNum
FROM dbo.Test WITH (NOLOCK)
WHERE RowNum NOT IN (SELECT TOP '+STR(@Skip)+' RowNum
FROM dbo.Test
ORDER BY RowNum)
ORDER BY RowNum
) d
WHERE t.RowNum = d.RowNum'
EXEC (@SQL)
set statistics time off
2005 solution: 1063 ms
2000 solution: 670+670=1340 ms
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 8, 2008 at 10:01 pm
Ouch! Yep, that does the trick. I wonder if there will ever be a situation where the merge join would be suboptimal. Thinking about it, probably only if there were a small number of rows in the table - so it's probably safe to always leave the hint on if you're not using the dynamic version.
Regards,
Jacob
April 8, 2008 at 10:15 pm
one of these days - I'm going to figure out how to colorize my code without turning it into a wreck........hehe
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 9, 2008 at 12:15 am
Matt Miller (4/8/2008)
Jacob - you hit it on the nose. Forcing the join hint makes the difference.
Crud... SQL Server is getting more like Oracle every day 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
April 9, 2008 at 12:17 am
Matt Miller (4/8/2008)
one of these days - I'm going to figure out how to colorize my code without turning it into a wreck........hehe
:hehe: How'z 'bout taday 😛
http://extras.sqlservercentral.com/prettifier/prettifier.aspx
--Jeff Moden
Change is inevitable... Change for the better is not.
April 9, 2008 at 7:16 am
Jeff Moden (4/9/2008)
Matt Miller (4/8/2008)
one of these days - I'm going to figure out how to colorize my code without turning it into a wreck........hehe
:hehe: How'z 'bout taday 😛
http://extras.sqlservercentral.com/prettifier/prettifier.aspx
SET STATISTICS time ON
--===== Declare the local variables
DECLARE @PageSize INT --How many rows to appear per page
DECLARE @PageNum INT --What page number to appear
DECLARE @Skip INT --Working variable for where to start for page
DECLARE @SQL VARCHAR(8000)
--===== Set the local variables for pagesize and page
-- PageSize and PageNum would be parameters in a stored proc
SET @PageSize = 100
SET @PageNum = 9000
SET @Skip = 900000
SELECT t.RowNum
FROM dbo.Test t,
(
SELECT TOP (@PageSize) test.RowNum
FROM dbo.Test WITH (NOLOCK)
LEFT OUTER merge JOIN (SELECT TOP(@skip) RowNum
FROM dbo.Test WITH (NOLOCK)
ORDER BY RowNum) r
ON test.rownum=r.rownum
WHERE r.rownum IS NULL
ORDER BY test.RowNum
) d
WHERE t.RowNum = d.RowNum
SET STATISTICS time OFF
PRINT REPLICATE('=',50)
SET STATISTICS time ON
SET @SQL = '
SELECT t.RowNum
FROM dbo.Test t,
(
SELECT TOP '+STR(@PageSize)+' RowNum
FROM dbo.Test WITH (NOLOCK)
WHERE RowNum NOT IN (SELECT TOP '+STR(@Skip)+' RowNum
FROM dbo.Test
ORDER BY RowNum)
ORDER BY RowNum
) d
WHERE t.RowNum = d.RowNum'
EXEC (@SQL)
SET STATISTICS time OFF
Cool! The last time I tried it - I got nothing but, well, markup, and the code available couldn't be copied, etc... It was either ugly when posted or ugly when copied out of the post. Both seem to be working now though!
Must be those fancy changes Steve made to the rendering of code. Thanks!
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 9, 2008 at 11:26 am
Jeff Moden (4/9/2008)
Matt Miller (4/8/2008)
Jacob - you hit it on the nose. Forcing the join hint makes the difference.
Crud... SQL Server is getting more like Oracle every day 😛
You mean the optimizer gets dumber and dumber? grumble grumble grumble.....
For what it's worth - simply throwing
OPTION (RECOMPILE)
on the end of your version of the 2005 query would also have the same effect (speeding it up and surpassing the 2000 version). It just seems like another place where parameter sniffing rears its ugly little head.....
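i.e. just tack it onto the original, un-hinted query from earlier in the thread (@PageSize/@Skip as declared there), so the plan is compiled with the current variable values:

SELECT t.RowNum
  FROM dbo.Test t,
     (
      SELECT TOP (@PageSize) RowNum
        FROM dbo.Test WITH (NOLOCK)
       WHERE RowNum NOT IN (SELECT TOP (@Skip) RowNum
                              FROM dbo.Test
                             ORDER BY RowNum)
       ORDER BY RowNum
     ) d
 WHERE t.RowNum = d.RowNum
OPTION (RECOMPILE)   -- recompile with the actual @PageSize/@Skip values in hand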
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 9, 2008 at 7:44 pm
Dumber and dumber is right... they should have left it alone... 2k had a fine optimizer... well, the idiocy started at SP4, which is why I still run SP3a on my desktop.
--Jeff Moden
Change is inevitable... Change for the better is not.