Optimising Server-Side Paging - Part I

Question

Post reply

Optimising Server-Side Paging - Part I

Paul White

SSC Guru

Points: 150468
More actions
April 24, 2010 at 11:29 am

#386780

Comments posted to this topic are about the item Optimising Server-Side Paging - Part I
Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Viewing 15 posts - 1 through 15 (of 55 total)

You must be logged in to reply to this topic. Login to reply

gitendrashah.net SSC Rookie Points: 41 More actions · Answer 1

Nice post ......but i have been using this paging before also....so i didn't get any difference......

Aleksandar Cebov SSC Rookie Points: 41 More actions · Answer 2

Hmmm... it seems to me that this will work only if where statement is not being used. What if we need to do server side paging and sorting and apply filter on which rows should be returned (maybe in the example I need to get only those records where the thread_id=some_value and sort them by create_dt)?

desade Valued Member Points: 59 More actions · Answer 3

any idea why key seek method uses much more CPU for 10 pages then it uses for 100 or 200? It looks as some glitch in testing, but maybe there is logical explanation

Paul White SSC Guru Points: 150468 More actions · Answer 4

Aleksandar Cebov (4/26/2010)
Hmmm... it seems to me that this will work only if where statement is not being used.

It works whenever there is a useful index to obtain the keys, in the required order.

What if we need to do server side paging and sorting and apply filter on which rows should be returned (maybe in the example I need to get only those records where the thread_id=some_value and sort them by create_dt)?

The appropriate Key-Seek index in that case would be on (thread_id, create_dt). There is a full reproduction script included with the article. I encourage you to download it, and experiment for yourself.

Custom sorting and filtering is covered in more depth in part III - this first part is just about establishing the basic method.

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 5

desade (4/26/2010)
any idea why key seek method uses much more CPU for 10 pages then it uses for 100 or 200? It looks as some glitch in testing, but maybe there is logical explanation

There's no special reason that I am aware of - the test results are shown exactly as they appeared. I just put it down to the small numbers involved, the limited timing resolution available, and random chance...

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 6

gitendrashah.net (4/26/2010)
Nice post ......but i have been using this paging before also....so i didn't get any difference......

Well hopefully you found something interesting or useful - if not, perhaps you will in parts 2 and 3.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 7

Aleksandar Cebov (4/26/2010)
What if we need to do server side paging and sorting and apply filter on which rows should be returned (maybe in the example I need to get only those records where the thread_id=some_value and sort them by create_dt)?

Ok, here's some code to demonstrate what I meant in my previous reply:

-- Key Seek index

CREATE INDEX [IX dbo.Post thread_id, create_dt]

ON dbo.Post (thread_id, create_dt);

DECLARE @PageNumber BIGINT, -- Page number to fetch

@PageSize BIGINT; -- Rows per page

SET @PageSize = 50;

SET @PageNumber = 10;

-- The thread_id to filter on

DECLARE @ThreadID INTEGER;

SET @ThreadID = 6;

-- Key-Seek algorithm

WITH Keys

AS (

-- Step 1 : Number the rows from the non-clustered index

-- Maximum number of rows = @PageNumber * @PageSize

SELECT TOP (@PageNumber * @PageSize)

rn = ROW_NUMBER() OVER (ORDER BY P1.create_dt ASC),

P1.post_id,

P1.create_dt

FROM dbo.Post P1

WHERE P1.thread_id = @ThreadID

ORDER BY

P1.create_dt ASC

),

SelectedKeys

AS (

-- Step 2 : Get the primary keys for the rows on the page we want

-- Maximum number of rows from this stage = @PageSize

SELECT TOP (@PageSize)

SK.rn,

SK.post_id,

SK.create_dt

FROM Keys SK

WHERE SK.rn > ((@PageNumber - 1) * @PageSize)

ORDER BY

SK.create_dt ASC

)

SELECT -- Step 3 : Retrieve the off-index data

-- We will only have @PageSize rows by this stage

SK.rn,

P2.post_id,

P2.thread_id,

P2.member_id,

P2.create_dt,

P2.title,

P2.body

FROM SelectedKeys SK

JOIN dbo.Post P2

ON P2.post_id = SK.post_id

ORDER BY

SK.create_dt ASC

OPTION (RECOMPILE);

Actual execution plan:

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 8

That query plan highlights something I took out of the article at the last minute - to keep things simple.

You might wonder why SQL Server chooses a non-clustered index seek and key lookup, rather than a clustered index seek in the last step.

The answer is again down to the width of the clustered index: for a relatively small number of rows, the cost of the seek + lookup is less than scanning even a very small range on the cluster.

For larger page sizes, SQL Server may choose a partial scan of the cluster, due to the mounting cost of the random I/O associated with the lookups.

A final point: notice that the final loop joins to perform the seek and lookup both have the WithOrderedPrefetch: True and Optimized: False attributes. This is a read-ahead optimization, looking ahead in the index and issuing asynchronous I/O for rows that will be needed for the joins. More details:

http://blogs.msdn.com/craigfr/archive/2008/10/07/random-prefetching.aspx

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

peter-757102 SSCertifiable Points: 6877 More actions · Answer 9

I scanned the article quicly and it is well presented.

But I do have to dive into this particular one as I feel something odd is happening. The optimized statement is still just one statement and the steps in the with clause are no more then inline views. I am interested in why this construct manages to forces a way better execution plan.

No time no to go deep into this, but I will 😉

Thx for the article in advance!

nabidavid SSC Eights! Points: 815 More actions · Answer 10

Great article, Paul. I look forward to Part II. 😀

If you are not already covering it in a future article, I am curious if you have any advice on the best way (as in best performance) to incorporate the total record count for calculating total pages in this method, especially when custom filtering is involved.

Of course, if you are already planning to cover this topic, I am happy to patiently wait until then. 😉

Thanks!

Paul White SSC Guru Points: 150468 More actions · Answer 11

nabidavid (4/26/2010)
If you are not already covering it in a future article, I am curious if you have any advice on the best way (as in best performance) to incorporate the total record count for calculating total pages in this method, especially when custom filtering is involved.

I do cover various methods for obtaining the total record count in part II, and part III is concerned with custom filtering and ordering. Of course it is never possible to cover every possible need - but I hope you will find it useful.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 12

peter-757102 (4/26/2010)
I scanned the article quicly and it is well presented.

Thanks very much.

But I do have to dive into this particular one as I feel something odd is happening. The optimized statement is still just one statement and the steps in the with clause are no more then inline views. I am interested in why this construct manages to forces a way better execution plan.

You are right to suspect that there is rather more to it than that 😉

I hope you find time for a proper read of it very soon. Please also take a look at the attached scripts if you can.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

sknox SSChampion Points: 12388 More actions · Answer 13

Good article. One thing I noticed is that you're using a contiguous identity for your tests, but relying on ROW_NUMBER() for the paging.

I'm assuming that ROW_NUMBER() is used because you're planning for missing identity values due to deleted rows? Given that, I'd like to see the tests run with some random rows deleted, to better reflect a real-world scenario.

Paul White SSC Guru Points: 150468 More actions · Answer 14

sknox (4/26/2010)
Good article. One thing I noticed is that you're using a contiguous identity for your tests, but relying on ROW_NUMBER() for the paging. I'm assuming that ROW_NUMBER() is used because you're planning for missing identity values due to deleted rows?

The cluster on the identity column is primarily for architectural reasons - as you know, you can never depend on IDENTITY columns to be contiguous, regardless of whether rows are ever deleted.

The ROW_NUMBER is just used to assign a sequential number in selected key order, and to make the selection of keys for the page we want easy.

The only reason I order by the underlying key (post_id) rather than the ROW_NUMBER (rn) is purely a limitation of the optimiser. We both know that the order of ROW_NUMBER assignment matches the ORDER BY of the ranking function, but the current optimiser does not include that logic. Sorting on the key avoids a small extra sort - so it's all rather technical, but then so are many things with SQL Server 🙂

Given that, I'd like to see the tests run with some random rows deleted, to better reflect a real-world scenario.

The full test rig is available for download in the Resources section. 😉

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi