April 24, 2010 at 11:29 am
Comments posted to this topic are about the item Optimising Server-Side Paging - Part I
April 26, 2010 at 1:17 am
Nice post... but I have been using this kind of paging before, so I didn't notice any difference.
April 26, 2010 at 2:28 am
Hmmm... it seems to me that this will work only if no WHERE clause is used. What if we need to do server-side paging and sorting, and also filter which rows are returned (say, in the example, I need only the records where thread_id = some_value, sorted by create_dt)?
April 26, 2010 at 2:37 am
Any idea why the key-seek method uses much more CPU for 10 pages than it does for 100 or 200? It looks like a glitch in the testing, but maybe there is a logical explanation.
April 26, 2010 at 2:49 am
Aleksandar Cebov (4/26/2010)
Hmmm... it seems to me that this will work only if no WHERE clause is used.
It works whenever there is a useful index to obtain the keys, in the required order.
What if we need to do server-side paging and sorting, and also filter which rows are returned (say, in the example, I need only the records where thread_id = some_value, sorted by create_dt)?
The appropriate Key-Seek index in that case would be on (thread_id, create_dt). There is a full reproduction script included with the article. I encourage you to download it, and experiment for yourself.
Custom sorting and filtering is covered in more depth in part III - this first part is just about establishing the basic method.
Paul
April 26, 2010 at 3:14 am
desade (4/26/2010)
Any idea why the key-seek method uses much more CPU for 10 pages than it does for 100 or 200? It looks like a glitch in the testing, but maybe there is a logical explanation.
There's no special reason that I am aware of - the test results are shown exactly as they appeared. I just put it down to the small numbers involved, the limited timing resolution available, and random chance...
April 26, 2010 at 3:16 am
gitendrashah.net (4/26/2010)
Nice post... but I have been using this kind of paging before, so I didn't notice any difference.
Well hopefully you found something interesting or useful - if not, perhaps you will in parts 2 and 3.
April 26, 2010 at 3:48 am
Aleksandar Cebov (4/26/2010)
What if we need to do server-side paging and sorting, and also filter which rows are returned (say, in the example, I need only the records where thread_id = some_value, sorted by create_dt)?
Ok, here's some code to demonstrate what I meant in my previous reply:
-- Key-Seek index
CREATE INDEX [IX dbo.Post thread_id, create_dt]
    ON dbo.Post (thread_id, create_dt);

DECLARE @PageNumber BIGINT,  -- Page number to fetch
        @PageSize   BIGINT;  -- Rows per page

SET @PageSize = 50;
SET @PageNumber = 10;

-- The thread_id to filter on
DECLARE @ThreadID INTEGER;
SET @ThreadID = 6;

-- Key-Seek algorithm
WITH Keys
AS (
    -- Step 1 : Number the rows from the non-clustered index
    -- Maximum number of rows = @PageNumber * @PageSize
    SELECT TOP (@PageNumber * @PageSize)
        rn = ROW_NUMBER() OVER (ORDER BY P1.create_dt ASC),
        P1.post_id,
        P1.create_dt
    FROM dbo.Post P1
    WHERE P1.thread_id = @ThreadID
    ORDER BY
        P1.create_dt ASC
),
SelectedKeys
AS (
    -- Step 2 : Get the primary keys for the rows on the page we want
    -- Maximum number of rows from this stage = @PageSize
    SELECT TOP (@PageSize)
        SK.rn,
        SK.post_id,
        SK.create_dt
    FROM Keys SK
    WHERE SK.rn > ((@PageNumber - 1) * @PageSize)
    ORDER BY
        SK.create_dt ASC
)
SELECT -- Step 3 : Retrieve the off-index data
       -- We will only have @PageSize rows by this stage
    SK.rn,
    P2.post_id,
    P2.thread_id,
    P2.member_id,
    P2.create_dt,
    P2.title,
    P2.body
FROM SelectedKeys SK
JOIN dbo.Post P2
    ON P2.post_id = SK.post_id
ORDER BY
    SK.create_dt ASC
OPTION (RECOMPILE);
Actual execution plan:
April 26, 2010 at 6:14 am
That query plan highlights something I took out of the article at the last minute - to keep things simple.
You might wonder why SQL Server chooses a non-clustered index seek and key lookup, rather than a clustered index seek in the last step.
The answer is again down to the width of the clustered index: for a relatively small number of rows, the cost of the seek + lookup is less than scanning even a very small range on the cluster.
For larger page sizes, SQL Server may choose a partial scan of the cluster, due to the mounting cost of the random I/O associated with the lookups.
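If you want to watch that trade-off in your own copy of the test rig, you can pin each access path with a table hint and compare the plans. This is only a sketch (the index name matches the script above; costs and the crossover point will vary with your data):

```sql
-- Force the non-clustered seek + key lookup path
SELECT P.post_id, P.create_dt, P.title
FROM dbo.Post P WITH (INDEX ([IX dbo.Post thread_id, create_dt]))
WHERE P.thread_id = 6;

-- Force the clustered index (index id 1) for comparison
SELECT P.post_id, P.create_dt, P.title
FROM dbo.Post P WITH (INDEX (1))
WHERE P.thread_id = 6;
```

Run both with SET STATISTICS IO ON and vary the number of qualifying rows: at some point the optimiser's unhinted preference flips from the seek + lookup plan to the partial clustered scan.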
A final point: notice that the loop joins performing the final seek and lookup both have the WithOrderedPrefetch: True and Optimized: False attributes. This is a read-ahead optimization: SQL Server looks ahead in the index and issues asynchronous I/O for rows that will soon be needed by the joins. More details:
http://blogs.msdn.com/craigfr/archive/2008/10/07/random-prefetching.aspx
April 26, 2010 at 6:36 am
I scanned the article quickly and it is well presented.
But I do have to dive into this particular one, as I feel something odd is happening. The optimized statement is still just one statement, and the steps in the WITH clause are no more than inline views. I am interested in why this construct manages to force a much better execution plan.
No time now to go deep into this, but I will 😉
Thx for the article in advance!
April 26, 2010 at 6:52 am
Great article, Paul. I look forward to Part II. 😀
If you are not already covering it in a future article, I am curious if you have any advice on the best way (as in best performance) to incorporate the total record count for calculating total pages in this method, especially when custom filtering is involved.
Of course, if you are already planning to cover this topic, I am happy to patiently wait until then. 😉
Thanks!
April 26, 2010 at 7:35 am
nabidavid (4/26/2010)
If you are not already covering it in a future article, I am curious if you have any advice on the best way (as in best performance) to incorporate the total record count for calculating total pages in this method, especially when custom filtering is involved.
I do cover various methods for obtaining the total record count in part II, and part III is concerned with custom filtering and ordering. Of course it is never possible to cover every possible need - but I hope you will find it useful.
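For anyone who cannot wait for part II: one common approach (just a sketch of my own, not necessarily a method the series will recommend) is to return the filtered total alongside the keys with a windowed aggregate, reusing the tables and variables from the script earlier in this thread:

```sql
DECLARE @PageNumber BIGINT, @PageSize BIGINT, @ThreadID INTEGER;
SET @PageSize = 50;
SET @PageNumber = 10;
SET @ThreadID = 6;

WITH Keys AS (
    SELECT
        rn         = ROW_NUMBER() OVER (ORDER BY P.create_dt ASC),
        total_rows = COUNT_BIG(*) OVER (),  -- all rows matching the filter
        P.post_id
    FROM dbo.Post P
    WHERE P.thread_id = @ThreadID
)
SELECT
    K.rn,
    K.post_id,
    K.total_rows,
    total_pages = (K.total_rows + @PageSize - 1) / @PageSize
FROM Keys K
WHERE K.rn >  (@PageNumber - 1) * @PageSize
  AND K.rn <= @PageNumber * @PageSize
ORDER BY K.post_id ASC
OPTION (RECOMPILE);
```

Be aware that the windowed count has to touch every matching key, so it gives up part of the TOP row-goal saving that makes the key-seek method fast; weighing trade-offs like that is exactly what part II is for.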
April 26, 2010 at 7:37 am
peter-757102 (4/26/2010)
I scanned the article quickly and it is well presented.
Thanks very much.
But I do have to dive into this particular one, as I feel something odd is happening. The optimized statement is still just one statement, and the steps in the WITH clause are no more than inline views. I am interested in why this construct manages to force a much better execution plan.
You are right to suspect that there is rather more to it than that 😉
I hope you find time for a proper read of it very soon. Please also take a look at the attached scripts if you can.
April 26, 2010 at 7:40 am
Good article. One thing I noticed is that you're using a contiguous identity for your tests, but relying on ROW_NUMBER() for the paging.
I'm assuming that ROW_NUMBER() is used because you're planning for missing identity values due to deleted rows? Given that, I'd like to see the tests run with some random rows deleted, to better reflect a real-world scenario.
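For what it's worth, gaps are easy to introduce before re-running the timings; something along these lines would do it (a sketch only, deleting roughly 1% of rows at random):

```sql
-- Delete roughly 1% of rows at random to punch holes in the IDENTITY sequence
DELETE P
FROM dbo.Post AS P
WHERE ABS(CHECKSUM(NEWID())) % 100 = 0;
```

Because the method numbers rows with ROW_NUMBER() rather than doing arithmetic on post_id values, the pages returned should stay correct; only the timings are in question.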
April 26, 2010 at 7:58 am
sknox (4/26/2010)
Good article. One thing I noticed is that you're using a contiguous identity for your tests, but relying on ROW_NUMBER() for the paging. I'm assuming that ROW_NUMBER() is used because you're planning for missing identity values due to deleted rows?
The cluster on the identity column is primarily for architectural reasons - as you know, you can never depend on IDENTITY columns to be contiguous, regardless of whether rows are ever deleted.
The ROW_NUMBER is just used to assign a sequential number in selected key order, and to make the selection of keys for the page we want easy.
The only reason I order by the underlying key (post_id) rather than the ROW_NUMBER (rn) is purely a limitation of the optimiser. We both know that the order of ROW_NUMBER assignment matches the ORDER BY of the ranking function, but the current optimiser does not include that logic. Sorting on the key avoids a small extra sort - so it's all rather technical, but then so are many things with SQL Server 🙂
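The optimiser limitation I mean is easy to see by comparing the plans for two equivalent queries (a sketch against the article's dbo.Post table):

```sql
-- Both forms return the same rows in the same order. The first sorts on
-- the indexed key, which the optimiser recognises as already ordered; the
-- second sorts on rn, which the current optimiser does not, so it can add
-- a redundant Sort operator.
WITH Keys AS (
    SELECT rn = ROW_NUMBER() OVER (ORDER BY P.post_id ASC),
           P.post_id
    FROM dbo.Post P
)
SELECT TOP (50)
    K.rn, K.post_id
FROM Keys K
ORDER BY K.post_id ASC;   -- swap to ORDER BY K.rn ASC and compare the plan
```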
Given that, I'd like to see the tests run with some random rows deleted, to better reflect a real-world scenario.
The full test rig is available for download in the Resources section. 😉
Viewing 15 posts - 1 through 15 (of 55 total)