Optimising Server-Side Paging - Part I

Question

Post reply

Optimising Server-Side Paging - Part I

Viewing 15 posts - 31 through 45 (of 55 total)

You must be logged in to reply to this topic. Login to reply

TomThomson SSC Guru Points: 104773 More actions · Answer 1

TomThomson

SSC Guru

Points: 104773

April 26, 2010 at 7:08 pm

#1157952

A brilliant article. Keep it up!

Tom

Paul White SSC Guru Points: 150468 More actions · Answer 2

Thank you Jeff, Lutz, and Tom.

No pressure then 😉

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

ondrej.bouda SSC Enthusiast Points: 164 More actions · Answer 3

I wonder if the TOP specification has any special impact (as claimed) on the query execution - I tried to omit it (replacing it with WHERE PG.rn BETWEEN lowRn AND hignRn) and the IO statistics were the same.

Query plans are nearly identical: both contain Filter and Top side by side, just in reversed order.

Tried in 2005 and 2008 with the same results.

Paul White SSC Guru Points: 150468 More actions · Answer 4

ondrej.bouda (4/27/2010)
I wonder if the TOP specification has any special impact (as claimed) on the query execution - I tried to omit it (replacing it with WHERE PG.rn BETWEEN lowRn AND hignRn) and the IO statistics were the same. Query plans are nearly identical: both contain Filter and Top side by side, just in reversed order. Tried in 2005 and 2008 with the same results.

I did make that claim, so I had better back it up 🙂

My preference here is to always use an extra TOP, since it gives the query optimiser explicit information about the maximum number of rows than can flow past that iterator. Giving the QO more and better information is pretty much always a good thing.

Using BETWEEN is not quite the same thing. It might be obvious to you, as a human, that BETWEEN 1 AND 10 (for example) will return 10 rows at most - but the QO can't currently make that logical leap. It is not able to trace the 'rn' column back to the ROW_NUMBER ranking function, and infer the same guarantee you do.

So, your change subtly introduces scope for a cardinality estimation error: compare the execution plans - the QO has to make a guess (currently 9% selectivity) for the BETWEEN expression. This results in an estimate of 900 rows for the last page of the 10,000-row sample. (For larger data sets, the error would obviously be much larger.)

Clearly this is wrong: we know that a page can have at most @PageSize rows. Using the extra TOP lets the optimiser have that information too.

Using BETWEEN therefore introduces scope for a sub-optimal plan choice after the Filter iterator - something that we can easily avoid, with some good old defensive programming.

Hope that helps.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

ondrej.bouda SSC Enthusiast Points: 164 More actions · Answer 5

Thanks!

I checked my execution plans and it already makes the difference - the plan with TOP is much more accurate in the estimated number of rows.

Thats a good point of view: Tell to the optimizer all that you can know. There's no need to care whether some instruction might be useless (because the optimizer might be smart enough to know it too).

Paul White SSC Guru Points: 150468 More actions · Answer 6

ondrej.bouda (4/27/2010)
Thats a good point of view: Tell to the optimizer all that you can know.

Absolutely right!

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

Paul White NZ (4/27/2010)
ondrej.bouda (4/27/2010)
Thats a good point of view: Tell to the optimizer all that you can know.
Absolutely right!

+2!!

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

SQLRNNR SSC Guru Points: 281334 More actions · Answer 8

Nice article Paul. Thanks for sharing.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 9

Hey Paul

Thanks for sharing this great article!

Best wishes,

Flo

tobyteel SSC Journeyman Points: 91 More actions · Answer 10

What's the point of the second CTE in your Key Seek solution? It seems like that could easily be incorporated into the final select statement.

WITH Keys

AS (

-- Step 1 : Number the rows from the non-clustered index

-- Maximum number of rows = @PageNumber * @PageSize

SELECT TOP (@PageNumber * @PageSize)

rn = ROW_NUMBER() OVER (ORDER BY P1.post_id ASC),

P1.post_id

FROM dbo.Post P1

ORDER BY

P1.post_id ASC

)

SELECT TOP (@PageSize)-- Step 3 : Retrieve the off-index data

-- We will only have @PageSize rows by this stage

SK.rn,

P2.post_id,

P2.thread_id,

P2.member_id,

P2.create_dt,

P2.title,

P2.body

FROM Keys SK

JOIN dbo.Post P2

ON P2.post_id = SK.post_id

WHERE SK.rn > ((@PageNumber - 1) * @PageSize)

ORDER BY

SK.post_id ASC;

Seems to run with the exact same IO. Query execution plan simply moves the Top to the end of the plan from the middle. Why the added complexity? I tried 500 results on page 10, and 50 results on page 100.

Strangely, when executing your FetchPageKeySeek and my version above, whichever runs second runs the fastest. I'm running on SQL Server 2008 (10.0.2531) from SSMS.

Also, for those doing their own benchmarks: I find that SSMS's rendering of the resultsets take a significant amount of time. To control against it, I insert the results into a temp table instead.

Great article though. Can't wait for the next installments. Any ETA?

Paul White SSC Guru Points: 150468 More actions · Answer 11

Florian Reischl (5/1/2010)
Hey Paul
Thanks for sharing this great article!
Best wishes,
Flo

Hey Flo, thanks very much! I'm a big fan of your blog by the way.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 12

tobyteel (5/3/2010)
What's the point of the second CTE in your Key Seek solution? It seems like that could easily be incorporated into the final select statement. Seems to run with the exact same IO. Query execution plan simply moves the Top to the end of the plan from the middle. Why the added complexity? I tried 500 results on page 10, and 50 results on page 100.

Three reasons:

1. I intended it to be clearer for those new to the idea to follow what was happening and why.

2. Separating the steps out might make for easier maintenance. Compacting the code makes it less readable in my view, without benefiting the execution plan.

3. Having the TOP operator before the look-ups tells the optimiser that a maximum of @PageSize look-ups will be performed. The query may optimise differently because of this. There was a small discussion previously concerning why it is better to give the optimiser all the information you can. With the TOP at the end, if there were other iterators (such as a filter) after the look-ups, the optimiser might choose a sub-optimal plan for the look-ups based on an incorrect cardinality estimation. The difference between a @PageSize row goal (TOP at the end) and a guaranteed maximum (TOP before the look-ups) is fairly subtle, but can be important.

Strangely, when executing your FetchPageKeySeek and my version above, whichever runs second runs the fastest. I'm running on SQL Server 2008 (10.0.2531) from SSMS.

This is the effect of cached data pages. The downloadable code in the Resources section provides a full test rig which eliminates this problem and makes a fairer comparison.

Also, for those doing their own benchmarks: I find that SSMS's rendering of the resultsets take a significant amount of time. To control against it, I insert the results into a temp table instead.

True. For a larger number of rows I tend to do the same myself. Feel free to modify the test rig to do this if you wish.

Great article though. Can't wait for the next installments. Any ETA?

Part II was accepted for publication a week or so ago. Just waiting for Steve to schedule it. Are you listening, Steve? 😛

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 13

Paul White NZ (5/3/2010)
Florian Reischl (5/1/2010)
Hey Paul
Thanks for sharing this great article!
Best wishes,
Flo
Hey Flo, thanks very much! I'm a big fan of your blog by the way.

Can you give a link to Florian's blog please? That's gotta be a keeper!

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Paul White SSC Guru Points: 150468 More actions · Answer 14

TheSQLGuru (5/4/2010)
Can you give a link to Florian's blog please? That's gotta be a keeper!

It's http://florianreischl.blogspot.com/ but I apologise in advance - I have a amateur interest in SQLCLR and C# in particular, and Flo writes some good stuff bridging the gap between .NET and SQL Server. May not be for everyone, but please do check it out.

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 15

Paul White NZ (5/3/2010)
Florian Reischl (5/1/2010)
Hey Paul
Thanks for sharing this great article!
Best wishes,
Flo
Hey Flo, thanks very much! I'm a big fan of your blog by the way.

Hey Paul

Thanks a bunch! Sadly I had no time over the last three month to publish some new things.

(Just started in a new company - yesterday).

Greets

Flo