April 26, 2010 at 7:08 pm
A brilliant article. Keep it up!
Tom
April 26, 2010 at 10:21 pm
April 27, 2010 at 4:22 am
I wonder if the TOP specification has any special impact (as claimed) on the query execution - I tried to omit it (replacing it with WHERE PG.rn BETWEEN lowRn AND hignRn) and the IO statistics were the same.
Query plans are nearly identical: both contain Filter and Top side by side, just in reversed order.
Tried in 2005 and 2008 with the same results.
April 27, 2010 at 5:43 am
ondrej.bouda (4/27/2010)
I wonder if the TOP specification has any special impact (as claimed) on the query execution - I tried to omit it (replacing it with WHERE PG.rn BETWEEN lowRn AND hignRn) and the IO statistics were the same. Query plans are nearly identical: both contain Filter and Top side by side, just in reversed order. Tried in 2005 and 2008 with the same results.
I did make that claim, so I had better back it up 🙂
My preference here is to always use an extra TOP, since it gives the query optimiser explicit information about the maximum number of rows than can flow past that iterator. Giving the QO more and better information is pretty much always a good thing.
Using BETWEEN is not quite the same thing. It might be obvious to you, as a human, that BETWEEN 1 AND 10 (for example) will return 10 rows at most - but the QO can't currently make that logical leap. It is not able to trace the 'rn' column back to the ROW_NUMBER ranking function, and infer the same guarantee you do.
So, your change subtly introduces scope for a cardinality estimation error: compare the execution plans - the QO has to make a guess (currently 9% selectivity) for the BETWEEN expression. This results in an estimate of 900 rows for the last page of the 10,000-row sample. (For larger data sets, the error would obviously be much larger.)
Clearly this is wrong: we know that a page can have at most @PageSize rows. Using the extra TOP lets the optimiser have that information too.
Using BETWEEN therefore introduces scope for a sub-optimal plan choice after the Filter iterator - something that we can easily avoid, with some good old defensive programming.
Hope that helps.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
April 27, 2010 at 6:13 am
Thanks!
I checked my execution plans and it already makes the difference - the plan with TOP is much more accurate in the estimated number of rows.
Thats a good point of view: Tell to the optimizer all that you can know. There's no need to care whether some instruction might be useless (because the optimizer might be smart enough to know it too).
April 27, 2010 at 6:19 am
ondrej.bouda (4/27/2010)
Thats a good point of view: Tell to the optimizer all that you can know.
Absolutely right!
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
April 27, 2010 at 7:29 am
Paul White NZ (4/27/2010)
ondrej.bouda (4/27/2010)
Thats a good point of view: Tell to the optimizer all that you can know.Absolutely right!
+2!!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
April 28, 2010 at 10:40 am
Nice article Paul. Thanks for sharing.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
May 1, 2010 at 9:20 am
Hey Paul
Thanks for sharing this great article!
Best wishes,
Flo
May 3, 2010 at 3:56 pm
What's the point of the second CTE in your Key Seek solution? It seems like that could easily be incorporated into the final select statement.
WITH Keys
AS (
-- Step 1 : Number the rows from the non-clustered index
-- Maximum number of rows = @PageNumber * @PageSize
SELECT TOP (@PageNumber * @PageSize)
rn = ROW_NUMBER() OVER (ORDER BY P1.post_id ASC),
P1.post_id
FROM dbo.Post P1
ORDER BY
P1.post_id ASC
)
SELECT TOP (@PageSize)-- Step 3 : Retrieve the off-index data
-- We will only have @PageSize rows by this stage
SK.rn,
P2.post_id,
P2.thread_id,
P2.member_id,
P2.create_dt,
P2.title,
P2.body
FROM Keys SK
JOIN dbo.Post P2
ON P2.post_id = SK.post_id
WHERE SK.rn > ((@PageNumber - 1) * @PageSize)
ORDER BY
SK.post_id ASC;
Seems to run with the exact same IO. Query execution plan simply moves the Top to the end of the plan from the middle. Why the added complexity? I tried 500 results on page 10, and 50 results on page 100.
Strangely, when executing your FetchPageKeySeek and my version above, whichever runs second runs the fastest. I'm running on SQL Server 2008 (10.0.2531) from SSMS.
Also, for those doing their own benchmarks: I find that SSMS's rendering of the resultsets take a significant amount of time. To control against it, I insert the results into a temp table instead.
Great article though. Can't wait for the next installments. Any ETA?
May 3, 2010 at 8:19 pm
Florian Reischl (5/1/2010)
Hey PaulThanks for sharing this great article!
Best wishes,
Flo
Hey Flo, thanks very much! I'm a big fan of your blog by the way.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 3, 2010 at 8:32 pm
tobyteel (5/3/2010)
What's the point of the second CTE in your Key Seek solution? It seems like that could easily be incorporated into the final select statement. Seems to run with the exact same IO. Query execution plan simply moves the Top to the end of the plan from the middle. Why the added complexity? I tried 500 results on page 10, and 50 results on page 100.
Three reasons:
1. I intended it to be clearer for those new to the idea to follow what was happening and why.
2. Separating the steps out might make for easier maintenance. Compacting the code makes it less readable in my view, without benefiting the execution plan.
3. Having the TOP operator before the look-ups tells the optimiser that a maximum of @PageSize look-ups will be performed. The query may optimise differently because of this. There was a small discussion previously concerning why it is better to give the optimiser all the information you can. With the TOP at the end, if there were other iterators (such as a filter) after the look-ups, the optimiser might choose a sub-optimal plan for the look-ups based on an incorrect cardinality estimation. The difference between a @PageSize row goal (TOP at the end) and a guaranteed maximum (TOP before the look-ups) is fairly subtle, but can be important.
Strangely, when executing your FetchPageKeySeek and my version above, whichever runs second runs the fastest. I'm running on SQL Server 2008 (10.0.2531) from SSMS.
This is the effect of cached data pages. The downloadable code in the Resources section provides a full test rig which eliminates this problem and makes a fairer comparison.
Also, for those doing their own benchmarks: I find that SSMS's rendering of the resultsets take a significant amount of time. To control against it, I insert the results into a temp table instead.
True. For a larger number of rows I tend to do the same myself. Feel free to modify the test rig to do this if you wish.
Great article though. Can't wait for the next installments. Any ETA?
Part II was accepted for publication a week or so ago. Just waiting for Steve to schedule it. Are you listening, Steve? 😛
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 4, 2010 at 7:44 am
Paul White NZ (5/3/2010)
Florian Reischl (5/1/2010)
Hey PaulThanks for sharing this great article!
Best wishes,
Flo
Hey Flo, thanks very much! I'm a big fan of your blog by the way.
Can you give a link to Florian's blog please? That's gotta be a keeper!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 4, 2010 at 8:00 am
TheSQLGuru (5/4/2010)
Can you give a link to Florian's blog please? That's gotta be a keeper!
It's http://florianreischl.blogspot.com/ but I apologise in advance - I have a amateur interest in SQLCLR and C# in particular, and Flo writes some good stuff bridging the gap between .NET and SQL Server. May not be for everyone, but please do check it out.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 4, 2010 at 8:25 am
Paul White NZ (5/3/2010)
Florian Reischl (5/1/2010)
Hey PaulThanks for sharing this great article!
Best wishes,
Flo
Hey Flo, thanks very much! I'm a big fan of your blog by the way.
Hey Paul
Thanks a bunch! Sadly I had no time over the last three month to publish some new things.
(Just started in a new company - yesterday).
Greets
Flo
Viewing 15 posts - 31 through 45 (of 55 total)
You must be logged in to reply to this topic. Login to reply