May 22, 2010 at 12:53 pm
Comments posted to this topic are about the item Optimising Server-Side Paging - Part II
May 23, 2010 at 10:24 am
Hi Paul
Hard stuff for sunday, but a really great article! 🙂
Greets
Flo
May 24, 2010 at 1:20 am
Fantastic work, great analysis.
If I may make the following observation: the Indexed View solution simply defers calculation from the SELECT query to the UPDATE/INSERT/DELETE queries that edit the source data, so we have simply moved the problem from an isolated domain (getting a count as part of a particular use case or procedure) to one that is not always obvious - any time when the unerlying data is modified.
What is the performance hit when the base table(s) are updated? Do ALL aggregates have to be recalculated? This is perhaps not ideal for a transactional database, but great for a data mart scenario in which updates are well defined and can be performance tuned.
IMHO the lack of paging support in SQL is a glaring omission, and the solutions that exist are all, in some measure, hacks. I guess the problem is, to some degree, moot - but that's another discussion!
Thanks again.
May 24, 2010 at 2:28 am
Hey Paul and thanks for this series, both articles were great.
I think the double-sort situation in the Double Row Number method can be avoided if you simply swap the ROW_NUMBER() OVER statements. That is, calculate the descending order first and then the ascending (or the opposite if you want last-to-first paging). I tried this on a paging query in our application (SQL Server 2005) and it works great, one of the two sorts is gone.
May 24, 2010 at 6:48 am
Florian Reischl (5/23/2010)
Hi PaulHard stuff for sunday, but a really great article! 🙂
Thanks Flo - I appreciate it.
May 24, 2010 at 7:12 am
jcaradocdavies (5/24/2010)
Fantastic work, great analysis.
Thank you.
If I may make the following observation: the Indexed View solution simply defers calculation from the SELECT query to the UPDATE/INSERT/DELETE queries that edit the source data, so we have simply moved the problem from an isolated domain (getting a count as part of a particular use case or procedure) to one that is not always obvious - any time when the unerlying data is modified.
That is a valid observation; however, this sort of pre-aggregation (or denormalisation, if you like) has been a useful optimisation for a very long time. Many (most?) systems read data much more often than they write it, so it can make sense to compute the value once (at write time) rather than many times (each time the SELECT is run).
In the past, these aggregates would have been maintained using triggers. Indexed views offer many benefits over triggers, not least of which is that SQL Server does all the maintenance for you. The maintenance is performed as part of the INSERT/UPDATE/DELETE/MERGE plan, rather than in a separate batch as for trigger code.
I won't bore you with a full discussion of the advantages of indexed views here, but it is useful to note that the optimiser in Enterprise Edition can take advantage of the aggregated view data in existing (and future) queries that do not reference the view directly, but would nevertheless benefit from using it - see Designing Indexed Views for more details on that feature.
What is the performance hit when the base table(s) are updated? Do ALL aggregates have to be recalculated? This is perhaps not ideal for a transactional database, but great for a data mart scenario in which updates are well defined and can be performance tuned.
SQL Server will only maintains the row-level aggregates which are affected by the change - and in a way that is lighter than using a trigger. Maintaining the view is very efficient and pretty cheap, especially compared to the cost of making the data change in the first place. I have personally used indexed views with great benefits on extremely busy OLTP-type systems (a large auction site for example).
Paul
May 24, 2010 at 7:20 am
Ioannis Tsakpinis (5/24/2010)
Hey Paul and thanks for this series, both articles were great.
Thanks, I'm glad you enjoyed them. There will be at least one more in this series.
I think the double-sort situation in the Double Row Number method can be avoided if you simply swap the ROW_NUMBER() OVER statements. That is, calculate the descending order first and then the ascending (or the opposite if you want last-to-first paging). I tried this on a paging query in our application (SQL Server 2005) and it works great, one of the two sorts is gone.
This is true, and something that I missed when writing the test rig. Nonetheless, even one full sort of the full million-row set is enough to make the double-row-number method a poor choice in my view. Thanks for pointing this out though!
Paul
May 24, 2010 at 7:43 am
To what extent can your results be applied to SQL Azure? Will the indexed view method work there?
May 24, 2010 at 8:09 am
sbrownell 30624 (5/24/2010)
To what extent can your results be applied to SQL Azure? Will the indexed view method work there?
The honest answer is: I have absolutely no idea, since I have no familiarity with that product at all.
Perhaps someone else will be able to help here?
edit: Just found the following reference that shows that indexed views are available on SQL Azure:
May 24, 2010 at 8:21 am
Thank you for the very good article.
One question I have, has to do with the need for an ORDER BY clause for the overall SQL statement in the first CTE (Keys). Yes, I understand that using a TOP without an ORDER BY does not guarantee results, however the combination of the ORDER BY in the Window function and the WHERE Clause should take care of that.
Is there something I am missing? Does it even matter?
May 24, 2010 at 9:52 am
Anthony Perkins (5/24/2010)
Thank you for the very good article.
Thanks for reading it 🙂
One question I have, has to do with the need for an ORDER BY clause for the overall SQL statement in the first CTE (Keys). Yes, I understand that using a TOP without an ORDER BY does not guarantee results, however the combination of the ORDER BY in the Window function and the WHERE Clause should take care of that. Is there something I am missing? Does it even matter?
Yes, it matters.
The Keys CTE is all about finding the TOP (@PageNumber * @PageSize) rows in (observation_date, observation_hour, observation_minute) order. If we want page 5, where each page has 10 rows, the Keys CTE will return just 5 * 10 = 50 rows (in that defined order) for the next CTE to work on (logically speaking).
The ORDER BY in the ranking function only applies to the ranking function, so we cannot assume it affects the TOP operation in any way. The WHERE clause is just there to specify the overall restriction on the date range we are interested in, so that does not help either.
For more information the technical details of TOP with ORDER BY, see the following post by Connor Cunningham:
http://blogs.msdn.com/queryoptteam/archive/2006/05/02/588731.aspx
Paul
May 24, 2010 at 10:53 am
The ORDER BY in the ranking function only applies to the ranking function, so we cannot assume it affects the TOP operation in any way. The WHERE clause is just there to specify the overall restriction on the date range we are interested in, so that does not help either.
You are of course absolutely right. I stand corrected.
Thanks again.
May 24, 2010 at 11:20 am
Anthony Perkins (5/24/2010)
You are of course absolutely right. I stand corrected.
No worries. It was a reasonable question, and I would be very surprised if you were the only one to wonder about it.
May 24, 2010 at 12:56 pm
As I've mentioned on other posts that discuss server side paging, most of the examples I have seen deal with Single Table paging examples.
In most cases that I have seen, paging tends to be implemented on more complex search-type queries, with significantly more complex base execution plans. I would be interested in seeing a comparison of the various methods based on a root query with 3-4 joins as a mix of inner and outer, and possibly some other filter operations (IN, LIKE) to mess with the execution plans.
I like the idea of an indexed view, and in those rare cases where I get to build the system from scratch, and design for thier later use, this is an excellent approach. However, I rarely get to do that as a consultant. More often, I am brought in to address an existing performance problem because the people who designed and wrote the system didn't know what they were doing, and most often don't even know what an execution plan is, much less have built a system that meets the strict pre-requisites of implementing indexed views.
As is, based on the article (which is well written and well researched, by the way), I can take a way a great way of handling a milllion+ single table paging solution. Though in the many years I have spent as a DBA, I have rarely run into this case. Thanks for the ideas and the research though, and hope to see more in the future.
May 24, 2010 at 1:08 pm
Paul Muharsky-474732 (5/24/2010)
As I've mentioned on other posts that discuss server side paging, most of the examples I have seen deal with Single Table paging examples.
The examples provided in Part I and Part II are simplified for several reasons, including ease of access for those new to the subject, and space constraints imposed by SSC.
The concepts do apply to more complex scenarios, and I will be addressing at least one of those in future parts. As I said in an earlier response, there will be at least one more part to this series, as I am keen to cover more complex base queries and dynamic search and ordering conditions. The challenge, of course, is to keep it interesting and relevant to as wide an audience as possible.
I do not intend to duplicate the fine work of Erland Sommarskog, but as Kevin Boles pointed out in the discussion for Part I, the Key Seek concept translates well to more complex requirements. Kevin, as you may know, is a SQL Server MVP of long standing and a very well-known and experienced consultant.
If you have any specific ideas that you would like to see me cover, please let me know.
Paul
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply