Returning a Subset of a Recordset

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/jwiner/returningasubsetofarecordset.asp

  • This is ludicrous...the same thing can be accomplished without the use of a cursor.

    PlanetJam Media Group


    PlanetJam Media Group

  • I appreciate your input, but I dont believe that this approach is ludicrous. It is merely one way to solve the problem. If you would like to contribute by sending in another approach, please do so. I'd love to offer it as a supplement to the article. As a programmer you should realize that there are many ways to solve the same problem, and some ways are better than other. But never is a solution ludicrous or stupid. This discussion area is a forum for developers to discuss problems and solutions in a friendly, non-threatening environment. Please consider this the next time you offer you opinions. Thank you.

  • I'm trying to wrap my mind around this one. Can you please give a example of where someone can use this.

  • In this example, my friend was attempting to limit the number of records returned from a recordset consisting of up to 50,000 records. This method allowed him to return a subset of that recordset and paginate through it easily and quickly without having to eat up resources by returning all 50,000 records. (Another note...the recordset (data reader) was being returned to a .NET webpage). Hope that helps.

  • Hey Jon,

    I dont think I'd call this ludicrous. Unusual maybe! Typically I'd look to solve any problem with a set based solution first, then move to a cursor as a last resort.

    I'd guess that you actually dont take a really bad performance hit on this, especially if the concurrency is low. Any solution has to take into account how much it will be used - if you called this one once a day or whatever, pretty much who cares how it runs?

    Rather than guess, lets find out - how about we try to come up with a couple other methods, then benchmark - I think would make a good companion article.

    Andy

  • Good idea Andy...Give me some time to come up with alternatives and I'll put something together...If anyone has some other methods they'd like to offer, I can publish them too.

  • Hey Jon,

    Here is my first attempt:

    alter proc usp_GetSubsetOfOrders @BatchSize int = 10, @BatchesToSkip as int = 0, @OrderBy as varchar(100)='orderdate' as

    --9/8/01 Andy Warren

    --Alternate method of returning a subset of records, typically used in 'paging' operations

    --where you only show x records per page and do not want to return the entire recordset

    --to the client.

    declare @RowsToIgnore int

    declare @sql varchar(1000)

    set @RowsToIgnore = @BatchesToSkip * @BatchSize

    set @sql='select top ' + convert(varchar(10), @BatchSize) + ' OrderDate, ShipName, ShipCity from orders where orderid > (select max(orderid) from (select top ' + convert(varchar(10), @RowsToIgnore) + ' orderid from orders order by ' + @OrderBy + ') A) order by ' + @OrderBy

    exec (@Sql)

    Just profiling using QA this appears to be a little faster than your solution.

  • Andy,

    Thanks for you input. I'll test it out. I'll try and write it up as a supplement to this article.

  • Andy,

    The goal of the original procedure was to present a list (or browse as I call it) of certain columns that can be sorted by any of the columns, and be able to page through the records without returning the entire recordset at once.

    There are several things wrong with your alternative. One (easy to fix) is that it will not work with @BatchesToSkip=0 (the default) since TOP 0 will return nothing.

    But more importantly is that it will not work when ordering by any column other than those that "happen" to be in orderid order, such as the orderdate column you're using by default. The solution to this problem is made more difficult when you attempt to order by a column like ShipName where there could be duplicates within that column, and even worse yet if the column you're ordering by could contain nulls. When you consider all of these possibilities and all the intricate code needed to handle them (I know, I've done it), the cursor approach starts looking like the better solution - for the intended goal as I stated to begin with.

    Jay Madren


    Jay Madren

  • Good eye on the Top 0 - I was trying to just get an alternative posted quickly and obviously didnt test well enough.

    I'll grant your point that my example doesnt meet your requirements in all cases (or heck, even in many maybe). I'll take another shot at it this weekend if time permits - I still believe in trying to achieve a set based solution first. Server side cursors are expensive. Keep in mind that I wasnt arguing against your solution - as I mentioned in my earlier post it depends on a lot of things before you call it a 'bad' solution.

    My job here is to try to evangelize best practices crossed with common sense and try to offer ideas for alternatives. In this case IF you could accomplish the goal with either a cursor or a set based solution, Im betting the set based one will use less resources and execute faster. The trick is - can you figure out a set based solution in the same amount of time you can figure out a cursor based one? Most developers know how to write loops, so when a problem looks complex they tend to revert to what they know. Sometimes that is the only way - might be in this case! The second part of the discussion is if you can't build a set based solution, I'd vote for doing the loop in the middle tier, not the data tier.

    We've got a similar discussion under way in the T-SQL forum, you might want to look at that as well. Hope you'll continue to participate, you'd started a discussion that I think a lot of our readers will benefit from. And remember, you've got Jon to thank for getting you into this!

    Andy

  • Andy,

    I agree with you - set based solutions are usually better than cursors, and I wasn't calling your solution a 'bad' one. I'm just relating from experience that when given the complexity of the code needed to fully support the goal as I see it, the cursor solution is so much easier and, in my opinion, not much different in performance or resources, within limits.

    I write a lot of appliations and have to keep setting up these 'browse' procedures for each one. I haven't built a 'univeral' set based procedure that could handle any set of columns from any table. If I could, then it would be a much better solution for me.

    Of course, now that I've mentioned it, you and everyone else will probably want to see it attempted. And I would like to build the 'universal' procedure, just haven't had the time to devote to it. Maybe with the assitance of others we can get it done.

    Also, which topic in the T-SQL forum are you talking about?

    Jay Madren


    Jay Madren

  • A universal browse function sounds interesting (and yes, we may have to try it!), but when you try to make a proc all things to all users, you lose the compilation benefit in most cases. You do keep in on the server (good or bad depending on your view and your needs) and the code is centralized.

    Performance on a pure execution basis is a bit different for those two solutions we have so far(and I havent forgotten that mine doesnt yet meet the spec). Running the orig solution generates about 6 times the disk reads that my solution does.

    An interesting problem.

    Andy

    This is the most recent one:

    http://www.sqlservercentral.com/forum/topic.asp?TOPIC_ID=1014&FORUM_ID=8&CAT_ID=1&Topic_Title=Selecting+the+n%27th+to+the+n%27th+%2B+x+set+of+rows&Forum_Title=T%2DSQL

    And an earlier one, a little different:

    http://www.sqlservercentral.com/forum/topic.asp?TOPIC_ID=989&FORUM_ID=8&CAT_ID=1&Topic_Title=T%2DSQL+for+%22Last+n+records%22+in+a+table&Forum_Title=T%2DSQL

  • I wouldn't call this ludicrous either. I, in general, hate cursors and temp tables, but they have value. While I wouldn't use this approach, it appears to work.

    Personally, I'd burden the programmer and develop a set of stored procedures (one for each sort) and then use a master proc to call one of them. This way the user still get's one proc, but I can optimize all of the individual procs. For those procs that are on unindexed columns, I'd probably combine them into one proc since a scan would occur anyway.

    Steve Jones

    steve@dkranch.net

  • Thats not a bad idea, though a bit of overhead to generate them all. Code generator would ease the pain.

    Andy

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply