Definition of "Set Based"

  • I can't find it... I know I saw it but I can't find it, again, even though it was on a thread I took part in. Someone listed the definition of "Set Based" in three relatively simple rules and it was a great definition. Had something about retrieving each row only once, etc, etc.

    I'm trying to teach my guys what set based really means and I would sure like to have a copy of those three marvelous little rules. Thanks folks...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Maybe it was this?

    As I read somewhere once, you don't tell SQL how to do it, you tell SQL what you want, and that's a great way of thinking about it. A procedural programmer gets bogged down with the details, and has to concentrate on breaking things down into small pieces, explicitly reading and processing one row of data at a time, and figuring out how to combine those results together at the end to make it all work. A set-based SQL programmer worries about none of those things: In the set-based world, you state your relations and join the tables together, add some grouping and criteria, and it is the database engine that worries about the specifics.

    from this blog

    http://weblogs.sqlteam.com/jeffs/archive/2007/04/30/60192.aspx


    N 56°04'39.16"
    E 12°55'05.25"

  • Sadly, no, but that's a hell of an article that I'm going to add to my collection... Thanks, Peter...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hot damn! I found it! I finally remembered it had something to do with one of my "running total" posts... here's the three marvelous rules I was looking for...

    - it opens the recordset (table in this case) once;

    - it accesses each value in the recordset once;

    - for any row calculations it uses only data from the same row or from memory.

    And, giving credit where credit is definitely due... thank you, Sergiy! I also had a doubt about cursors qualifying as set based according to the rules above and here's what Sergiy had to say about that...

    Not really.

    Does not fit 1st requirement.

    When using cursor you open recordset as many times as many rows it contains.

    And in most cases cursors are created to have a chance to grab relevant to this particular row data from other tables. So, 3rd requirement typically is not fulfilled as well.

    Just in case anyone else wants to see how all of that unfolded... here's the link to the thread.

    http://www.sqlservercentral.com/Forums/Topic395988-8-1.aspx

    Thanks for looking, folks... I saw lot's of people try to help me find this in the number of members who dwelled on the post.

    Edit -- Just adding the words "Definition of Set Based" to make it easier for me to find this, again 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I remember seeing those and we had had a bit of a disagreement. Not that they're not a reasonable starting point - I'm just not convinced they cover all of the aspects required for "set-based". I'm not even going to dig back up the part about "no order in a set".

    Questions:

    How do aggregates fit into that definition?

    How about processes that (within the context of a SELECT/UPDATE) require RBAR-style table-scans?

    triangle joins? correlated sub-queries?

    Nothing about being able to perform the process to multiple rows at the same time?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Heh... right there with you Matt... lot's of things to think about... the things you brought up are going to help me answer a lot of questions I suddenly have with the term "set based"...

    I personally don't think triangular joins are set based but what about other self joins? They certainly seem to violate the rules because each value is used more than once.

    I think aggragates are set based and they seem to match the 3 rules but do they always?

    UDF's seem to go either way depending on what they do.

    I'm pretty sure that all correlated sub-queries are disqualified as "set based" because they open a record set for every row in another record set... but I've seen some very fast correlated sub-queries (found a nasty fast "missing sequence" correlated sub-query).

    I guess another question might be (especially in light of the mini-triangular joins in some dupe checks), should we be telling people "use set based" or should we simply be recommending "avoid the loop"? Should we get more sophisticated and stop using the apparently overused term "set based" in favor of something a bit more long winded such as "avoid the loop and minimize the number of internal rows generated by a query all while going for a tradeoff between the least amount of CPU time and execution duration"?

    Lot's of good stuff to think about, but I first had to find the rules I'd seen previously.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • To add another twist.

    I had an argument with Joe Celko about ORDER BY.

    It seems that an ORDER BY operation needs [internally] a CURSOR.


    N 56°04'39.16"
    E 12°55'05.25"

  • Heh... you already know this, but the emphasize, loop joins (especially) and a bunch of other "internal" things are really high speed "cursors" that operate at machine language levels.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I kind of chuckled the other day when I came across one of the Microsoft blogs referring to the "running totals" syntax we keep reusing as the "inner loop" and/or "inner cursor" process.

    Cursors definitely have their place...

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Ooohh.... heh... the only time I ever explicity declare cursors is to demo how slow they are when I do... if SQL Server wants to use cursors internally, that's fine... I just can't bring myself to do it. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply