What is meant by set based programming?

  • Jeff Moden (8/13/2012)


    Evil Kraig F (8/10/2012)


    Not that I don't admire Jeff's abilities, but he doesn't speak the language sometimes... or better stated, he speaks the language of database.

    I respectfully submit that even someone who knows nothing of databases can relate to the simplicty of rows vs columns. 😀

    Well, one would hope... I've seen it not work that way too often, unfortunately. I fear we're biased to the simplicity.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (8/13/2012)


    Jeff Moden (8/13/2012)


    Evil Kraig F (8/10/2012)


    Not that I don't admire Jeff's abilities, but he doesn't speak the language sometimes... or better stated, he speaks the language of database.

    I respectfully submit that even someone who knows nothing of databases can relate to the simplicty of rows vs columns. 😀

    Well, one would hope... I've seen it not work that way too often, unfortunately. I fear we're biased to the simplicity.

    Coming from an APL background, I was frequently called upon to think in terms of arrays of rank 6 or higher. Rows and columns represent merely rank 2.

    Anybody who's taken just a little bit of math in university should be able to relate to a matrix.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • ...

    Anybody who's taken just a little bit of math in university should be able to relate to a matrix.

    Ough! Yes! I have seen this film! Also, I've seen The Matrix Reloaded and The Matrix Revolutions. They are cool and set-based!

    :w00t::hehe::w00t:

    To OP:

    Have you ever done anything with XSLT? It's one of the closest things to set-based from non-database programming world.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • From what I remember, The Matrix trilogy is about a loop and a conditional BREAK. So they wouldn't be set-based. I would call them simple RBAR. :-D:-P

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (8/14/2012)


    From what I remember, The Matrix trilogy is about a loop and a conditional BREAK. So they wouldn't be set-based. I would call them simple RBAR. :-D:-P

    Nah. The Matrix is a story about a guy who wants to be the biggest mass murderer in human history. Goes by the screen-name "Morpheus".

    "We don't pull anyone out The Matrix past a certain age...blah blah blah" implied that it will drive them insane and/or kill them through shock.

    So "freeing" humanity from The Matrix will kill (or at least drive insane) everyone over a certain age. Mass murder.

    Also, their city, Zion, doesn't appear to have the necessary medical and daycare facilities for everyone below that certain age, implied to be children by the scenes with The Oracle. Every one of them will be suffering severe headaches and eye pain ("why do my eyes hurt?" scene), and severe muscle atrophy (needles and wires scene). So he's essentially planning to subject all of them to death by dehydration (they won't be able to obtain water, food, etc., and dehydration will kill faster than starvation), while suffering from headaches, severe disorientation, and so on. Mass murder of children.

    So, even ignoring the medical misinformation (unused eyes post-birth results in atrophy of the optic nerves, resulting in lifelong blindness, not "my eyes hurt", but scientific accuracy is not one of The Matrix's many failings), Morpheus and his cohort in crimes against humanity, Neo, are essentially planning on nothing but mass murder on a scale that makes Hitler and Stalin, added together, look like rank amateurs.

    Of course, that's par for the course. Think about all poor people at the end of Logan's Run. No survival-related skills at all, let loose in North America, near Washington, DC. They'll all die in the winter after the city falls, if not sooner.

    The list goes on, movie after movie, all resulting in mass death for innocent people. Isn't Hollywood fun! :w00t:

    (Yeah, I have too much idle time and watch way too many movies.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Shadab Shah (8/12/2012)


    Jeff Moden (8/10/2012)


    Shadab Shah (8/9/2012)


    Hi all,

    I have come across various section on this website where it is mention that SQL Server is a set based programming language.

    My understanding about it is that : Set based means that the operations are perform on a set of values but i am not able to relate this defination with operations being perform with SQL Server.

    I am from C language background and i don't see much difference in applying logic over here but still these are early days for me in SQL Server, so i think that i am missing something or unable to graps the concept.

    Perhaps the easiest way to make this clear is to use an actual example of code. If you're game, post the code that you would use if someone told you they simply wanted a single column result set from 1 to a million and we'll discuss whatever you come up with.

    Hi,

    Let me consider a example over here as

    Select subject1 + subject2 + subject3 As TotalMarks from Student where class= 'ComputerScience'

    As per my understanding this operation would be perform row by row. The first row with the class ComputerScience is selected and then the other and then it goes on. so does that means that the above SQL Statement does not represent what we called as set based operation.

    I think after you answer this question my understanding for set based programming would be more clear.Thanks

    One thing to keep in mind is that set based execution is not any faster than c based manuallly coded loops except of course that database engine programmers are probably better than the average c programmer. What does make a HUGE difference is how slow T-SQL the language interpreter is. If you were to compare a simple loop coded in T-SQL against pretty much any other language, its just slower to a tremendous degree. So a single line of T-SQL is going to be slow in any case, but if that single line of T-SQL results in an operation against many rows of data, the cost of that slow T-SQL can be amortized across a much greater amount of work.

  • patrickmcginnis59 (8/14/2012)


    Shadab Shah (8/12/2012)


    Jeff Moden (8/10/2012)


    Shadab Shah (8/9/2012)


    Hi all,

    I have come across various section on this website where it is mention that SQL Server is a set based programming language.

    My understanding about it is that : Set based means that the operations are perform on a set of values but i am not able to relate this defination with operations being perform with SQL Server.

    I am from C language background and i don't see much difference in applying logic over here but still these are early days for me in SQL Server, so i think that i am missing something or unable to graps the concept.

    Perhaps the easiest way to make this clear is to use an actual example of code. If you're game, post the code that you would use if someone told you they simply wanted a single column result set from 1 to a million and we'll discuss whatever you come up with.

    Hi,

    Let me consider a example over here as

    Select subject1 + subject2 + subject3 As TotalMarks from Student where class= 'ComputerScience'

    As per my understanding this operation would be perform row by row. The first row with the class ComputerScience is selected and then the other and then it goes on. so does that means that the above SQL Statement does not represent what we called as set based operation.

    I think after you answer this question my understanding for set based programming would be more clear.Thanks

    One thing to keep in mind is that set based execution is not any faster than c based manuallly coded loops except of course that database engine programmers are probably better than the average c programmer. What does make a HUGE difference is how slow T-SQL the language interpreter is. If you were to compare a simple loop coded in T-SQL against pretty much any other language, its just slower to a tremendous degree. So a single line of T-SQL is going to be slow in any case, but if that single line of T-SQL results in an operation against many rows of data, the cost of that slow T-SQL can be amortized across a much greater amount of work.

    I'd like to see some evidence to back up these assertions.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (8/14/2012)


    patrickmcginnis59 (8/14/2012)


    Shadab Shah (8/12/2012)


    Jeff Moden (8/10/2012)


    Shadab Shah (8/9/2012)


    Hi all,

    I have come across various section on this website where it is mention that SQL Server is a set based programming language.

    My understanding about it is that : Set based means that the operations are perform on a set of values but i am not able to relate this defination with operations being perform with SQL Server.

    I am from C language background and i don't see much difference in applying logic over here but still these are early days for me in SQL Server, so i think that i am missing something or unable to graps the concept.

    Perhaps the easiest way to make this clear is to use an actual example of code. If you're game, post the code that you would use if someone told you they simply wanted a single column result set from 1 to a million and we'll discuss whatever you come up with.

    Hi,

    Let me consider a example over here as

    Select subject1 + subject2 + subject3 As TotalMarks from Student where class= 'ComputerScience'

    As per my understanding this operation would be perform row by row. The first row with the class ComputerScience is selected and then the other and then it goes on. so does that means that the above SQL Statement does not represent what we called as set based operation.

    I think after you answer this question my understanding for set based programming would be more clear.Thanks

    One thing to keep in mind is that set based execution is not any faster than c based manuallly coded loops except of course that database engine programmers are probably better than the average c programmer. What does make a HUGE difference is how slow T-SQL the language interpreter is. If you were to compare a simple loop coded in T-SQL against pretty much any other language, its just slower to a tremendous degree. So a single line of T-SQL is going to be slow in any case, but if that single line of T-SQL results in an operation against many rows of data, the cost of that slow T-SQL can be amortized across a much greater amount of work.

    I'd like to see some evidence to back up these assertions.

    I will admit that I had no impartial testing facilities when I did my informal testing. The SQL server is pretty much a rack mount server based machine, and the programming for dot net loops was done on a desktop. For set operations, the server is really really fast, T-SQL loops not so much. The difference was so dramatic that I informally reached this conclusion. Happy to post here and be proven wrong about it by all means 🙂

    edit: I went ahead and ran on the same computer the following code. The vb.net loops ran at most 3 seconds. The T-SQL loops ran for 9:15, or 555 seconds. Its important to give a bit of work to the computer inside the loops otherwise the compiler could very well eliminate some code. Arguably I'm not smart enough so the dot net compiler could still be folding up some code, but in all actuality, I'd want T-SQL to do it too 😉

    The routines are simple enought that I also tried perl and vba but couldn't find anything slower than T-SQL. Obviously testing a c version would be sort of silly. Any counterpoints or criticisms are clearly welcome, but otherwise I'm going to continue in my belief that T-SQL is just not that fast a programming environment.

    DECLARE @X INT

    DECLARE @Y INT

    DECLARE @Z INT

    SET @X = 1

    SET @Z = 1

    WHILE @X < 20000

    BEGIN

    SET @Y = 1

    WHILE @Y < 20000

    BEGIN

    SET @Y = @Y + 1

    SET @Z = (@Z * 2) + 1

    IF @Z > 50000

    BEGIN

    SET @Z = @Z - 50000

    END

    END

    SET @X = @X + 1

    END

    SELECT @Z

    Module Module1

    Sub Main()

    Dim x As Integer

    Dim y As Integer

    Dim z As Integer

    x = 1

    z = 1

    While x < 20000

    y = 1

    While y < 20000

    y = y + 1

    z = (z * 2) + 1

    If z > 50000 Then

    z = z - 50000

    End If

    End While

    x = x + 1

    End While

    Console.WriteLine(z)

    End Sub

    End Module

  • patrickmcginnis59 (8/14/2012)


    I will admit that I had no impartial testing facilities when I did my informal testing. The SQL server is pretty much a rack mount server based machine, and the programming for dot net loops was done on a desktop. For set operations, the server is really really fast, T-SQL loops not so much. The difference was so dramatic that I informally reached this conclusion. Happy to post here and be proven wrong about it by all means 🙂

    edit: I went ahead and ran on the same computer the following code. The vb.net loops ran at most 3 seconds. The T-SQL loops ran for 9:15, or 555 seconds. Its important to give a bit of work to the computer inside the loops otherwise the compiler could very well eliminate some code. Arguably I'm not smart enough so the dot net compiler could still be folding up some code, but in all actuality, I'd want T-SQL to do it too 😉

    The routines are simple enought that I also tried perl and vba but couldn't find anything slower than T-SQL. Obviously testing a c version would be sort of silly. Any counterpoints or criticisms are clearly welcome, but otherwise I'm going to continue in my belief that T-SQL is just not that fast a programming environment.

    DECLARE @X INT

    DECLARE @Y INT

    DECLARE @Z INT

    SET @X = 1

    SET @Z = 1

    WHILE @X < 20000

    BEGIN

    SET @Y = 1

    WHILE @Y < 20000

    BEGIN

    SET @Y = @Y + 1

    SET @Z = (@Z * 2) + 1

    IF @Z > 50000

    BEGIN

    SET @Z = @Z - 50000

    END

    END

    SET @X = @X + 1

    END

    SELECT @Z

    Module Module1

    Sub Main()

    Dim x As Integer

    Dim y As Integer

    Dim z As Integer

    x = 1

    z = 1

    While x < 20000

    y = 1

    While y < 20000

    y = y + 1

    z = (z * 2) + 1

    If z > 50000 Then

    z = z - 50000

    End If

    End While

    x = x + 1

    End While

    Console.WriteLine(z)

    End Sub

    End Module

    If this is the kind of programming you're attempting to test - it's not set-based so there would be no reason to do such thing on a SQL server. You're not even accessing any data so I don't know why you'd ever ask a DB engine to ever do this. I can't say I am surprised that it wouldn't be paricularly good at executing this.

    Running a program on a set-based processing engine isn't the same as "set-based programming". A hammer and a screwdriver are good tools - they're just not good at the same things.

    That said - if I gave you a file or a table with 10,000,000 invoices and I needed to get a balance by customer, I guarantee you SQL server will tear the doors off of anything procedural you might come up with.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (#4) (8/14/2012)

    If this is the kind of programming you're attempting to test - it's not set-based so there would be no reason to do such thing on a SQL server. You're not even accessing any data so I don't know why you'd ever ask a DB engine to ever do this. I can't say I am surprised that it wouldn't be paricularly good at executing this.

    The post I was replying to asked if there was anything to back up my theory that RBAR was slow because of the T-SQL overhead. I posted my little test.

    Running a program on a set-based processing engine isn't the same as "set-based programming". A hammer and a screwdriver are good tools - they're just not good at the same things.

    Nobody is disputing this, in fact my little test confirms this.

    That said - if I gave you a file or a table with 10,000,000 invoices and I needed to get a balance by customer, I guarantee you SQL server will tear the doors off of anything procedural you might come up with.

    I could probably beat it with assembler just by virtue of fitting in level 1 cache 😉 But if set based code worked, why would I not want to use SQL?

  • patrickmcginnis59 (8/14/2012)


    <snip> A series of non-data retrieval encodings.

    I believe there is possible confusion on definitions here. Patrick, what do you define as 'set-based programming'? For reference, the 'set based coding' that's used around here is basically the difference between (rediculous to make it obvious, of course):

    Set based:

    SELECT

    *

    FROM

    tblA AS a

    JOIN

    tblB AS b

    ONa.id = b.id

    Not Set Based:

    DECLARE @id INT

    DECLARE Ugh CURSOR FOR SELECT id FROM tblA

    OPEN Ugh

    FETCH NEXT FROM Ugh INTO @id

    WHILE @@FETCH_STATUS = 0

    BEGIN

    INSERT INTO #results

    SELECT

    *

    FROM

    tblB

    WHERE

    id = @id

    FETCH NEXT FROM Ugh INTO @id

    END

    Close ... deallocate...

    Obviously a contrived example but you'd be amazed at how often only slightly more complex problems get turned into while loops and cursors.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (8/14/2012)

    ...stuff...

    The post I was replying to asked if there was anything to back up my theory that RBAR was slow because of the T-SQL overhead. I posted my little test.

    I think maybe you are simply not reading the thread you are posting in. But that is also just a theory 😉

    edit: apologies to Kraig, I will in the future keep my opinions of your thread following skills to myself.

  • patrickmcginnis59 (8/14/2012)


    Matt Miller (#4) (8/14/2012)

    If this is the kind of programming you're attempting to test - it's not set-based so there would be no reason to do such thing on a SQL server. You're not even accessing any data so I don't know why you'd ever ask a DB engine to ever do this. I can't say I am surprised that it wouldn't be paricularly good at executing this.

    The post I was replying to asked if there was anything to back up my theory that RBAR was slow because of the T-SQL overhead. I posted my little test.

    Oh - well that wasn't my understanding of what you were replying to. I was under the impression you were providing evidence supporting this:

    One thing to keep in mind is that set based execution is not any faster than c based manuallly coded loops except of course that database engine programmers are probably better than the average c programmer.

    Perhaps there's a misunderstanding, but to me there's a mountain of difference between set-based processing and manually-coded loops. Never mind that the code looks nothing like the procedural version - the processing and the speed are dramatically different. Keep in mind RBAR is NOT set-based coding, or is not GOOD set-based coding.

    As to the assembler: I severely doubt you could fit 10 million invoices into level 1 cache. Even if you could, you'd have to spend months and years building the types of optimizations SQL Server has been building into its engine to deal with these kind of uses. But yes - with enough time and effort you might come up with something that might match it. Assuming you do - package it and we can talk about giving microsoft a run for its money then....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • patrickmcginnis59 (8/14/2012)


    Evil Kraig F (8/14/2012)

    ...stuff...

    The post I was replying to asked if there was anything to back up my theory that RBAR was slow because of the T-SQL overhead. I posted my little test.

    I think maybe you are simply not reading the thread you are posting in. But that is also just a theory 😉

    No, I fear you're misunderstanding the term RBAR. Row-By-Agonizing-Row. This specifically remarks on the difference between accessing the rows in an index or table one at a time instead of as an internal tableset to the optimizer.

    Your examples are simple iterative incrementation and numeric manipulation. This can be a 'set' based on context (particularly if you have a mathmatical background), thus why I asked you for your definition. You are quite correct, however, that .NET and other programming languages are more suited to the task of simple incrementation and variable mathmatics. For a particular example; string manipulation in SQL is outpaced simply by using a CLR function instead.

    So you're aware, your smiley does not remove either the sting nor the implications from your words. This is a T-SQL/SQL Server based site and we concentrate on the database layer itself, which implies data retrieval as the primary purpose for any task. I asked merely to maintain we were discussing apples to apples and provided an example for clarity. Set based programming, as a term, gets tossed around a lot as an idea and the context is important. Screw me for being polite and not jamming you.

    You made an assertation, and were asked to back it up. You then did so in a way that is completely out of context for 99% of the tasks performed in this language. I'd assumed (stupid me!) that you merely misunderstood what set-based programming meant in a database environment. Also, even then, had you read the thread, set based programming is generally understood to be declarative programming, not procedural... and thus neither of your examples are actually an example. See Springtown's and Dwain's posts above. G2 and Jeff then get into local lingo discussions of the difference.

    We'll go to the wiki for an outside reference:

    http://en.wikipedia.org/wiki/Set_(computer_science)

    Paraphrased: The set is a structure that stores values, not the creation or maintenance of said values.

    Set based programming is working with the group in its entirety. RBAR, or iterative programming, are your examples. We're quite aware that works very poorly inside the SQL engine, and just about any other DB engine after the DB2 generation, since that was mostly external engines coded as a hybrid connection, but to your specific assertation:

    One thing to keep in mind is that set based execution is not any faster than c based manuallly coded loops except of course that database engine programmers are probably better than the average c programmer. What does make a HUGE difference is how slow T-SQL the language interpreter is. If you were to compare a simple loop coded in T-SQL against pretty much any other language, its just slower to a tremendous degree. So a single line of T-SQL is going to be slow in any case, but if that single line of T-SQL results in an operation against many rows of data, the cost of that slow T-SQL can be amortized across a much greater amount of work.

    That the internal set access loop is not any faster than c-based looping. In this, because C based coding can mimic assembler with how close to memory access it can achieve, I'll agree with, as a set has to be in memory for access. This also would assume that the C-based coding included concepts such as index retrievals, key lookups, and any other number of items that are used to speed set-based operations within any DB engine. I imagine that's a LOT of work. Like a few years worth of a dedicated team to create the basics.... oh, right, that's MySQL.

    The language interpreter for any piece of code is fired up once, same as .NET and its recent ancestors. Slow or fast, both sides need a 'compile time'. Most .NET languages do that work up front and create an executable, SQL Server stores this in memory and is transparent about its compile times. The compilation of the code is non-applicable during runtime except immediately after a reboot under most circumstances.

    You then state a single line of T-SQL is going to be slow in any case. Truly? PRINT GETDATE() returns in microseconds. The majority of the time of the process is network transfer. This requires some exposition if you wish to maintain that as a fact.

    Now, loops, which I skipped over briefly. You're not going to hear any of the older hands here state that a loop isn't the slowest method of accessing a piece of data in the database engines, and you're quite correct that T-SQL isn't optimized to hand built incremental iterators. You've proven grass is green, but nothing important to the work. Of all the things to prove, that was the one that noone would disagree with.

    So, about reading the thread... would you like to try it yourself? No, that's not meant to either be funny nor polite.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • ...

    That said - if I gave you a file or a table with 10,000,000 invoices and I needed to get a balance by customer, I guarantee you SQL server will tear the doors off of anything procedural you might come up with.

    I could probably beat it with assembler just by virtue of fitting in level 1 cache

    ...

    Assembler? Try!

    I will not be surprised that your assembler code will be slower, until you are really good in it.

    Just a loop wouldn't be good enough. You will need to code some "bubbled" loops to get it faster even in assembler. And doing it in pure assembler (even for intel-like cpu's) will not be a simple task.

    So, stick to SQL 😉

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

Viewing 15 posts - 16 through 30 (of 34 total)

You must be logged in to reply to this topic. Login to reply