A few reflections on RBAR by a weary application developer

  • Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.

    This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.

    I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".

    I think its getting to be on the level of those cargo-cult societies.

    edit: adjusted cargo-cult sentence 🙂

    Hi Patrick,

    I'm still neck deep in it at work but I did have the time to look at some of the attempts at explaining why RBAR is slower. I actually have a physical example that you'll be able to look at and, after quaffing a mighty and well deserved ale after the nightmare known as "today", I'll try to get it to you. It depends a lot on when I get done today.

    Thank you for the question. The answer might also explain the difference between RBAR and necessary RBR.

    Looking forward to it!

    Here was my first situation that started me looking at T-SQL itself. I was initially interested in the results because I had seen some poor performance with RBAR looking T-SQL so I drilled down and had a go at the language itself independent of data:

    http://www.sqlservercentral.com/Forums/FindPost1344895.aspx

    and I remember it didn't get rave reviews in that thread either, one of my favorite replies:

    http://www.sqlservercentral.com/Forums/FindPost1345055.aspx

  • patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.

    This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.

    I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".

    I think its getting to be on the level of those cargo-cult societies.

    edit: adjusted cargo-cult sentence 🙂

    Hi Patrick,

    I'm still neck deep in it at work but I did have the time to look at some of the attempts at explaining why RBAR is slower. I actually have a physical example that you'll be able to look at and, after quaffing a mighty and well deserved ale after the nightmare known as "today", I'll try to get it to you. It depends a lot on when I get done today.

    Thank you for the question. The answer might also explain the difference between RBAR and necessary RBR.

    Looking forward to it!

    Here was my first situation that started me looking at T-SQL itself. I was initially interested in the results because I had seen some poor performance with RBAR looking T-SQL so I drilled down and had a go at the language itself independent of data:

    http://www.sqlservercentral.com/Forums/FindPost1344895.aspx

    and I remember it didn't get rave reviews in that thread either, one of my favorite replies:

    http://www.sqlservercentral.com/Forums/FindPost1345055.aspx

    You can't separate the data from t-sql though since it's a a language specifically designed to work with data in a sql server engine. Ignoring data manipulation it's expected that it would be slower than a procedural language.

    And regardless rbar is a separate issue than language performance in a loop.

  • So let's take an example of a SELECT that does not make any use of indexes.

    RBAR fetches more rows from the table. False. It has to get the data from somewhere.

    RBAR is worse because it avoid the use of indexes. False. Sort of. There may or may not be indexes that will help the query.

    RBAR increases memory pressure inside the application. False. I can open the query in such a way that my application only sees one row at a time. Very low memory consumption for my application either way.

    RBAR precludes many opportunities for parallel processing. True.

    RBAR increases the amount of bandwidth taken up by my connection. True. All those rows are going to affect your network. The boss won't be happy when he can't get to his (Stock Quotes, Sports Scores, Cat Videos, Hot Babes, Hot Guys, whatever).

    Does RBAR take more time? Depends. Given my constraint of no indexes then you have to test each situation. We had a good set based query that summed up some data. It typically took 27 seconds but was taking longer and longer. We had a time out of 30 seconds and we started to hit that more and more often. We move a predicate from the WHERE clause to the JOIN clause and the query now runs in 4 seconds. It was not RBAR to begin with but was way better after the fix.

    ATBCharles Kincaid

  • ZZartin (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.

    This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.

    I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".

    I think its getting to be on the level of those cargo-cult societies.

    edit: adjusted cargo-cult sentence 🙂

    Hi Patrick,

    I'm still neck deep in it at work but I did have the time to look at some of the attempts at explaining why RBAR is slower. I actually have a physical example that you'll be able to look at and, after quaffing a mighty and well deserved ale after the nightmare known as "today", I'll try to get it to you. It depends a lot on when I get done today.

    Thank you for the question. The answer might also explain the difference between RBAR and necessary RBR.

    Looking forward to it!

    Here was my first situation that started me looking at T-SQL itself. I was initially interested in the results because I had seen some poor performance with RBAR looking T-SQL so I drilled down and had a go at the language itself independent of data:

    http://www.sqlservercentral.com/Forums/FindPost1344895.aspx

    and I remember it didn't get rave reviews in that thread either, one of my favorite replies:

    http://www.sqlservercentral.com/Forums/FindPost1345055.aspx

    You can't separate the data from t-sql though since it's a a language specifically designed to work with data in a sql server engine. Ignoring data manipulation it's expected that it would be slower than a procedural language.

    And regardless rbar is a separate issue than language performance in a loop.

    Well thats what I was wondering, then and now. What makes RBAR so slow if its not the language? If its not the language, then cursors should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performancewise.

  • Charles Kincaid (12/9/2013)


    So let's take an example of a SELECT that does not make any use of indexes.

    RBAR fetches more rows from the table. False. It has to get the data from somewhere.

    Just iterating over records in a set of records is not RBAR, even with in set based solutions this happens internally. What is however is, is that you either:

    1. explicitly coded a loop around your query, which means many times the set processing set-up costs and much more intense use of data structures and memory.

    2. per read record you decide which other records you are going to fetch (like a naive join you see beginning programmers often do from within the application layer, because they don't know joins yet).

    It isn't hard to see how using SQL these ways will cause headaches with respect to performance. They also scale very badly as the data sets get larger.

    More hidden are some uses of correlated sub-queries where per record you execute a sub-select to get a sum of all preceding records. Here too it becomes obvious about how inefficient this must work, without special language support like the recent improved windowing functions. Executed naively such a solution will create a mega storm of IO and duplicate processing.

    Charles Kincaid (12/9/2013)


    So let's take an example of a SELECT that does not make any use of indexes.

    RBAR increases memory pressure inside the application. False. I can open the query in such a way that my application only sees one row at a time. Very low memory consumption for my application either way.

    As noted before that is not RBAR per see, even tho most of the times code does this, it will be (I haven't used a cursor in 8 or so years now). It is likely to cause inefficient use of CPU, Disk IO and Network IO and also consume more memory then you think. While your cursor is open, you hold locks on the source table and if you want to mitigate this, you need to make a copy of the result set first with the appropriate cursor options! Lastly if you fetch row by row, you likely generate a bit more network traffic and inevitably claim server resources for longer (buffering mitigates this of course).

    Do not even think about programming a second select for each of the records you fetch with the first, man that is like setting off a NUKE!

  • peter-757102 (12/9/2013)


    Charles Kincaid (12/9/2013)


    So let's take an example of a SELECT that does not make any use of indexes.

    RBAR fetches more rows from the table. False. It has to get the data from somewhere.

    Just iterating over records in a set of records is not RBAR, even with in set based solutions this happens internally.

    Now this is interesting, because doing the running totals thing would be awesome in a cursor! Would a cursor based solution beat the triangular join method? Would a cursor based solution beat a quirky update? Which of the three methods are set oriented versus RBAR?

  • patrickmcginnis59 10839 (12/9/2013)


    Now this is interesting, because doing the running totals thing would be awesome in a cursor! Would a cursor based solution beat the triangular join method? Would a cursor based solution beat a quirky update? Which of the three methods are set oriented versus RBAR?

    well no. Asking SQL to do aggregates like SUM() is better than a cursor. The cursor based thing you have to set up a variable, initialize it (could be one step), loop through every row, add to the variable, repeat ad nauseam. To have SQL do that internally in a built in function is better and should perform better too. Doing it yourself is reinventing the wheel and its an already very good wheel at that.

    ATBCharles Kincaid

  • peter-757102 (12/9/2013)


    2. per read record you decide which other records you are going to fetch (like a naive join you see beginning programmers often do from within the application layer, because they don't know joins yet).

    We saw that exact behavior done by an Entity Framework solution that was not set up correctly. It was asked to produce a list of containers that showed the container type. Of course we are normalized so there is a container type table with the specs for each type (and the type name) and a container table with a container type ID. We wondered why this thing was so slow. The trace showed that it got a list of containers and then selected the container type table for each and every one of them. We fixed the EF stuff so that it did not work that way and stayed queryable until we were ready to render and now it does a proper join. (OK, I'll confess. It was way worse that this. It actually did that on two other tables beside the type table.

    Do not even think about programming a second select for each of the records you fetch with the first, man that is like setting off a NUKE!

    Right you are.

    ATBCharles Kincaid

  • Charles Kincaid (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Now this is interesting, because doing the running totals thing would be awesome in a cursor! Would a cursor based solution beat the triangular join method? Would a cursor based solution beat a quirky update? Which of the three methods are set oriented versus RBAR?

    well no. Asking SQL to do aggregates like SUM() is better than a cursor. The cursor based thing you have to set up a variable, initialize it (could be one step), loop through every row, add to the variable, repeat ad nauseam. To have SQL do that internally in a built in function is better and should perform better too. Doing it yourself is reinventing the wheel and its an already very good wheel at that.

    True, but I was thinking about the question applying before the windowing functions could do it, I think windowing functions can do running totals in version 2012. I did find someone posting a comparison of some methods:

    http://stackoverflow.com/questions/1153879/how-do-i-calculate-a-running-total-in-sql-without-using-a-cursor

  • patrickmcginnis59 10839 (12/9/2013)


    ZZartin (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.

    This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.

    I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".

    I think its getting to be on the level of those cargo-cult societies.

    edit: adjusted cargo-cult sentence 🙂

    Hi Patrick,

    I'm still neck deep in it at work but I did have the time to look at some of the attempts at explaining why RBAR is slower. I actually have a physical example that you'll be able to look at and, after quaffing a mighty and well deserved ale after the nightmare known as "today", I'll try to get it to you. It depends a lot on when I get done today.

    Thank you for the question. The answer might also explain the difference between RBAR and necessary RBR.

    Looking forward to it!

    Here was my first situation that started me looking at T-SQL itself. I was initially interested in the results because I had seen some poor performance with RBAR looking T-SQL so I drilled down and had a go at the language itself independent of data:

    http://www.sqlservercentral.com/Forums/FindPost1344895.aspx

    and I remember it didn't get rave reviews in that thread either, one of my favorite replies:

    http://www.sqlservercentral.com/Forums/FindPost1345055.aspx

    You can't separate the data from t-sql though since it's a a language specifically designed to work with data in a sql server engine. Ignoring data manipulation it's expected that it would be slower than a procedural language.

    And regardless rbar is a separate issue than language performance in a loop.

    Well thats what I was wondering, then and now. What makes RBAR so slow if its not the language? If its not the language, then s should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performancewise.

    Think about it like this, if you have a stack of papers on your desk what's faster? Going through one by one and each time you want throw one away getting up throwing it away then starting over? Or going through them all getting all the ones you want gone and making one trip to the trash can?

    Without even talking about record locking or any other issues with cursors just the increase in overhead on the sql engine makes rbar a slower choice.

  • ZZartin (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    ZZartin (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Jeff Moden (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    I didn't see any explanation on why RBAR is so slow, so rated this article 1 star.

    This is how RBAR comes to be in systems. Many people just don't know and so they lean towards what they do. It gets the job done at the expense of future scalability and rework.

    I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".

    I think its getting to be on the level of those cargo-cult societies.

    edit: adjusted cargo-cult sentence 🙂

    Hi Patrick,

    I'm still neck deep in it at work but I did have the time to look at some of the attempts at explaining why RBAR is slower. I actually have a physical example that you'll be able to look at and, after quaffing a mighty and well deserved ale after the nightmare known as "today", I'll try to get it to you. It depends a lot on when I get done today.

    Thank you for the question. The answer might also explain the difference between RBAR and necessary RBR.

    Looking forward to it!

    Here was my first situation that started me looking at T-SQL itself. I was initially interested in the results because I had seen some poor performance with RBAR looking T-SQL so I drilled down and had a go at the language itself independent of data:

    http://www.sqlservercentral.com/Forums/FindPost1344895.aspx

    and I remember it didn't get rave reviews in that thread either, one of my favorite replies:

    http://www.sqlservercentral.com/Forums/FindPost1345055.aspx

    You can't separate the data from t-sql though since it's a a language specifically designed to work with data in a sql server engine. Ignoring data manipulation it's expected that it would be slower than a procedural language.

    And regardless rbar is a separate issue than language performance in a loop.

    Well thats what I was wondering, then and now. What makes RBAR so slow if its not the language? If its not the language, then s should be fast because the first thing done in a cursor is to build a set based query to "fetch" from and then it just loops on the rows fed to it. But I'm all the time reading that cursors are not a good thing performancewise.

    Think about it like this, if you have a stack of papers on your desk what's faster? Going through one by one and each time you want throw one away getting up throwing it away then starting over? Or going through them all getting all the ones you want gone and making one trip to the trash can?

    Without even talking about record locking or any other issues with cursors just the increase in overhead on the sql engine makes rbar a slower choice.

    Just to make sure I have the analogy correct, which data operation do I equate with the walk to the trash can?

  • patrickmcginnis59 10839 (12/9/2013)


    Just to make sure I have the analogy correct, which data operation do I equate with the walk to the trash can?

    The Commit Transaction step



    ----------------
    Jim P.

    A little bit of this and a little byte of that can cause bloatware.

  • Jim P. (12/9/2013)


    patrickmcginnis59 10839 (12/9/2013)


    Just to make sure I have the analogy correct, which data operation do I equate with the walk to the trash can?

    The Commit Transaction step

    No penalty as long as begin tran and commit are outside the loops! I like it!

  • Here's my non-technical view of RBAR vs Set

    Each process/request requires sending a request to the software/hardware/person which then has to understand the request, convert it into their understanding, plan it, and then go about executing it.

    The time savings of batch-based actions come in when you can do more tasks but keep the request send phase at the same amount or increasing it by only a small amount. So if a request takes 5 seconds to do per item and 1 second to understand beforehand then doing 100 items in a per item basis will take 100*(5 +1 ) = 600 seconds, whereas sending the request in the batch of 100 might up the understanding time to say 20 seconds but this would still be 100*(5 ) + 20 = 520 seconds.

    This is not a SQL only concept and works for all actions including manual ones. You leverage the difference between two changing series values - records (n) and preparation (kn) - such that that if k<1 then you will always benefit from a batch process over an iterative process.

    Edit: For SQL - typically there will also be a constant on records (cn) that means the processing time per record also decreases since it can leverage set-based prgramming actions. i.e. t=cn +kn where k<c<1 is how we get a scalable query and the smaller k and c are the better and/or having an absolute preparation value e.g. from a saved execution plan.

  • @patrickmcginnis59 10839,

    Just so you don't think I abandoned you, I didn't get off from work until 8PM last night. I worked on the answer to you (and everyone else... too many people are coming up with their own definitions of what RBAR... since I'm the guy that coined the phrase, I kinda reserve the right to define it 😉 ) for a couple of hours in an "Alice's Restaurant" fashion because, with what you've gone through on this and other threads, I believe you absolutely deserve a good answer instead of just comparisons to other things. It actually might make a good lead in to an article.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 61 through 75 (of 116 total)

You must be logged in to reply to this topic. Login to reply