A few reflections on RBAR by a weary application developer

Question

A few reflections on RBAR by a weary application developer

Viewing 15 posts - 31 through 45 (of 116 total)

You must be logged in to reply to this topic. Login to reply

john.moreno Default port Points: 1485 More actions · Answer 1

patrickmcginnis59 10839 (12/9/2013)
I'm interested in why RBAR is so slow in SQL systems, and some of the following psychology that develops around this is pretty interesting too. I've seen stuff on the net like "this non-T-SQL program is so slow, I bet it uses RBAR inside".
I think its getting to be on the level of those cargo-cult societies.

RBAR is slow for the same reason a bubble sort is slow -- it does expoentially more work than is necessary.

Because RBAR is really nothing more than a particular type of inefficient processing, you **can** have it on non-sql code. The most common type of RBAR in my experience (cursors using stored procedures) are really nothing more than making an expensive calculation (the joins in the stored procedure) and then throwing the result away so it can be done all over again.

The main difference between non-sql RBAR and sql RBAR is that the sql version has a nice name to identify it.

PHYData DBA SSCertifiable Points: 7541 More actions · Answer 2

Solomon Rutzky (12/9/2013)
EDIT:
I would add that there is no parallel to cargo-cult societies given that it is scientifically provable that row-by-row processing is nearly always slower (within the realm of RDBMS's) than set-based processing (given that the operation can be handled in a set-based approach).

Totally agree with this. Would like to add that this is SET math, and has always been more efficient than value based math. Rocket science or Philosophy is not needed to determine if is slower to execute the same code one million times or one thousand times.

SET math is.. well maths. It has been maths since Bolzano published "The Paradoxes of the Infinite".

Executing an operation on a SET instead of inspecting and then updating each individual value in a Super SET is faster. Its a Fact! 😎

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 3

mtassin (12/9/2013)
Koen Verbeeck (12/9/2013)
Nice article.
One remark though: using SSIS doesn't necessarily mean RBAR.
You can stream the data into a bulk load just fine (as fast as the join) and still find the erroneous row.
Big +1 here

I second that in a BIG way as well.

I'll also add that in terms of row-by-row processing:

The offense is language-agnostic. The same sins that can be committed in T-SQL can also be committed in SSIS, C#, PowerShell, etc. Similarly, the good deeds that can be accomplished with T-SQL can also be accomplished in SSIS, C#, PowerShell, etc.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

PHYData DBA SSCertifiable Points: 7541 More actions · Answer 4

patrickmcginnis59 10839 (12/9/2013)
I think its getting to be on the level of those cargo-cult societies.

I think you are using words and phrases you do not understand. Cause you seem to be saying that you don't know what RBAR or Cargo Cult logic is.

Definitions from Wikipedia -

Cargo cult programming: a style of computer programming that is characterized by the ritual inclusion of code or program structures that serve no real purpose.

Cargo cult science: refers to practices that have the semblance of being scientific, but do not in fact follow the scientific method.

Cargo Cult society: A kind of Melanesian millenarian movement encompassing a diverse range of practices and occurring in the wake of contact with the commercial networks of colonizing societies. The name derives from the apparent belief that various ritualistic acts will lead to a bestowing of material wealth

Identifying and removing RBAR is the opposite of Cargo Cult anything.

It is Gods work and is one of the few things that can be easily identified and updated in an inefficient set of code to optimize it quickly and easily.

RBAR is just a cute way to say terrible logic structure that does not use good programing practices to create the end result. Of course there are those out it the IT world that would hire one million rats to jump up and down on a keyboard a hundred million times instead of hiring one tech writer to create a new and unique document.

Andy DBA SSCommitted Points: 1776 More actions · Answer 5

GeorgeCopeland (12/8/2013)
But here is the deal: it processes until it abends. It fails on the offending record and writes out a meaningful error message. After the problem is fixed, the package can be safely restarted.

Hurray! It failed on the FIRST offending record and flagged it for repair. But what if the input data has more than one offending record? :w00t: The RBAR is "safely" restarted and agonizingly grinds away for another eternity until it abends on the next offender which must also be scrutinized and fixed. Then, with fingers crossed, the "weakling developer" once again fires up this marvelous new invention as visions of a relaxing evening possibly involving the quaffing of ale slowly begin to fade away...

A set based approach can flag all offending records at once. If needed, it can perform a myriad of increasingly specific checks on "surviving" records in the same run and generate a series of reports ranging in complexity. All of which can be presented at the same time to the data originator for repair.

Consider a job that needs to import fixed-length-record text files from multiple sources. The first check simply flags input records with the wrong number of characters by bulk inserting into a simple temp table that is nothing but an identity column and one large varchar field. The len() function is used on the varchar field to report row numbers with the wrong length and to select from records with the right length into the next temp table containing a varchar field for each expected column (using substr). Those varchar fields are checked for correct data types and the "survivor" records are loaded into the next temp table containing fields of the desired data types (and maybe even an index or two!). Other more complicated data validation checks can then be made such as range checking, comparisons between input columns, and even comparisons with related data in other tables. Many quality assurance reports can be produced and hopefully all errors fixed in one iteration versus finding and fixing one erroneous record per submission.

x SSC-Insane Points: 23660 More actions · Answer 6

PHYData DBA (12/9/2013)
patrickmcginnis59 10839 (12/9/2013)
I think its getting to be on the level of those cargo-cult societies.
I think you are using words and phrases you do not understand. Cause you seem to be saying that you don't know what RBAR or Cargo Cult logic is.
Definitions from Wikipedia -
Cargo cult programming: a style of computer programming that is characterized by the ritual inclusion of code or program structures that serve no real purpose.
Cargo cult science: refers to practices that have the semblance of being scientific, but do not in fact follow the scientific method.
Cargo Cult society: A kind of Melanesian millenarian movement encompassing a diverse range of practices and occurring in the wake of contact with the commercial networks of colonizing societies. The name derives from the apparent belief that various ritualistic acts will lead to a bestowing of material wealth

I was using the third definition, if it matters.

Identifying and removing RBAR is the opposite of Cargo Cult anything.
It is Gods work and is one of the few things that can be easily identified and updated in an inefficient set of code to optimize it quickly and easily.
RBAR is just a cute way to say terrible logic structure that does not use good programing practices to create the end result. Of course there are those out it the IT world that would hire one million rats to jump up and down on a keyboard a hundred million times instead of hiring one tech writer to create a new and unique document.

I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.

edit: fixed quote tag

Solomon Rutzky SSCoach Points: 16430 More actions · Answer 7

patrickmcginnis59 10839 (12/9/2013)
I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.

I believe I addressed this (for the most part) when I mentioned ACID compliance overhead in a post from a page or two ago in this thread. Meaning: the locking, unlocking, managing of contention on shared resources, transaction log writing, etc.

Take care,

Solomon...

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

x SSC-Insane Points: 23660 More actions · Answer 8

Solomon Rutzky (12/9/2013)
patrickmcginnis59 10839 (12/9/2013)
I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.
I believe I addressed this (for the most part) when I mentioned ACID compliance overhead in a post from a page or two ago in this thread. Meaning: the locking, unlocking, managing of contention on shared resources, transaction log writing, etc.

So should I expect a T-SQL loop with NO data access then to be really really fast?

MarbryHardin Old Hand Points: 363 More actions · Answer 9

One of the somewhat insidious problems with undetected or uncontrolled use of RBAR is that it may often work fine. At first.

As both the load and number of records silently increase the technique can cause previously unseen errors and bring the server to its knees. And of course that can potentially cause issues with unrelated applications that happen to be on the same instance.

It's the opposite of "scaling", whatever that might be called. If this isn't detected and corrected before it constitutes a large portion of the code base you're then looking at a significant effort to correct it.

I've fought the fight to wean people off cursors, it can be a hard battle. But when you have someone that writes a cursor to do a simple insert of a single literal value they need to admit they have problem.

sneumersky SSCertifiable Points: 7667 More actions · Answer 10

Completely agree. Most of the time SSIS is working "Buffer by Buffer" and NOT RBAR. Moreover, while I cannot speak for Jeff, the anti-RBAR master 🙂 , RBAR ceases to apply to most SSIS operations--which are done in memory.

Of course, an exception would be an OLEDB Cmd Destination Component performing an insert for each row in the SSIS pipeline.

However, I definitely understand the author's point. The error row routing functionalilty of a data flow component is very powerful and useful when properly utilized. As a matter of fact, it may be one of the most underutilized features I see absent in many packages.

In short and in my opinion:

RBAR generally applies to disk operations in an RDBMS.

RBAR generally does NOT apply to in memory operations in SSIS for at least two reasons:

1. Memory is blazingly faster than disk (usually).

2. Most of the SSIS operations are BBIB (Buffer By Invaluable Buffer)--pronounced BEE-BIB! 😎

Steve Neumersky

rstone SSCertifiable Points: 6213 More actions · Answer 11

I tend to always assume "it depends" when it comes to SQL Server. If it appears it does not, then I assume I'm missing something. We have existing applications that are designed and tested to insure locking is not an issue. If a newer system needs access to the existing system, set based queries might cause issues such as deadlocks. (The original design did not account for the integration.) I think RBAR is an option to avoid locking too many resources at once to avoid a costly re-design. (A business decision to not re-design can be valid.)

On the other hand, we have some crazy RBAR abuse - nested procedures with nested cursors. So perhaps a "No RBAR!" mallet is needed in some cases. However, I don't think it's fair to say RBAR is never to be used. Like de-normalization, it's not the place to start.

RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

sneumersky SSCertifiable Points: 7667 More actions · Answer 12

Koen Verbeeck (12/9/2013)
Nice article.
One remark though: using SSIS doesn't necessarily mean RBAR.
You can stream the data into a bulk load just fine (as fast as the join) and still find the erroneous row.

Completely agree. Most of the time SSIS is working "Buffer by Buffer" and NOT RBAR. Moreover, while I cannot speak for Jeff, the anti-RBAR master 🙂 , RBAR ceases to apply to most SSIS operations--which are done in memory.

Of course, an exception would be an OLEDB Cmd Destination Component performing an insert for each row in the SSIS pipeline.

However, I definitely understand the author's point. The error row routing functionalilty of a data flow component is very powerful and useful when properly utilized. As a matter of fact, it may be one of the most underutilized features I see absent in many packages.

In short and in my opinion:

RBAR generally applies to disk operations in an RDBMS.

RBAR generally does NOT apply to in memory operations in SSIS for at least two reasons:

1. Memory is blazingly faster than disk (usually).

2. Most of the SSIS operations are BBIB (Buffer By Invaluable Buffer)--pronounced BEE-BIB!

Steve Neumersky

Solomon Rutzky SSCoach Points: 16430 More actions · Answer 13

patrickmcginnis59 10839 (12/9/2013)
Solomon Rutzky (12/9/2013)
patrickmcginnis59 10839 (12/9/2013)
I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.
I believe I addressed this (for the most part) when I mentioned ACID compliance overhead in a post from a page or two ago in this thread. Meaning: the locking, unlocking, managing of contention on shared resources, transaction log writing, etc.
So should I expect a T-SQL loop with NO data access then to be really really fast?

"Fast" is a relative term so what are you comparing it to? I was addressing the issue in terms of the most common type of operations to be done in a database; sometimes you have functions that don't interact with the system but the vast majority of database code interacts with the database.

Outside of that, I thought non-DB interaction was at least somewhat addressed by PHYData DBA who mentioned that RDBMS's are tuned for a particular type of work. I don't think it should be expected that databases do certain things as fast as C# or vice-versa. RDBMS's are tuned specifically to work on large sets of data: sorting, searching, matching 2 or more sets (i.e. JOINs), aggregations, etc. They various optimizations, nuances, requirements, etc of an RDBMS are all working towards that goal. This is not the goal of application programming languages so there should be no expectation that they perform the same for the same types of operations. You also need to consider that T-SQL is an interpreted language and will not be as fast as C#, just like C# / VB.Net running on the CLR are not going to be as fast as C++.

Take care,

Solomon...

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

Ken SSCoach Points: 17191 More actions · Answer 14

“Skoob Enilno” - Love it!

After reading this post and Jeff's reply, I have a sudden urge to read Tolkien.

Thanks to all. A very entertaining read.

Ken

x SSC-Insane Points: 23660 More actions · Answer 15

Solomon Rutzky (12/9/2013)
patrickmcginnis59 10839 (12/9/2013)
Solomon Rutzky (12/9/2013)
patrickmcginnis59 10839 (12/9/2013)
I'm not saying there isn't a penalty for RBAR in T-SQL. Rather, I think it would be useful to know why the penalty is so drastic.
I believe I addressed this (for the most part) when I mentioned ACID compliance overhead in a post from a page or two ago in this thread. Meaning: the locking, unlocking, managing of contention on shared resources, transaction log writing, etc.
So should I expect a T-SQL loop with NO data access then to be really really fast?
"Fast" is a relative term so what are you comparing it to?

Well when I did comparisons, it was to pretty much any of the languages I had access to, and T-SQL was dog slow even with no data access. I mean orders of magnitude slow.