Hidden RBAR: Triangular Joins

Question

Post reply

Hidden RBAR: Triangular Joins

Viewing 15 posts - 76 through 90 (of 258 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

Stealing my thunder Gila! I already posted on that yesterday at 8:54. 🙂

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 2

GilaMonster (12/8/2007)
An index on a column does not guarantee that data will be returned in that order. Unless an order by is specified, you cannot be sure that the data will always be returned in the order you want.
An index makes it likely, but not certain.
There's a few things that, even with an appropriate index, can make data return 'out of order'.
The storage engine selecting an allocation order scan to fetch the data, rather than an index order scan. If the scan on the exec plan shows Ordered false, then you've had one of these. Not likely except in read uncommitted isolation
Merry-go-round scan. Only on enterprise. The scan stared in the middle of the table, reads to the end, goes back to the beginning and reads the rest.
Parallelism.
In future editions, there may be even more.

Have you ever managed to get that inner loop technique to actually fail? because with the indexing hint - I have yet to see that happen. I understand it's not officially endorsed, that the next version might not allow it ,etc... but from what I can tell - I have yet to see if NOT use the indexing hint. Parallelism or no.

Just curious.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

corey lawson Hall of Fame Points: 3773 More actions · Answer 3

Or...

select t1.c1, t1.c2, ...

from t1 join t2 on t1.c1 <= t2.c1

...

This has the potential to blow up, but it does make some date/time-related queries possible whereas putting the inequality in the where clause doesn't quite work out as expected in all cases. The two methods aren't exactly commutative. Sorry I can't concoct a good enough example right now...

From what it seems to me, having the inequality (it's a "theta-join") in the join expression seems to work much faster than as part of a where expression.

Charles Kincaid SSChampion Points: 13593 More actions · Answer 4

The following aside is off-topic, sort of:

I was once quoted the following paradox:

Given set A, which is a set of all sets that do not contain themselves as proper sub sets, is set A in set A?

The mathematician that gave me that was a former coworker. I think that he got his doctorate on that. He said that it was the only true paradox.

ATBCharles Kincaid

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 5

Heh... only if Set A has an alias of Set B 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

TheSQLGuru (12/8/2007)
Stealing my thunder Gila! I already posted on that yesterday at 8:54. 🙂

I saw, I thought it was worth saying again in stronger terms.

Besides, isn't imitation the strongest form of flattery? 😀 😉

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

Hey, thanks for the flattery!! hehehe

You are correct though - seems that a LOT of people out there have misconceptions about sets and ordering.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

cs_troyk SSCertifiable Points: 5383 More actions · Answer 8

Matt Miller (12/7/2007)
cs_troyk (12/7/2007)
It is a computational model that says nothing of the physical implementation.
The one that doesn't say anything of the physical implementation is the Mathematical model. While the computational model doesn't SPECIFICALLY describe the inner workings of a set engine, there are a lot of constraints and assumptions built into that which do help to define what is and what isn't a set processor. Even Alan Turing (as I recall) used the concept of a set processor "black box", but it had the constraint that the atomic operation being performed could be done to set items within the set in any order, and is done to multiple items at the SAME TIME. so - no specifics, but definitions nonetheless. Call it a generic class definition if you wish.
You are right that we seem to be talking at cross purposes. For one I'm talking about "set-based processing", and you seem to be talking about the ivory tower call set-based PROGRAMMING. One's theory, the other one's applied theory otherwise known as practice. As another poster's tagline eloguently puts "In theory - practice and theory are the same thing. In practice - they're not".
Now I'll admit right here and now that I have at times used the terms interchangeably, because "set-based programming applied to the appropriate set-based engine to produce set-based processing" is just too much set-based. If that's what you're talking about, then 'tis true, and I'm sorry about that. But that's what we're talking about.
That being said - I like to be accurate. If you have a precise, complete definition of set-based programming that we're not following, please - produce it. I don't like to be inaccurate in any way, so I encourage you to set me straight (specifically). It just has to be something that can in some way be applied to practice, which in this case is to the specific set engine we call an RDBMS.
Really - don't give up on this conversation. This is why we're here.

Matt;

You're probably right that we're talking past each other if we consider "set-based processing" vs. "set-based programming", although I doubt that the former actually exists when you get close enough to the metal (at least in the SQL Server implementation of relational).

I'm just trying to point out that the industry has a particular definition in play for "set-based programming", and this is different than what I was getting from the article. I can't point you to a definitive source for a definition, but I think if you google "set-based solution", you will see that it is always used as a means of contrasting one programming approach (set-based) with another--usually cursor [or "loop"] based. In the article, then, the first query is, in fact, a set-based solution... even if it results in a less efficient execution plan.

I'll take this as evidence of the level confusion that products like SQL Server introduce in practicioners due to its poor implementation fidelity with respect to relational (in this case, the intermingling of logical and physical concerns). And, since my goal was to help clear this up, and I'm not sure my contributions are helping with that, I'll quit while I'm behind and simply hope that some are motivated to learn more and think about the implications of logical/physical separation.

Lastly, to reiterate, my opinion is that the article is very good and points out that we need to always be cognizant of what's going on at the physical layer, even when submitting a set-based solution (or single-statement solution, if you prefer).

TroyK

“He who loves practice without theory is like the sailor who boards ship without a rudder and compass and never knows where he may cast.”

Leonardo da Vinci, 1452-1519

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 9

I'm just trying to point out that the industry has a particular definition in play for "set-based programming", and this is different than what I was getting from the article. I can't point you to a definitive source for a definition, but I think if you google "set-based solution", you will see that it is always used as a means of contrasting one programming approach (set-based) with another--usually cursor [or "loop"] based. In the article, then, the first query is, in fact, a set-based solution... even if it results in a less efficient execution plan.

And I'm just trying to point out that the industry's simple definition of "contrasting one programming approach (set-based) with another--usually cursor [or "loop"] based" to explain what set based code is, is wrong.

My contention with that and similar definitions is simple... if you have to touch more internal rows than the count contained in the original set, then it is still not "set-based" even though an explicit loop was not used. Heh, it might be a "set generator" (as I've done frequently with Tally table generation), but it's just not set-based. In fact, set based programming has gotten a bad rap because of things like triangular joins meeting the "look and feel" criteria of the current over-simplified definition of what set based is.

You've hit the nail on the head when you talked about the physical layer... if you don't consider what's going on behind the scenes, what looks like set based code may actually be Hidden RBAR... and, that was the whole point of the article...

I hope they release the article I wrote back on 22 Nov pretty soon... it discusses another bit of "Hidden RBAR" that's given set based programming (code, whatever) a bad rap and it doesn't have even a hint of a triangular or cross join in it. People who are having problems with me saying that triangular joins aren't set based are gonna have to change their britches when they see what else I say isn't set based... with any luck at all, maybe we can get the industry to redefine what set based actually is 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ramesh Saive SSC-Insane Points: 24275 More actions · Answer 10

Matt Miller (12/8/2007)
GilaMonster (12/8/2007)
An index on a column does not guarantee that data will be returned in that order. Unless an order by is specified, you cannot be sure that the data will always be returned in the
order you want.
An index makes it likely, but not certain.
There's a few things that, even with an appropriate index, can make data return 'out of order'.
The storage engine selecting an allocation order scan to fetch the data, rather than an index order scan. If the scan on the exec plan shows Ordered false, then you've had one of
these. Not likely except in read uncommitted isolation
Merry-go-round scan. Only on enterprise. The scan stared in the middle of the table, reads to the end, goes back to the beginning and reads the rest.
Parallelism.
In future editions, there may be even more.
Have you ever managed to get that inner loop technique to actually fail? because with the indexing hint - I have yet to see that happen. I understand it's not officially
endorsed, that the next version might not allow it ,etc... but from what I can tell - I have yet to see if NOT use the indexing hint. Parallelism or no.
Just curious.

Gail,

Technically speaking, whatever you've clarified on index and hints is correct, only because of lack of enough documentation on index hints and their behaviour in queries.

Optionally, One can also enforce query hints like FORCE ORDER & MAXDOP to make the queries more trustable.

Like Matt, me too is as curious as he to find such a case.

In Mathematics, if one cannot prove a statement is true then it termed to be as false. In simple terms, if one says a=b and he cannot prove it then it assumed as a!=b.

--Ramesh

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 11

Proof! Oh heck yeah... I'm working on a bunch of proof right now! 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ChiragNS One Orange Chip Points: 26137 More actions · Answer 12

Ian

"Great article. I agree that there will always be idiots in the world that can only help to make me look good "

what do u mean by this

"Keep Trying"

Calvin Lawson SSChampion Points: 11030 More actions · Answer 13

Wow, if this is an indication of the new quality of articles at SQLServerCentral, color me impressed! Over impressed, I'm kinda wowed.

"if you have to touch more internal rows than the count contained in the original set, then it is still not "set-based" even though an explicit loop was not used. "

Hmmmm.....that is serious food for thought....

I delete dups like this all the time:

delete d

from table k

join table d on k.matching = d.matching

where d.primarykey < k.primarykey

And I could swear this is set based, but maybe it's not?

Signature is NULL

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 14

Ramesh (12/9/2007)
In Mathematics, if one cannot prove a statement is true then it termed to be as false. In simple terms, if one says a=b and he cannot prove it then it assumed as a!=b.

Mmmm... I don't remember learning that in my Mathematics degree. What if you also can't prove that a!=b?

John

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 15

I delete dups like this all the time:
delete d
from table k
join table d on k.matching = d.matching
where d.primarykey < k.primarykey
And I could swear this is set based, but maybe it's not?

Like I said in the article conclusion, triangular joins are great for doing such things as dupe checks... but it creates a pot wad of miniature triangular joins where there are dupes... hard to imagine a triangular join between only two or 3 rows, but it's there and you'll really see it when you get to a place where there are 4 dupes.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)