January 10, 2016 at 10:52 pm
Sure, with the same general idea.
t1 INNER JOIN t2 on t1.Col1 = t2.Col2 OR t1.Col1 = t2.Col3 OR t1.Col1 = T2.Col4
Don't know what the performance will be, but I can't imagine good.
And you probably want to look at normalising that table.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
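In full, that suggestion might look like the following self-contained sketch. The table and column names here are placeholders, since the actual schema wasn't posted alongside the suggestion:

```sql
-- Placeholder tables; substitute your real schema
CREATE TABLE #t1 (Col1 INT);
CREATE TABLE #t2 (Col2 INT, Col3 INT, Col4 INT);

INSERT INTO #t1 (Col1) VALUES (1), (2), (3);
INSERT INTO #t2 (Col2, Col3, Col4) VALUES (1, 9, 9), (9, 2, 9), (8, 8, 8);

-- Match when Col1 equals any of the three candidate columns in #t2
SELECT t1.Col1
FROM #t1 AS t1
INNER JOIN #t2 AS t2
    ON t1.Col1 = t2.Col2
    OR t1.Col1 = t2.Col3
    OR t1.Col1 = t2.Col4;

DROP TABLE #t1, #t2;
```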
January 11, 2016 at 2:37 pm
GilaMonster (1/10/2016)
Sure, with the same general idea.
t1 INNER JOIN t2 on t1.Col1 = t2.Col2 OR t1.Col1 = t2.Col3 OR t1.Col1 = T2.Col4
Don't know what the performance will be, but I can't imagine good.
And you probably want to look at normalising that table.
I didn't test it, Gail, but I expect that this will generate the same plan as my original suggestion with EXISTS and an IN list in the condition. Which Eirikur has already demonstrated to be the slowest of the bunch. :blush:
January 11, 2016 at 10:26 pm
Hugo Kornelis (1/11/2016)
GilaMonster (1/10/2016)
Sure, with the same general idea.
t1 INNER JOIN t2 on t1.Col1 = t2.Col2 OR t1.Col1 = t2.Col3 OR t1.Col1 = T2.Col4
Don't know what the performance will be, but I can't imagine good.
And you probably want to look at normalising that table.
I didn't test it, Gail, but I expect that this will generate the same plan as my original suggestion with EXISTS and an IN list in the condition. Which Eirikur has already demonstrated to be the slowest of the bunch. :blush:
The inner join method is significantly faster than EXISTS IN, but still around 10 times slower than the unpivot method. The problem with EXISTS IN is partly the cardinality estimation and the choice of operators implemented; the way the server does this is simply not scalable. For every row in the first set it scans the second set, and for every row in that scan it scans the second set again, i.e. Y x Z^2. One would assume that matching a set of 1,000 rows against X columns in another set of 1,000 rows would cost at most 1,000 against X times 1,000, but in reality it's 1,000 against 1,000,000.
😎
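The unpivot method Eirikur benchmarked isn't shown in this excerpt. One common way to write it is with CROSS APPLY and a VALUES table value constructor, which unpivots the candidate columns into rows so the match becomes a single equi-join. The table and column names below are assumed placeholders matching the earlier join example:

```sql
-- Unpivot #t2's candidate columns into one column, then equi-join once.
-- The plan typically shows a Constant Scan plus Nested Loops feeding
-- [Num Cols] * [Rows] rows into the join, with each base table scanned once.
SELECT t1.Col1
FROM #t1 AS t1
INNER JOIN (
    SELECT v.MatchCol
    FROM #t2 AS t2
    CROSS APPLY (VALUES (t2.Col2), (t2.Col3), (t2.Col4)) AS v (MatchCol)
) AS u
    ON t1.Col1 = u.MatchCol;
```

The only cost over a plain single-column join is the cardinality increase from the VALUES list, which is what the scan-count discussion below is getting at.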
January 12, 2016 at 4:16 pm
Eirikur Eiriksson (1/11/2016)
Hugo Kornelis (1/11/2016)
GilaMonster (1/10/2016)
Sure, with the same general idea.
t1 INNER JOIN t2 on t1.Col1 = t2.Col2 OR t1.Col1 = t2.Col3 OR t1.Col1 = T2.Col4
Don't know what the performance will be, but I can't imagine good.
And you probably want to look at normalising that table.
I didn't test it, Gail, but I expect that this will generate the same plan as my original suggestion with EXISTS and an IN list in the condition. Which Eirikur has already demonstrated to be the slowest of the bunch. :blush:
The inner join method is significantly faster than EXISTS IN, but still around 10 times slower than the unpivot method. The problem with EXISTS IN is partly the cardinality estimation and the choice of operators implemented; the way the server does this is simply not scalable. For every row in the first set it scans the second set, and for every row in that scan it scans the second set again, i.e. Y x Z^2. One would assume that matching a set of 1,000 rows against X columns in another set of 1,000 rows would cost at most 1,000 against X times 1,000, but in reality it's 1,000 against 1,000,000.
😎
Thus if we have 10 rows each in tables 1 and 2, and table 2 has three columns to consider against one column from table 1,
then it will scan 10 * (10 * 10 * 10) = 10 * 1,000 = 10,000 times?
Whereas the unpivot method will only scan 10 * 10 = 100 times?
Would there be any difference if there was only one column to consider from table 2? I've not done any testing here but I am curious now.
----------------------------------------------------
January 12, 2016 at 8:47 pm
MMartin1 (1/12/2016)
Thus if we have 10 rows each in tables 1 and 2, and table 2 has three columns to consider against one column from table 1, then it will scan 10 * (10 * 10 * 10) = 10 * 1,000 = 10,000 times?
Whereas the unpivot method will only scan 10 * 10 = 100 times?
The unpivot method will only scan each table once; it uses a combination of a constant scan and a nested loop to provide a stream of [Num Cols] * [Rows] rows into an inner join for matching. This means the only overhead is the cardinality increase from the [Num Cols] multiplication; otherwise the plan looks like a single-column inner join.
😎
Would there be any difference if there was only one column to consider from table 2? I've not done any testing here but I am curious now.
If there is only one column to consider from table 2, then there is no problem: either join on the column or use WHERE IN / EXISTS; the latter two will do exactly the same thing in this case.
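As a sketch of that single-column case (same placeholder tables assumed as earlier in the thread), the three forms would be:

```sql
-- Plain join on the single candidate column
SELECT t1.Col1
FROM #t1 AS t1
INNER JOIN #t2 AS t2 ON t1.Col1 = t2.Col2;

-- IN with a subquery
SELECT t1.Col1
FROM #t1 AS t1
WHERE t1.Col1 IN (SELECT t2.Col2 FROM #t2 AS t2);

-- Correlated EXISTS
SELECT t1.Col1
FROM #t1 AS t1
WHERE EXISTS (SELECT 1 FROM #t2 AS t2 WHERE t2.Col2 = t1.Col1);
```

One caveat worth noting: IN and EXISTS are semi-joins and return each matching t1 row at most once, while the plain join can return a t1 row multiple times if #t2 contains duplicate values, so "exactly the same" strictly applies to the latter two.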
January 14, 2016 at 9:47 pm
Thanks guys, do you mean something like this? I'm trying to add t2.[Col A] to the first SELECT query. Looks like running into error..
SELECT
t1.EmailAddress,t1.FirstName,t1.LastName,t2.[Col A]
FROM #test_table_1 AS t1
Join #test_table_2 AS t2 on t1.EmailAddress = t2.Email1 OR t1.EmailAddress = t2.Email1 OR t1.EmailAddress = T2.Email2
WHERE EmailAddress IN
(SELECT t2.EmailAddress
FROM #test_table_2 AS t2)
OR EmailAddress IN
(SELECT t2.Email1
FROM #test_table_2 AS t2)
OR EmailAddress IN
(SELECT t2.Email2
FROM #test_table_2 AS t2);
January 15, 2016 at 5:21 am
VegasL (1/14/2016)
Thanks guys, do you mean something like this
No, we mean it exactly as we wrote it. What you posted is a sort of random mish-mash of things we posted and things in your original code.
Plus you added a reference to a column that is not even in the repro code you posted.
Also, "Looks like running into error" is way too vague for us to offer any help. What is the error message you receive?
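For what it's worth, the join form suggested earlier in the thread, adapted to also return t2.[Col A] and with the redundant WHERE clause dropped, might look like this; it assumes #test_table_2 really does contain Email1, Email2 and [Col A] columns (and no EmailAddress column, per the point above):

```sql
-- OR-join on both email columns, returning the extra column from t2
SELECT t1.EmailAddress, t1.FirstName, t1.LastName, t2.[Col A]
FROM #test_table_1 AS t1
INNER JOIN #test_table_2 AS t2
    ON t1.EmailAddress = t2.Email1
    OR t1.EmailAddress = t2.Email2;
```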