January 10, 2016 at 10:52 pm
Sure, with the same general idea.
t1 INNER JOIN t2 on t1.Col1 = t2.Col2 OR t1.Col1 = t2.Col3 OR t1.Col1 = T2.Col4
Don't know what the performance will be, but I can't imagine good.
And you probably want to look at normalising that table.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
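In full, that suggestion might look like the following self-contained sketch. The table and column names here are placeholders, since the actual schema wasn't posted alongside the suggestion:

```sql
-- Placeholder tables; substitute your real schema
CREATE TABLE #t1 (Col1 INT);
CREATE TABLE #t2 (Col2 INT, Col3 INT, Col4 INT);

INSERT INTO #t1 (Col1) VALUES (1), (2), (3);
INSERT INTO #t2 (Col2, Col3, Col4) VALUES (1, 9, 9), (9, 2, 9), (8, 8, 8);

-- Match when Col1 equals any of the three candidate columns in #t2
SELECT t1.Col1
FROM #t1 AS t1
INNER JOIN #t2 AS t2
    ON t1.Col1 = t2.Col2
    OR t1.Col1 = t2.Col3
    OR t1.Col1 = t2.Col4;

DROP TABLE #t1, #t2;
```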
January 11, 2016 at 2:37 pm
GilaMonster (1/10/2016)
Sure, with the same general idea.
t1 INNER JOIN t2 on t1.Col1 = t2.Col2 OR t1.Col1 = t2.Col3 OR t1.Col1 = T2.Col4
Don't know what the performance will be, but I can't imagine good.
And you probably want to look at normalising that table.
I didn't test it, Gail, but I expect that this will generate the same plan as my original suggestion with EXISTS and an IN list in the condition. Which Eirikur has already demonstrated to be the slowest of the bunch. :blush:
January 11, 2016 at 10:26 pm
Hugo Kornelis (1/11/2016)
GilaMonster (1/10/2016)
Sure, with the same general idea.
t1 INNER JOIN t2 on t1.Col1 = t2.Col2 OR t1.Col1 = t2.Col3 OR t1.Col1 = T2.Col4
Don't know what the performance will be, but I can't imagine good.
And you probably want to look at normalising that table.
I didn't test it, Gail, but I expect that this will generate the same plan as my original suggestion with EXISTS and an IN list in the condition. Which Eirikur has already demonstrated to be the slowest of the bunch. :blush:
The inner join method is significantly faster than EXISTS IN, but still around 10 times slower than the unpivot method. The problem with EXISTS IN is partly the cardinality estimation and the choice of operators implemented; the way the server does this is simply not scalable. For every row in the first set it scans the second set, and for every row in that scan it scans the second set again, i.e. Y x Z^2. One would assume that matching a set of 1,000 rows against X columns in another set of 1,000 rows would cost at most 1,000 against X times 1,000, but in reality it's 1,000 against 1,000,000.
😎
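The unpivot method Eirikur benchmarked isn't shown in this excerpt. One common way to write it is with CROSS APPLY and a VALUES table value constructor, which unpivots the candidate columns into rows so the match becomes a single equi-join. The table and column names below are assumed placeholders matching the earlier join example:

```sql
-- Unpivot #t2's candidate columns into one column, then equi-join once.
-- The plan typically shows a Constant Scan plus Nested Loops feeding
-- [Num Cols] * [Rows] rows into the join, with each base table scanned once.
SELECT t1.Col1
FROM #t1 AS t1
INNER JOIN (
    SELECT v.MatchCol
    FROM #t2 AS t2
    CROSS APPLY (VALUES (t2.Col2), (t2.Col3), (t2.Col4)) AS v (MatchCol)
) AS u
    ON t1.Col1 = u.MatchCol;
```

The only cost over a plain single-column join is the cardinality increase from the VALUES list, which is what the scan-count discussion below is getting at.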
January 12, 2016 at 4:16 pm
Eirikur Eiriksson (1/11/2016)
Hugo Kornelis (1/11/2016)
GilaMonster (1/10/2016)
Sure, with the same general idea.
t1 INNER JOIN t2 on t1.Col1 = t2.Col2 OR t1.Col1 = t2.Col3 OR t1.Col1 = T2.Col4
Don't know what the performance will be, but I can't imagine good.
And you probably want to look at normalising that table.
I didn't test it, Gail, but I expect that this will generate the same plan as my original suggestion with EXISTS and an IN list in the condition. Which Eirikur has already demonstrated to be the slowest of the bunch. :blush:
The inner join method is significantly faster than EXISTS IN, but still around 10 times slower than the unpivot method. The problem with EXISTS IN is partly the cardinality estimation and the choice of operators implemented; the way the server does this is simply not scalable. For every row in the first set it scans the second set, and for every row in that scan it scans the second set again, i.e. Y x Z^2. One would assume that matching a set of 1,000 rows against X columns in another set of 1,000 rows would cost at most 1,000 against X times 1,000, but in reality it's 1,000 against 1,000,000.
😎
Thus if we have 10 rows each in tables 1 and 2, and table 2 has three columns to consider against one column from table 1,
then it will scan 10 * (10 * 10 * 10) = 10 * 1,000 = 10,000 times?
Whereas the unpivot method will only scan 10 * 10 = 100 times?
Would there be any difference if there was only one column to consider from table 2? I've not done any testing here but I am curious now.
----------------------------------------------------
January 12, 2016 at 8:47 pm
MMartin1 (1/12/2016)
Thus if we have 10 rows each in tables 1 and 2, and table 2 has three columns to consider against one column from table 1, then it will scan 10 * (10 * 10 * 10) = 10 * 1,000 = 10,000 times?
Whereas the unpivot method will only scan 10 * 10 = 100 times?
The unpivot method will only scan each table once; it uses a combination of a constant scan and a nested loop to provide a stream of [Num Cols] * [Rows] rows into an inner join for matching. This means the only overhead is the cardinality increase from the [Num Cols] multiplication; otherwise the plan looks like a single-column inner join.
😎
Would there be any difference if there was only one column to consider from table 2? I've not done any testing here but I am curious now.
If there is only one column to consider from table 2, then there is no problem: either join on the column or use WHERE IN / EXISTS; the latter two will do exactly the same thing in this case.
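As a sketch of that single-column case (same placeholder tables assumed as earlier in the thread), the three forms would be:

```sql
-- Plain join on the single candidate column
SELECT t1.Col1
FROM #t1 AS t1
INNER JOIN #t2 AS t2 ON t1.Col1 = t2.Col2;

-- IN with a subquery
SELECT t1.Col1
FROM #t1 AS t1
WHERE t1.Col1 IN (SELECT t2.Col2 FROM #t2 AS t2);

-- Correlated EXISTS
SELECT t1.Col1
FROM #t1 AS t1
WHERE EXISTS (SELECT 1 FROM #t2 AS t2 WHERE t2.Col2 = t1.Col1);
```

One caveat worth noting: IN and EXISTS are semi-joins and return each matching t1 row at most once, while the plain join can return a t1 row multiple times if #t2 contains duplicate values, so "exactly the same" strictly applies to the latter two.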
January 14, 2016 at 9:47 pm
Thanks guys, do you mean something like this? I'm trying to add t2.[Col A] to the first SELECT query. Looks like running into error..
SELECT
t1.EmailAddress,t1.FirstName,t1.LastName,t2.[Col A]
FROM #test_table_1 AS t1
Join #test_table_2 AS t2 on t1.EmailAddress = t2.Email1 OR t1.EmailAddress = t2.Email1 OR t1.EmailAddress = T2.Email2
WHERE EmailAddress IN
(SELECT t2.EmailAddress
FROM #test_table_2 AS t2)
OR EmailAddress IN
(SELECT t2.Email1
FROM #test_table_2 AS t2)
OR EmailAddress IN
(SELECT t2.Email2
FROM #test_table_2 AS t2);
January 15, 2016 at 5:21 am
VegasL (1/14/2016)
Thanks guys, do you mean something like this
No, we mean it exactly as we wrote it. What you posted is a sort of random mish-mash of things we posted and things in your original code.
Plus you added a reference to a column that is not even in the repro code you posted.
Also, "Looks like running into error" is way too vague for us to offer any help. What is the error message you receive?
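For what it's worth, the join form suggested earlier in the thread, adapted to also return t2.[Col A] and with the redundant WHERE clause dropped, might look like this; it assumes #test_table_2 really does contain Email1, Email2 and [Col A] columns (and no EmailAddress column, per the point above):

```sql
-- OR-join on both email columns, returning the extra column from t2
SELECT t1.EmailAddress, t1.FirstName, t1.LastName, t2.[Col A]
FROM #test_table_1 AS t1
INNER JOIN #test_table_2 AS t2
    ON t1.EmailAddress = t2.Email1
    OR t1.EmailAddress = t2.Email2;
```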