June 2, 2011 at 12:53 pm
I would like to join two tables. If there is no match for one column, I would like to say join with another column. When I use an OR in the ON statement, I get a Cartesian product. In plain English I would like to type in:
SELECT *
FROM T1 INNER JOIN T2
ON T1.Var1=T2.Var1 OR "If there is not match then" T1.Var2=T2.Var1
Does anyone know the code to do this? Is it a CASE statement?
Thank you,
Amy
June 2, 2011 at 1:00 pm
Have you tried a union? It could be slow based the volume of data, but duplicates would be removed.
SELECT *
FROM T1 INNER JOIN T2
ON T1.Var1=T2.Var1
UNION
SELECT *
FROM T1 INNER JOIN T2
ON T1.Var2=T2.Var1
For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
June 2, 2011 at 1:14 pm
I would expect a "conditional join" to perform poorly.
My suggestion would be to join to the table twice
SELECT *
FROM T1
LEFT INNER JOIN T2 ON T1.Var1=T2.Var1
LEFT INNER JOIN T2 T2a ON T1.Var2=T2a.Var1
It is then a simple matter to use CASE statements to return either the T2 or T2a variables based on whether or not T2.Var1 is null.
LEFT joins are used so that the query doesn't omit rows that fail either join condition. If both joins fail, you can filter out any rows where both T2.Var1 and T2a.Var1 are null.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
June 2, 2011 at 2:23 pm
Well, it wasn't the simplest CASE statement ever, but joining the table twice did the trick. thank you.:-D
June 2, 2011 at 2:27 pm
Actually, Coalesce or IsNull would probably handle the Case statement logic in this scenario.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 2, 2011 at 3:29 pm
Gus:
The reason I said use CASE is that you might have cases where the join to T2 succeeds but some other T2.column value is NULL anyway.
In that case, would you want to substitute the value from T2a?
I assume not, but if that assumption is wrong COALESCE or ISNULL would be appropriate.
Amy:
All CASE statements should test against T2.Var1 to see if the join succeeded. Would you post one or two of your actual CASE statements?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
June 2, 2011 at 4:59 pm
Amy.G (6/2/2011)
I would like to join two tables. If there is no match for one column, I would like to say join with another column. When I use an OR in the ON statement, I get a Cartesian product. In plain English I would like to type in:SELECT *
FROM T1 INNER JOIN T2
ON T1.Var1=T2.Var1 OR "If there is not match then" T1.Var2=T2.Var1
Does anyone know the code to do this? Is it a CASE statement?
Thank you,
Amy
Can you explain why this is not just a simple OR ?
SELECT *
FROM T1 INNER JOIN T2
ON T1.Var1=T2.Var1
OR T1.Var2=T2.Var1
The only way I can see that not fulfilling your requirement is if you never want T1.Var1 = T2.Var1 AND T1.Var2 = T2.Var2. in which case you just add that condition
SELECT *
FROM T1 INNER JOIN T2
ON (T1.Var1=T2.Var1
OR T1.Var2=T2.Var1)
AND NOT (T1.Var1 = T2.Var1 AND T1.Var2 = T2.Var2)
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
June 3, 2011 at 6:41 am
The Dixie Flatline (6/2/2011)
Gus:The reason I said use CASE is that you might have cases where the join to T2 succeeds but some other T2.column value is NULL anyway.
In that case, would you want to substitute the value from T2a?
I assume not, but if that assumption is wrong COALESCE or ISNULL would be appropriate.
Amy:
All CASE statements should test against T2.Var1 to see if the join succeeded. Would you post one or two of your actual CASE statements?
That makes sense. Hadn't thought of it that way.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 3, 2011 at 8:30 am
Can you explain why this is not just a simple OR ?
Asked and answered.
I would expect a "conditional join" to perform poorly.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
June 3, 2011 at 10:00 am
I tested it out. The Double-LEFT join runs over 10 times faster when joining 100,000 rows to 9,000 rows. HOWEVER, the critical difference isn't performance. The Double-LEFT join produced the expected 100,000 rows in the result set. But the conditional joins produced over 180,000 rows in the result set. Apparently the optimizer produces a query plan that returns a row for EACH "OR" or "IN" condition.
create table sourceTable (RowID int identity(1,1) primary key, RowKey_1 int, RowKey_2 int)
--optional
--create nonclustered index IX_Source1 on sourcetable (Rowkey_1)
--create nonclustered index IX_Source2 on sourcetable (Rowkey_2)
create table joinTable (JoinID int identity(1,1) primary key, data varchar (100))
insert into joinTable (data)
select CAST(N as varchar)+' Data blah de blah de blah'
from tally
where N <=9000
declare @X INT = 10000
insert into sourceTable(RowKey_1,RowKey_2)
SELECT TOP 100000
ABS(CHECKSUM(NEWID())) % @X + 1 AS RowKey_1
,ABS(CHECKSUM(NEWID())) % @X + 1 AS RowKey_2
FROM Tally
declare @timer datetime = getdate()
select *
into #temp1
from sourceTable s
join joinTable j on (j.JoinID = s.RowKey_1 or j.JoinID = s.RowKey_2)
select DATEDIFF(ms,@timer,getdate()) as Conditional_Join
set @timer = getdate()
select *
into #temp3
from sourceTable s
join joinTable j on j.JoinID in (s.RowKey_1, s.RowKey_2)
select DATEDIFF(ms,@timer,getdate()) as IN_Join
set @timer = getdate()
select s.*,case when j.JoinID is null then j2.data else j.data end as data
into #temp2
from sourceTable s
left join joinTable j on j.JoinID = s.RowKey_1
left join joinTable j2 on j2.JoinID = s.RowKey_2
select DATEDIFF(ms,@timer,getdate()) as Double_Left_Join
select COUNT(*) as Conditional_Join from #temp1
select COUNT(*) as IN_Join from #temp3
select COUNT(*) as Double_Left_Join from #temp2
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply