July 10, 2008 at 6:13 am
I have created two tables and entered the data as shown below.
Create table tablea(col1 varchar(10), col2 varchar(10), col3 varchar(10))
Create table tableb(col1 varchar(10), col2 varchar(10), col3 varchar(10))
insert into tablea
select 'a1','a2','a3'
union
select 'b1','b2','b3'
union
select 'd1','d2','d3'
union
select 'e1','e2','e3'
insert into tableb
select 'a1','a2','a3'
select 'a1','c2','c3'
select 'b11','d2','d3'
select 'b12','b2','b3'
The below query provides me the required output.
select a.*,b.col1, c.col2, d.col3 from tablea a
left outer join tableb b on a.col1=b.col1
left outer join tableb c on a.col2 = c.col2
left outer join tableb d on a.col3=d.col3
Can the above query be written in a different manner by restricting to a single join instead of multiple joins with tableb (may be with group by) ? Is it possible.
Thanks in advance.
July 10, 2008 at 7:40 am
So, you've got two tables that have three different and distinct relationships between the tables? I'm pretty sure that violates one of the rules of normalization, but I'm not sure which one. I'd be curious what business rule you're trying to meet with that design.
I tried this:
select a.*,b.col1, b.col2, b.col3
from tablea a
left join tableb b
on b.col1 = a.col1
or b.col2=a.col2
or b.col3 = a.col3
It gets the same data back and based on my tests took the query from 272ms to 80ms because instead of 3 scans it only does 1. However, I wouldn't want to try to scale this solution.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 11, 2008 at 2:12 am
There is a difference of two rows to the main query and the updated query when i used EXCEPT.
This query is an existing query and as it contains millions of records, i was looking for optimization of this query.
Any ideas?
July 11, 2008 at 6:35 am
Well, eliminating all the excess JOINS does improve performance threefold in this example, so I think you can reasonably expect the something similar with larger data sets.
Because of the OR clause though, it's not going to use indexes very well. Just playing with the example, I actually saw the reads increase when I started putting indexes on the table. However, for the example, we're dealing with very small data sets, so indexes won't help much anyway. The one thing I can say is if you have at least one column that you join on that will be steady, that you can index, you can probably get decent performance out of this. The query would like something like this:
select a.*,b.col1, b.col2, b.col3
from tablea a
left join tableb b
on b.col1 = a.col1
AND (b.col2=a.col2
or b.col3 = a.col3)
Of course, from the example, this would change the result set. The thing is, because this is a violation of database design, it's going to continue to create problems for you however you go. You shouldn't have three independent relationships between tables.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 11, 2008 at 7:07 am
This will probably perform better:
select a.*,b.col1, b.col2, b.col3
from tablea a
inner join tableb b
on a.col1=b.col1
union
select a.*,c.col1, c.col2, c.col3
from tablea a
inner join tableb c
on a.col2 = c.col2
union
select a.*,d.col1, d.col2, d.col3
from tablea a
inner join tableb d
on a.col3=d.col3
It depends on the indexes available. But, assuming your joins have indexes on them (pretty standard practice), this will allow index seeks, instead of index scans.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply