December 26, 2012 at 8:34 pm
Comments posted to this topic are about the item Outer join with condition
December 26, 2012 at 8:35 pm
Nice QotD and nice explanation..:-)
Thanks..
December 26, 2012 at 10:12 pm
Very good question. Thanks!
~ Lokesh Vij
Link to my Blog Post --> www.SQLPathy.com[/url]
Follow me @Twitter
December 26, 2012 at 11:56 pm
Nice question, thanks.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
December 27, 2012 at 12:23 am
Known concept. But took some time understand the full qstn.. nice one..
--
Dineshbabu
Desire to learn new things..
December 27, 2012 at 1:28 am
This was removed by the editor as SPAM
December 27, 2012 at 5:32 am
Yeah, I'm gonna need someone to draw me a picture for this one. 😉
ron
-----
a haiku...
NULL is not zero
NULL is not an empty string
NULL is the unknown
December 27, 2012 at 6:02 am
ronmoses (12/27/2012)
Yeah, I'm gonna need someone to draw me a picture for this one. 😉ron
I'm bad at pictures, so how about a step-by-step description of the query evaluation?
(Note that this describes the LOGICAL queury evaluation - the physicial processing order may be completely different, as long as the same end result is achieved)
Query evaluation always starts with the FROM clause, and unless there are parentheses or nested joins, always works left to right. So the first step is the join of service and doc_serv. That's a standard inner join, with this internal table as intermediate result:
service_id name price doctor_id service_id
1 serv1 1000 2 1
2 serv2 500 3 2
2 serv2 500 1 2
3 serv3 700 1 3
3 serv3 700 3 3
4 serv4 1200 2 4
The next step is the join with doctors. This is a right outer join, so the right table (doctors) is preserved. If any row from the doctors table would not survive the result of an inner join on the same condition, it will still be there, but with a NULL in all the columns from the left table (the intermediate result above). Let's first look at how an inner join with the same conditions would look:
service_id name price doctor_id service_id doctor_id name
2 serv2 500 3 2 3 doc3
2 serv2 500 1 2 1 doc1
3 serv3 700 1 3 1 doc1
3 serv3 700 3 3 3 doc3
All rows in the intermediate table have a matching row in the doctors table, so no rows are eliminated because of that requirement. But rows in the intermediate table with a price of 1000 or more are lost. And as a result, we only have rows for two of the four doctors. The outer join says that all doctors have to be retained, so rows for the missing doctors are added back in:
service_id name price doctor_id service_id doctor_id name
2 serv2 500 3 2 3 doc3
2 serv2 500 1 2 1 doc1
3 serv3 700 1 3 1 doc1
3 serv3 700 3 3 3 doc3
NULL NULL NULL NULL NULL 2 doc2
NULL NULL NULL NULL NULL 4 doc4
This would be the result if the query would ask for SELECT *. There are no WHERE, GROUP BY, or HAVING clauses, so the next step is the SELECT list. This asks for an aggregate. Without a GROUP BY, that means aggregate over the whole result set. The rows are counted and the result (6) is returned.
December 27, 2012 at 6:13 am
Hugo Kornelis (12/27/2012)
I'm bad at pictures, so how about a step-by-step description of the query evaluation?
Ahh... you know, I actually had the mechanics straight, it was the fine details I screwed up. Somehow in my head I managed to lump the one excluded left record in with the duplicate 1s and 3s, instead of the 2s and 4s where it belonged, if that makes sense. So I subtracted one of the duplicates rather than NULLing the appropriate record. At least I understood the concept. Forest for the trees, etc.
thanks!
ron
-----
a haiku...
NULL is not zero
NULL is not an empty string
NULL is the unknown
December 27, 2012 at 9:31 am
A good one -- thanks!
December 27, 2012 at 10:02 am
Thanks for the question.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
December 27, 2012 at 10:04 am
This example shows that the query isn't logically the same as if it would contain filtering in "where" which is executed after joins are made.
This example shows how difficult it is to work out a join on a piece of paper, instead of firing up SSMS and using copy and paste.
December 27, 2012 at 10:09 am
A very good question.
Personally I would avoid writing a condition on one of the joined table expressions in the ON clause, but instead write it as an explicit subquery (ie modify the table-expression concerned). That way there is no scope for anyone mistaking the meaning. I'm tempted to say that it would have been better language design to preclude such conditions in the ON clause, but the time for that argument is long past. I'm sure it would have been better language design to exclude such conditions from the ON clause of the MERGE statement; BoL has a nice big caveat that makes it pretty clear that they should be avoided there.
Tom
December 27, 2012 at 11:56 am
I'm with you on that, Tom. I get confused enough by RIGHT joins, never mind a join with conditions in the ON clause that don't directly define the relationship. Here's the query re-structured, first to use a LEFT join, and then the simple move of the price<1000 condition to the sub-query's WHERE clause.--All queries return * instead of just count(*) so we can see what's counted.
-- Original join layout
select *--quantity = count(*)
from @service s
inner join @doc_serv ds on s.service_id = ds.service_id
right outer join @doctor d on ds.doctor_id = d.doctor_id and s.price < 1000
order by d.doctor_id,ds.service_id
-- Turn it over to use LEFT join to a subquery
select *--quantity = count(*)
from @doctor d
left outer join
(Select s.service_id, s.price, ds.doctor_id
from @doc_serv ds
inner join @service s
on s.service_id = ds.service_id
and s.price < 1000
) x
on x.doctor_id = d.doctor_id
order by d.doctor_id,x.service_id
-- Move the condition (price < 1000) to WHERE clause of subquerey
select *--quantity = count(*)
from @doctor d
left outer join
(Select s.service_id, s.price, ds.doctor_id
from @doc_serv ds
inner join @service s
on s.service_id = ds.service_id
where s.price < 1000
) x
on x.doctor_id = d.doctor_id
order by d.doctor_id,x.service_id
I'd prefer the longer code as shown here to the compact right join as it's much easier (at least for me) to understand.
December 28, 2012 at 5:42 am
Hugo Kornelis (12/27/2012)
I'm bad at pictures, so how about a step-by-step description of the query evaluation?
Thanks for the explanation, I should have gone through it step by step just like that instead of trying to work it out in my head.
john.arnott (12/27/2012)
I'm with you on that, Tom. I get confused enough by RIGHT joins, never mind a join with conditions in the ON clause that don't directly define the relationship. Here's the query re-structured, first to use a LEFT join, and then the simple move of the price<1000 condition to the sub-query's WHERE clause.
Much prefer the version with the price condition in a WHERE clause, seems much clearer to me.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply