May 11, 2011 at 2:59 pm
I've been researching this for the past hour and I cannot find an explanation for what I'm seeing here (or I'm just not understanding it correctly).
Can someone help me understand the difference between putting a filter in a LEFT JOIN vs the WHERE?
Example Query:
SELECT
t1.*
, t2.*
FROM
tableA t1
LEFT JOIN tableB t2
ON t1.ID = t2.ID
WHERE
t2.ID IS NULL
AND t1.Status IN ('A', 'B')
vs
SELECT
t1.*
, t2.*
FROM
tableA t1
LEFT JOIN tableB t2
ON t1.ID = t2.ID
AND t1.Status IN ('A', 'B')
WHERE
t2.ID IS NULL
For some reason when I run this query I get many more records in the second query than I do in the first. I don't understand what is going on here. When I run the execution plan I see that the table scan for t1 in the first query has the filter applied to it whereas the hash match that joins the two tables in the second query has the filter applied to listed under Probe Residual. I thought it would have worked exactly opposite that.
Any understanding would be much appreciated, it's driving me nuts.
- John
May 11, 2011 at 6:14 pm
it's the left join that is confusing you.
in the first query, the left table has all rows, and matching records from the second table get joined.
after that, you are filtering the RIGHT table to remove everything that doesn't match effectively changing that to an INNER JOIN, then filtering on status.
in the second query, the rows in TableB that were eliminated with the IS NULL are still in the results, so the dataset is bigger. change the second query to an INNER JOIN, and the two sqls will be have the same results.
Lowell
May 11, 2011 at 9:15 pm
See Kimberly Tripp's blog for a detailed explanation of what Lowell is talking about: http://www.sqlskills.com/BLOGS/KIMBERLY/post/Determining-the-position-of-search-arguments-in-a-join.aspx
It's the difference between only joining on the records with that condition (e.g. condition in the Left Join), or filtering your results after joining the two complete sets (condition in the WHERE, which ends up looking like an inner join, since you only return the ones that meet)
---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]
"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
May 11, 2011 at 9:47 pm
The real issue here is that moving the criteria for TableA to the ON clause for the join changes what you are asking SQL Server.
In the first query, you are asking for all rows from TableA that have a status of A or B that do not have a matching row in TableB.
In the second query, you are asking for all rows from TableA that do not have a matching row from TableB for those rows in TableA that have a status of A or B.
The reason you get more rows in the second query is because there are more rows in TableA with a status other than A or B which will not be filtered out because the join doesn't find a match.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 12, 2011 at 11:56 am
Thank you all for the responses. I understand where the counts I'm seeing are coming from and I can confirm the #s now (and WHY!)
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply