May 27, 2004 at 12:36 pm
I am quite experienced at writing SQL statements and I am either missing something quite obvious or SQL has a problem. I wrote a query and it brought back unexpected results so I rewrote in a few similar ways and it brought back expected results. Can someone tell me if these statements should bring back the same data and if not why.
The only thing I can think of is the joining order. It seems to do the Left Join first and then when that is done it filters it by the inner join. Is this what is happening? I would expect the inner join to happen first and then the left join. I am looking for any suggestions just to understand this.
By the way when I look at the execution plan for the broken one it does not have an outer join in it. It looks like it was replaced with an inner join. Is this possibly a bug in SQL or am I crazy?
Thanks,
Cory
The broken one - 48 records returned
Select DSTDate_dt, DSTPeriod_ti, Case When HolDate Is Null Then 0 Else 1 END Holiday
From esi_est_dst_conversion_tbl edc
Left Join KW4_HolCalDay hcd On edc.DSTDate_dt = hcd.HOLDATE
Join KW4_HolCal hc On hc.HolCalID = hcd.HolCalID And hc.HolCalName = 'NERC'
Where DSTDate_dt >= '12/24/2003' And DSTDate_dt <= '1/5/2004'
Order By DSTDate_dt
The working one - 312 records returned - I reversed the Join order
Select DSTDate_dt, DSTPeriod_ti, Case When HolDate Is Null Then 0 Else 1 END Holiday
From KW4_HolCal hc
Join KW4_HolCalDay hcd On hc.HolCalID = hcd.HolCalID And hc.HolCalName = 'NERC'
Right Join esi_est_dst_conversion_tbl edc On edc.DSTDate_dt = hcd.HOLDATE
Where DSTDate_dt >= '12/24/2003' And DSTDate_dt <= '1/5/2004'
Order By DSTDate_dt
Another version of the working one - 312 records returned - I used a sub query
Select DSTDate_dt, DSTPeriod_ti, Case When HolDate Is Null Then 0 Else 1 END Holiday
From esi_est_dst_conversion_tbl edc
Left Join (Select hc.HolCalID, HOlDate, hcd.HolCalID HolCalID2, HolCalName From KW4_HolCalDay hcd Join KW4_HolCal hc On hcd.HolCalID = hc.HolCalID And HolCalName = 'NERC') Sub
On edc.DSTDate_dt = Sub.HOLDATE
Where DSTDate_dt >= '12/24/2003' And DSTDate_dt <= '1/5/2004'
May 28, 2004 at 12:19 am
- compare execution plans !
How is this one doing ? (don't like it myself but give it a try  
Select DSTDate_dt, DSTPeriod_ti, Case When HolDate Is Null Then 0 Else 1 END Holiday
From esi_est_dst_conversion_tbl edc
Left Join KW4_HolCalDay hcd
Join KW4_HolCal hc
On hc.HolCalID = hcd.HolCalID And hc.HolCalName = 'NERC'
On edc.DSTDate_dt = hcd.HOLDATE
Where DSTDate_dt >= '12/24/2003' And DSTDate_dt <= '1/5/2004'
Order By DSTDate_dt
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
May 28, 2004 at 1:31 am
Hello Cory,
I can't explain why it is so, and whether it is a bug or not, but that's how it works :-). I've seen this many times, so it seems to me as 'normal' - anyway, I guess you'll have to live with it, unless there is some workaround that I don't know about. As far as I've been able to find out
FROM TblA
LEFT JOIN TblB ON TblA.field1 = TblB.field1
JOIN TblC ON TblC.field2 = TblB.field2
will give the same result as if the TblB is joined with an inner join. As I understand it, the third line forces the condition that table C must be joined to table B. If there is no record in table B that would match the left join, then table C can not be joined to it and the entire recordset is not returned. Once you start doing LEFT JOINs, you have to use them all the way through in all tables that are joined to the original left-joined table.
There could be something in the SQL that alzdba suggested, though I'm not sure whether it will work... and same as the author, I don't like it. I prefer an easily understandable subquery to such construct - after all, I have to consider, that other people should be able to understand what's in the code .
HTH, Vladan
May 28, 2004 at 8:09 am
alzdba, your suggested query works! But, why? I don't understand. What does moving the "On edc.DSTDate_dt = hcd.HOLDATE" to the bottom do to they query? What are the rules for this? Why does the order matter? This does not seem very intuitive. BOL doesn't have much to say about this behavior, is there any good resources on this?
Cory
May 28, 2004 at 8:33 am
Cory, it's not a bug, it's the correct behaviour. To explain better what I understand of it, let me translate the queries in "Northwind terms" (so that everyone could understand, even if they don't have your data). This would be the first query (that returns the unexpected results):
SELECT COUNT(*) FROM Customers C
LEFT JOIN Orders O ON C.CustomerID=O.CustomerID
INNER JOIN [Order Details] D ON O.OrderID=D.OrderID
In Northwind, it should return 2155 (exactly the number of rows from Order Details).
The second query would be:
SELECT COUNT(*) FROM [Order Details] D
INNER JOIN Orders O ON O.OrderID=D.OrderID
RIGHT JOIN Customers C ON C.CustomerID=O.CustomerID
This will return 2157, because there are 2 customers that have no orders.
The subquery variant would be something like:
SELECT COUNT(*) FROM Customers C
LEFT JOIN (
SELECT O.CustomerID FROM Orders O
INNER JOIN [Order Details] D ON O.OrderID=D.OrderID
) X ON C.CustomerID=X.CustomerID
This will also return 2157.
The query that alzdba suggested would be:
SELECT COUNT(*) FROM Customers C
LEFT JOIN Orders O
INNER JOIN [Order Details] D ON O.OrderID=D.OrderID
ON C.CustomerID=O.CustomerID
It seems a little odd, but I think that this query really helps us to understand what's going on. It gives exactly the same result as the second query (the one with the RIGHT JOIN).
And the query that Vladan suggested would be:
SELECT COUNT(*) FROM Customers C
LEFT JOIN Orders O ON C.CustomerID=O.CustomerID
LEFT JOIN [Order Details] D ON O.OrderID=D.OrderID
This will also give 2157, but it may give different results if there are Orders that have no Order Details.
Now let me say why I consider this to be normal:
SQL Server evaluates the joins in the order that they are written (the fact that it may choose a different query plan it's another thing, but all those query plans would give the same result).
In the first query it joins the Customers with the Orders (doing a left join) and then joins the result (doing an inner join) with the Order Details. Because the last inner join involves the OrderID column from the Orders table, SQL Server realizes that any row that would have null in this column (such as the rows corresponding to the customers with no orders) will not be included in the final resultset (because when you compare null with anything, you will get false). Because of this SQL Server optimizes the operations by doing an inner join in the first place, and that's why there are no left joins in the query plan.
The second query will perform the inner join between the Orders and Order Details first, and then join the result with the Customers, therefore giving different results. I'll skip the third query (the one with the subquery), because it's obvious.
The fourth query can be written in plain english like this: left join the Customers with "something" on the CustomerID column; "something" is computed by joining the Orders with the Order Details on the OrderID column. This way we can see that SQL will have to execute the inner join first and then the left join, therefore executing exactly the same as the second query. We can verify this by looking at the query plans. We will also notice that the fifth query will have a different query plan than the others (which could be an indication that it may, indeed, give different results).
Razvan
May 28, 2004 at 9:20 am
Somewhere is the "Rules" of writting SQL is the one that says once you perform an outer join you can NEVER perform an inner join to that table and expect the correct results. If you need to inner join to an outer joined table you need to create a derived table at the outer join table. This is what Razvan is doing with example
SELECT COUNT(*) FROM Customers C
LEFT JOIN (
SELECT O.CustomerID FROM Orders O
INNER JOIN [Order Details] D ON O.OrderID=D.OrderID
) X ON C.CustomerID=X.CustomerID
This is the only way that will correctly being back data. As Razvan pointed out outer joining to the outer join will work in some but not all cases. I have used this method for years with no problems.
SmithDM
May 28, 2004 at 9:59 am
alzdba, Vladan, Razvan, SmithDM, thanks for the help. With your explanations I do understand what is going on, I just never new the joining order was something you had to worry about with outer joins. I am glad to know it now.
Cory
May 28, 2004 at 10:11 am
sorry to fall back in so late ... meeting time
Joining is just a question of good old maths ... sets and subsets.
draw 2 circles from left to right with a bit of overlap.
the overlapped side is the inner join, the left side will be added to the overlapped part with a left join and the right part will be added to the overlapped part with a right join.
When joining more than 2 sets, you have to consider what symantics are correct to achieve your goal.
As explained by the other replies, it does make a difference.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
May 28, 2004 at 3:31 pm
SELECT COUNT(*) FROM
Customers C
LEFT JOIN
Orders O
INNER JOIN
[Order Details] D
ON
O.OrderID=D.OrderID
ON
C.CustomerID=O.CustomerID
Just to throw in here the short and simple is this.
Any join is closed out by the ON clause.
When you have multiple joins in a row and do all the ons on the outside the inner most join..on pair executes first thus in the above
O is joined to D.
When you have Join..On pairs like so
SELECT COUNT(*) FROM
Customers C
LEFT JOIN
Orders O
ON
C.CustomerID=O.CustomerID
INNER JOIN
[Order Details] D
ON
O.OrderID=D.OrderID
they are executed in order.
Because of this they are not the same because in the first
O Joined to D and that resultset is then Joined to C
in the second
O Joined to C then that resultset is Joined to D
Note: When you do this
SELECT COUNT(*) FROM
Customers C
LEFT JOIN
Orders O
INNER JOIN
[Order Details] D
ON
C.CustomerID=O.CustomerID
ON
O.OrderID=D.OrderID
You get an error "The column prefix 'C' does not match with a table name or alias name used in the query." this is because in the Join of O to D the query engine isn't even aware of C due to execution order.
With Joins placement is everything many times.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply