November 6, 2024 at 12:00 am
Comments posted to this topic are about the item Using Outer Joins
November 6, 2024 at 12:17 pm
Bad style of the query (I know, it is intended here, but you should never write it this way in real code).
Instead of
SELECT
c.CustomerName, COUNT(oh.OrderID)
FROM dbo.Customer AS c
LEFT JOIN dbo.OrderHeader AS oh ON oh.CustomerID = c.CustomerID
WHERE oh.Orderdate > '2024/04/01'
GROUP BY c.CustomerName
you should better use
SELECT
c.CustomerName, oh.cnt
FROM dbo.Customer AS c
LEFT JOIN (SELECT oh.CustomerId, COUNT(*) AS cnt
FROM dbo.OrderHeader AS oh
WHERE oh.Orderdate > '2024/04/01'
) AS oh
ON oh.CustomerID = c.CustomerID
which would be faster (since it aggregates first and just the smaller table) and prevents the common mistake to place a join condition (limiting the Orderdate) in the WHERE instead the ON (or in this case the subquery itself, where it belongs).
Drawback or bonus of the fixed query: the original one would aggregate two Millers or two Smiths together. This MAY be correct in some circumstances, but usually not when it comes to customer names.
So it usually would have been correct to group by CustomerID and CustomerName in the original query.
If this is one of the rare cases, where all Millers needs to be aggregated, you could add the GROUP BY in the second query again and use SUM(oh.cnt) as cnt in the SELECT part. It would now be grouping twice, but usually this will be still faster.
God is real, unless declared integer.
November 6, 2024 at 7:43 pm
Bad style of the query (I know, it is intended here, but you should never write it this way in real code).
Instead of
SELECT
c.CustomerName, COUNT(oh.OrderID)
FROM dbo.Customer AS c
LEFT JOIN dbo.OrderHeader AS oh ON oh.CustomerID = c.CustomerID
WHERE oh.Orderdate > '2024/04/01'
GROUP BY c.CustomerNameyou should better use
SELECT
c.CustomerName, oh.cnt
FROM dbo.Customer AS c
LEFT JOIN (SELECT oh.CustomerId, COUNT(*) AS cnt
FROM dbo.OrderHeader AS oh
WHERE oh.Orderdate > '2024/04/01'
) AS oh
ON oh.CustomerID = c.CustomerIDwhich would be faster (since it aggregates first and just the smaller table) and prevents the common mistake to place a join condition (limiting the Orderdate) in the WHERE instead the ON (or in this case the subquery itself, where it belongs).
Drawback or bonus of the fixed query: the original one would aggregate two Millers or two Smiths together. This MAY be correct in some circumstances, but usually not when it comes to customer names.
So it usually would have been correct to group by CustomerID and CustomerName in the original query.
If this is one of the rare cases, where all Millers needs to be aggregated, you could add the GROUP BY in the second query again and use SUM(oh.cnt) as cnt in the SELECT part. It would now be grouping twice, but usually this will be still faster.
Sorry - but your 'corrected' query is not actually correct. You need to add a GROUP BY in the derived table since you are including the CustomerID in the results and want to return the count by customer.
I am also not sure that construct would perform better than fixing the first query to remove the WHERE and include that in the LEFT JOIN condition.
SELECT
c.CustomerName, COUNT(oh.OrderID)
FROM dbo.Customer AS c
LEFT JOIN dbo.OrderHeader AS oh ON oh.CustomerID = c.CustomerID
AND oh.Orderdate > '2024/04/01'
GROUP BY c.CustomerName
This construct may just end up being the same execution plan as using the derived table. It would depend on how SQL Server generated the plan - since it could move things around as needed.
With that said - this may be a better approach:
SELECT
c.CustomerName, TotalOrders = coalesce(ord.OrderCount, 0)
FROM dbo.Customer AS c
OUTER APPLY (SELECT OrderCount = count(*)
FROM dbo.OrderHeader AS oh
WHERE oh.CustomerID = c.CustomerID
AND oh.Orderdate > '2024/04/01') AS ord;
Again - it would depend on indexing and how SQL Server generates the execution plan. This one also takes care of NULL - returning zero for any customers that don't have any orders.
But - in regards to the actual question - as written the query is showing what happens when you include the outer referenced table in the WHERE clause. The purpose of the question was to highlight how rows are removed even when you use an outer join and might expect those rows to be in the results.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
November 6, 2024 at 11:15 pm
I concur with the error found by Jeff Williams. I also concur with the OUTER APPLY method being faster. With the correct index, it's actually twice as fast as the LEFT JOIN method on a million rows in the OrderHeader table. I also converted the tables to Temp Tables because I also do drops when testing.
Here's the code I used to create and populate the tables. I'll let you create whatever indexes you think are needed.
DROP TABLE IF EXISTS #Customer;
GO
CREATE TABLE #Customer
(CustomerID INT, CustomerName VARCHAR(10))
;
GO
INSERT INTO #Customer WITH (TABLOCK)
(CustomerID,CustomerName)
VALUES (1,'Steve')
,(2,'Andy')
,(3,'Brian')
,(4,'Allen')
,(5,'Devin')
,(6,'Sally')
;
GO
DROP TABLE IF EXISTS #OrderHeader;
GO
CREATE TABLE #OrderHeader
(OrderID INT IDENTITY(1,1), CustomerID INT, OrderDate DATE)
;
-- INSERT INTO #OrderHeader WITH (TABLOCK)
-- (OrderID,CustomerID,OrderDate)
-- VALUES (1,'2024-02-01')
-- ,(1,'2024-03-01')
-- ,(3,'2024-04-01')
-- ,(4,'2024-05-01')
-- ,(4,'2024-05-01')
-- ,(3,'2024-06-07')
-- ,(2,'2024-04-07')
--;
--GO
INSERT INTO #OrderHeader WITH (TABLOCK)
(CustomerID,OrderDate)
SELECT CustomerID = ABS(CHECKSUM(NEWID())%6)+1
,OrderDate = DATEADD(dd,ABS(CHECKSUM(NEWID())%DATEDIFF(dd,'2024','2025')),'2024')
FROM JBMTestSSD.dbo.fnTally(1,1000000) --You need to change this line to suite your system.
;
GO
SELECT TOP 1000 * FROM #OrderHeader
--Jeff Moden
Change is inevitable... Change for the better is not.
November 7, 2024 at 4:40 pm
This is a topic for which I'd very much appreciate a summary to wrap this up. This is new to me. Something in the form of
- instead of this
- do this.
Thanks.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply