January 6, 2014 at 7:49 am
Comments posted to this topic are about the item Stairway to T-SQL Part 2: Beyond T-SQL Basics: Level 3: Building a Correlated Subquery
Gregory A. Larsen, MVP
January 6, 2014 at 11:08 pm
Thank you Gregory for this clear and concise article.
I cannot resist pitching in some Window function version of the queries;
;WITH CUST_SALES_DETAILS AS
(
SELECT
SSOH.CustomerID
,ROW_NUMBER() OVER
(PARTITION BY SSOH.SalesOrderID ORDER BY (SELECT NULL)) AS SSOH_RID
,COUNT(SSOD.SalesOrderDetailID) OVER
(PARTITION BY SSOH.SalesOrderID) AS COUNT_SSOD
FROM Sales.SalesOrderHeader SSOH
INNER JOIN Sales.SalesOrderDetail SSOD
ON SSOH.SalesOrderID = SSOD.SalesOrderID
)
SELECT
CSD.CustomerID
,CSD.COUNT_SSOD
/* ,DENSE_RANK() OVER (ORDER BY CSD.COUNT_SSOD DESC) AS SALES_RANK */
FROM CUST_SALES_DETAILS CSD
WHERE CSD.SSOH_RID = 1
AND CSD.COUNT_SSOD > 70;
GO
-------------------------------------------
;WITH CUST_SALE_BY_YEAR AS
(
SELECT
SSOH.CustomerID
,ROW_NUMBER() OVER
(PARTITION BY SSOH.CustomerID,YEAR(SSOH.[OrderDate])
ORDER BY (SELECT NULL)) AS CUST_RID
,YEAR(SSOH.[OrderDate]) AS SSOH_YEAR
,SUM(SSOH.SubTotal) OVER
(PARTITION BY SSOH.CustomerID,YEAR(SSOH.[OrderDate])) AS SSOH_SUBTOTAL
,(SUM(SSOH.SubTotal) OVER
(PARTITION BY SSOH.CustomerID,YEAR(SSOH.[OrderDate]))) * (0.10) AS SSOH_REBATE
FROMSales.SalesOrderHeaderSSOH
)
SELECT
CABY.CustomerID
/*
,DENSE_RANK() OVER (PARTITION BY CABY.SSOH_YEAR ORDER BY CABY.SSOH_SUBTOTAL DESC,CABY.SSOH_YEAR DESC) AS SALES_RANK
*/
,CABY.SSOH_YEAR
,CABY.SSOH_SUBTOTAL
,CABY.SSOH_REBATE
FROM CUST_SALE_BY_YEARCABY
WHERE CABY.CUST_RID = 1
AND CABY.SSOH_YEAR = 2008
AND CABY.SSOH_SUBTOTAL > 150000;
GO
-------------------------------------------
March 5, 2014 at 1:20 pm
Thnks for the great article. I have just one remark :
For Question 1 the answer is C and not A. as the answer A refers to inner query instead of outer query.
March 6, 2014 at 5:56 am
The correlated subquery may be executed many times. It will be run once for each candidate row selected in the outer query.
I believe that Jeff Moden refers to this as RBAR (row by agonizing row).
A nice followup article would be "How to avoid using the Correlated Subquery."
March 6, 2014 at 1:14 pm
Just to point out a possible improvement on the first query in part 3.
The sql engine has to match all rows from the subquery to the outer query. This is because the filtering is outside the subquery.
SELECT CustomerID FROM Sales.SalesOrderHeader OH
WHERE (SELECT COUNT(*) FROM Sales.SalesOrderDetail
WHERE SalesOrderID = OH.SalesOrderID) > 70;
Just by adding a GROUP BY clause, the work is cut by a 1/4, here are two suggestions, the latter slightly faster :cool:;
SELECT CustomerID FROM Sales.SalesOrderHeader OH
WHERE (SELECT COUNT(*) FROM Sales.SalesOrderDetail
WHERE SalesOrderID = OH.SalesOrderID
GROUP BY SalesOrderID) > 70;
And
SELECT
SOH.CustomerID
FROM
(
SELECT
COUNT(*) AS SOD_COUNT
,SOD.SalesOrderID
FROM Sales.SalesOrderDetail SOD
GROUP BY SOD.SalesOrderID
) AS X
INNER JOIN Sales.SalesOrderHeader SOH
ON X.SalesOrderID = SOH.SalesOrderID
WHERE X.SOD_COUNT > 70;
March 19, 2015 at 10:35 am
Dennis Wagner-347763 (3/6/2014)
The correlated subquery may be executed many times. It will be run once for each candidate row selected in the outer query.
I believe that Jeff Moden refers to this as RBAR (row by agonizing row).
Except that correlated subqueries *don't* run once per row of the outer query. It's an old and persistent myth that they do that.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 5, 2016 at 11:46 pm
I understand the logic behind listing #3, but I was surprised to see, and uncertain on how it works, that it is not necessary to use another GROUP BY clause in the subquery:
SELECT Outer_H.[CustomerID]
, SUM(Outer_H.[SubTotal]) AS TotalPurchase
, SUM(Outer_H.[SubTotal]) * .10 AS Rebate
FROM [Sales].[SalesOrderHeader] AS Outer_H
WHERE YEAR(Outer_H.[OrderDate]) = '2008'
GROUP BY Outer_H.[CustomerID]
HAVING (SELECT SUM(Inner_H.[SubTotal]) FROM [Sales].[SalesOrderHeader] AS Inner_H
WHERE Inner_H.[CustomerID] = Outer_H.[CustomerID]
AND YEAR(Inner_H.[OrderDate]) = '2008' GROUP BY CLAUSE NOT REQUIRED HERE) > 150000
ORDER BY Rebate DESC;
How is SQL able to execute the aggregate SUM() function in the inner query w/out a GROUP BY clause applied? Does it automatically apply the GROUP BY clause from the outer query to the inner query?
Thanks.
September 6, 2016 at 1:37 am
Because there's no un-aggregated columns being returned from that subquery. Group By is only needed when you have columns in aggregate and columns that aren't in aggregate in the select clause.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 10, 2016 at 3:11 pm
GilaMonster (9/6/2016)
Because there's no un-aggregated columns being returned from that subquery. Group By is only needed when you have columns in aggregate and columns that aren't in aggregate in the select clause.
Gila, thank you very much. If I learn nothing else from this stairway, I've learned this. So thanks again.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply