November 21, 2011 at 8:45 am
I have an agregate query and I want to join a table to that query to add more fields. I do not want to join that table within that query itself because it give me multiple rows. How do I do that?
Thanks.
November 21, 2011 at 9:12 am
Use a derived table to return the results of the aggregate, then join that to your secondary table:
SELECT *
FROM ( -- The results of the aggregation create the derived table
SELECT RowID, MAX(RowDate)
FROM TblA
GROUP BY RowID
) AS drv (RowID, RowDate) INNER JOIN
TblB ON drv.RowID = TblB.RowID AND
drv.RowDate = TblB.RowDate
November 21, 2011 at 12:00 pm
This is probably a good case for CROSS APPLY
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Jared
Jared
CE - Microsoft
November 21, 2011 at 2:57 pm
Thanks JonFox, that is what I was looking for. Thanks again.
November 21, 2011 at 5:19 pm
p-nut (11/21/2011)
This is probably a good case for CROSS APPLYhttp://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Jared
Well...possibly. But if the requirement is just to join aggregated data to rows in another table, I don't know that CROSS APPLYing an inline TVF provides any advantage...well, that I know of, anyway. 🙂
November 21, 2011 at 5:22 pm
p-nut (11/21/2011)
This is probably a good case for CROSS APPLYhttp://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Jared
From my experience using CROSS APPLY will force the server to do LOOP JOIN which may be not the best option. So I would suggest to use JonFox's example.
November 22, 2011 at 12:05 pm
Alexander Suprun (11/21/2011)
p-nut (11/21/2011)
This is probably a good case for CROSS APPLYhttp://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Jared
From my experience using CROSS APPLY will force the server to do LOOP JOIN which may be not the best option. So I would suggest to use JonFox's example.
I disagree. The execution plans in this case will probably be the same (depending on indexes and such). I like the CROSS APPLY here because it is cleaner. Of course, we don't know what the data that the OP looks like because they did not post any DDL or sample data.
Jared
Jared
CE - Microsoft
November 22, 2011 at 1:27 pm
p-nut (11/22/2011)
Alexander Suprun (11/21/2011)
p-nut (11/21/2011)
This is probably a good case for CROSS APPLYhttp://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Jared
From my experience using CROSS APPLY will force the server to do LOOP JOIN which may be not the best option. So I would suggest to use JonFox's example.
I disagree. The execution plans in this case will probably be the same (depending on indexes and such). I like the CROSS APPLY here because it is cleaner. Of course, we don't know what the data that the OP looks like because they did not post any DDL or sample data.
Jared
I don't see how APPLY can be more cleaner when simple JOIN can be used. Why make things more complicated than they really are? Even MS documentation says:
http://technet.microsoft.com/en-us/library/ms175156.aspx
The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query.
Surely it doesn't mean that you cannot use APPLY in other situations but it shows that this operator has been developed especially to invoke the functions. And if I see in someone's code APPLY instead of JOIN I'll try to find reasons for that, but possibly there are no such. I saw a few times when developers were misusing APPLY because it's more flexible and those who like to process data row-by-row definitely like it but sometimes it results in really bad query plans.
I didn't have time to find a good example for CROSS APPLY, but here is a simple one with OUTER APPLY vs LEFT JOIN:
USE AdventureWorks
GO
SET SHOWPLAN_TEXT ON
GO
SELECT OH.SalesOrderID, OH.OrderDate, OD.ProductID
FROM Sales.SalesOrderHeader OH
LEFT JOIN
(
SELECT SalesOrderID, MAX(ProductID) AS ProductID
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
) AS OD ON OD.SalesOrderID = OH.SalesOrderID
SELECT OH.SalesOrderID, OH.OrderDate, OD.ProductID
FROM Sales.SalesOrderHeader OH
OUTER APPLY
(
SELECT MAX(ProductID) AS ProductID
FROM Sales.SalesOrderDetail SOD
WHERE SOD.SalesOrderID = OH.SalesOrderID
) AS OD
|--Merge Join(Right Outer Join, MERGE: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID])=([OH].[SalesOrderID]), RESIDUAL: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [OH].[SalesOrderID]))
|--Compute Scalar(DEFINE: ([Expr1007]=[Expr1005]))
| |--Stream Aggregate(GROUP BY: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID]) DEFINE: ([Expr1005]=MAX([AdventureWorks].[Sales].[SalesOrderDetail].[ProductID])))
| |--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderDetail].[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID]), ORDERED FORWARD)
|--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [OH]), ORDERED FORWARD)
|--Nested Loops(Left Outer Join, OUTER REFERENCES: ([OH].[SalesOrderID], [Expr1007]) WITH UNORDERED PREFETCH)
|--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [OH]))
|--Compute Scalar(DEFINE: ([Expr1005]=[Expr1004]))
|--Stream Aggregate(DEFINE: ([Expr1004]=MAX([AdventureWorks].[Sales].[SalesOrderDetail].[ProductID] as [SOD].[ProductID])))
|--Clustered Index Seek(OBJECT: ([AdventureWorks].[Sales].[SalesOrderDetail].[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] AS [SOD]), SEEK: ([SOD].[SalesOrderID]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [OH].[SalesOrderID]) ORDERED FORWARD)
As you can see there is nothing criminal in the 2nd query but server decided to do a LOOP JOIN in this case.
November 22, 2011 at 1:52 pm
Alexander Suprun (11/22/2011)
p-nut (11/22/2011)
Alexander Suprun (11/21/2011)
p-nut (11/21/2011)
This is probably a good case for CROSS APPLYhttp://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Jared
From my experience using CROSS APPLY will force the server to do LOOP JOIN which may be not the best option. So I would suggest to use JonFox's example.
I disagree. The execution plans in this case will probably be the same (depending on indexes and such). I like the CROSS APPLY here because it is cleaner. Of course, we don't know what the data that the OP looks like because they did not post any DDL or sample data.
Jared
I don't see how APPLY can be more cleaner when simple JOIN can be used. Why make things more complicated than they really are? Even MS documentation says:
http://technet.microsoft.com/en-us/library/ms175156.aspx
The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query.Surely it doesn't mean that you cannot use APPLY in other situations but it shows that this operator has been developed especially to invoke the functions. And if I see in someone's code APPLY instead of JOIN I'll try to find reasons for that, but possibly there are no such. I saw a few times when developers were misusing APPLY because it's more flexible and those who like to process data row-by-row definitely like it but sometimes it results in really bad query plans.
I didn't have time to find a good example for CROSS APPLY, but here is a simple one with OUTER APPLY vs LEFT JOIN:
USE AdventureWorks
GO
SET SHOWPLAN_TEXT ON
GO
SELECT OH.SalesOrderID, OH.OrderDate, OD.ProductID
FROM Sales.SalesOrderHeader OH
LEFT JOIN
(
SELECT SalesOrderID, MAX(ProductID) AS ProductID
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
) AS OD ON OD.SalesOrderID = OH.SalesOrderID
SELECT OH.SalesOrderID, OH.OrderDate, OD.ProductID
FROM Sales.SalesOrderHeader OH
OUTER APPLY
(
SELECT MAX(ProductID) AS ProductID
FROM Sales.SalesOrderDetail SOD
WHERE SOD.SalesOrderID = OH.SalesOrderID
) AS OD
|--Merge Join(Right Outer Join, MERGE: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID])=([OH].[SalesOrderID]), RESIDUAL: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [OH].[SalesOrderID]))
|--Compute Scalar(DEFINE: ([Expr1007]=[Expr1005]))
| |--Stream Aggregate(GROUP BY: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID]) DEFINE: ([Expr1005]=MAX([AdventureWorks].[Sales].[SalesOrderDetail].[ProductID])))
| |--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderDetail].[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID]), ORDERED FORWARD)
|--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [OH]), ORDERED FORWARD)
|--Nested Loops(Left Outer Join, OUTER REFERENCES: ([OH].[SalesOrderID], [Expr1007]) WITH UNORDERED PREFETCH)
|--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [OH]))
|--Compute Scalar(DEFINE: ([Expr1005]=[Expr1004]))
|--Stream Aggregate(DEFINE: ([Expr1004]=MAX([AdventureWorks].[Sales].[SalesOrderDetail].[ProductID] as [SOD].[ProductID])))
|--Clustered Index Seek(OBJECT: ([AdventureWorks].[Sales].[SalesOrderDetail].[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] AS [SOD]), SEEK: ([SOD].[SalesOrderID]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [OH].[SalesOrderID]) ORDERED FORWARD)
As you can see there is nothing criminal in the 2nd query but server decided to do a LOOP JOIN in this case.
I understand what you are saying in a simple example, but we cannot appropriately help the OP without the DDL. All the OP said was that it was an aggregate query. Did not say sum, max, or any other crazy arithmetical operation. Heck, for all we know it could be a running total.
Also, CROSS APPLY was not strictly designed for functions. You cannot take Microsoft's descriptions word for word and assume purpose. There are many cases outside of functions that are better and many cases where although the results may be the same for a join versus apply where the apply is much less resource intensive and faster. Again, this depends on certain factors in the DDL, which were not given here.
As far as "cleaner," I hate nested queries in the FROM clause. It makes script very hard to understand. That's why I say cleaner, of course, you cannot sacrifice performance for cleanliness.
Jared
Jared
CE - Microsoft
November 22, 2011 at 2:07 pm
Maybe the simplest suggestion for cleanliness and readability is a cte assuming this is a simple aggregate:
;WITH cte (SalesOrderID, ProductID) AS
(SELECT SalesOrderID, MAX(ProductID) AS ProductID
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID)
SELECT OH.SalesOrderID, OH.OrderDate, cte.ProductID
FROM Sales.SalesOrderHeader OH
INNER JOIN cte
ON cte.SalesOrderID = OH.SalesOrderID
Jared
Jared
CE - Microsoft
November 22, 2011 at 2:19 pm
p-nut (11/22/2011)
I understand what you are saying in a simple example, but we cannot appropriately help the OP without the DDL. All the OP said was that it was an aggregate query. Did not say sum, max, or any other crazy arithmetical operation. Heck, for all we know it could be a running total.Also, CROSS APPLY was not strictly designed for functions. You cannot take Microsoft's descriptions word for word and assume purpose. There are many cases outside of functions that are better and many cases where although the results may be the same for a join versus apply where the apply is much less resource intensive and faster. Again, this depends on certain factors in the DDL, which were not given here.
As far as "cleaner," I hate nested queries in the FROM clause. It makes script very hard to understand. That's why I say cleaner, of course, you cannot sacrifice performance for cleanliness.
Jared
I think the discussion is already not about the OP question 🙂 I agree we can say or suggest nothing to OP now.
If by a chance you find an example where APPLY is much less resource intensive and faster than a JOIN I will be very like to see it.
If you have to do such kind of aggregation then you cannot avoid nested queries by simply using APPLY. It should be temp table, CTE or something like that. And yes I like using CTEs in such cases as well, especially when there are more than 1 level of nesting.
Alex
November 22, 2011 at 2:28 pm
An example of the equivalency of the statements on a max compare vs. a cross apply. Adjust as needed.
This is a pared down testbed I use for this example. Think of a price logging table that you want the most recent price put into the table prior to an invoice so you can run historical invoices with their correct sale prices or different adjustments as marketing tries to get products to move as required.
Source Data:
IF OBJECT_ID( 'tempdb..#Items') IS NOT NULL
DROP TABLE #Items
CREATE TABLE #Items
(ItemIDINT IDENTITY(1,1) NOT NULL,
ItemDescVARCHAR(100),
SubcategoryIDINT,
PRIMARY KEY NONCLUSTERED ( ItemDesc)
)
CREATE CLUSTERED INDEX idx_Items ON #Items ( ItemID)
INSERT INTO #Items (ItemDesc, SubcategoryID)
SELECTNEWID() AS ItemDesc, N/500
FROMtempdb..Tally
WHEREN <= 25000
IF OBJECT_ID( 'tempdb..#PriceHistoryTable') IS NOT NULL
DROP TABLE #PriceHistoryTable
CREATE TABLE #PriceHistoryTable
(AuditIDINT IDENTITY(1,1) NOT NULL,
ItemIDINT NOT NULL,
ItemPrice DECIMAL(19,4),
NumData2 INT,
UpdateDate DATETIME,
ReasonForChange VARCHAR(200),
CommentField VARCHAR(2000)
)
CREATE CLUSTERED INDEX idx_TestData
ON #PriceHistoryTable
( UpdateDate, ItemID)
CREATE NONCLUSTERED INDEX idx_TestData2 ON #PriceHistoryTable
(ItemID, UpdateDate Desc)
INSERT INTO #PriceHistoryTable (ItemID, ItemPrice, NumData2, UpdateDate, ReasonForChange, CommentField)
SELECT
t.N%2500 AS ItemID, -- Adjust the modulo to decide the data density.
(t.n + 10) * 2 / 3 AS ItemPrice, -- Simply here as a filler
t.n AS NumData2, -- Some more filler
DATEADD( yyyy, 90, DATEADD( ms, ABS( CHECKSUM(NEWID()))%1000, DATEADD( s, ABS( CHECKSUM(NEWID()))%86400, ABS( CHECKSUM(NEWID())% (365 * 20)) )) ) AS UpdateDate, -- Give me a random date
NULL AS ReasonForChange,
REPLICATE(( '@', 500) AS CommentField
FROM
tempdb..Tally AS t,
tempdb..Tally AS t2
WHERE
t.N <= 50000
AND t2.N <= 20 --Control your testing rowcount here. 2 for 100k row test, 20 for 1mill.
Testable code. Remember to run more than once to allow for cache'ing and compiles.
SET STATISTICS IO, TIME ON
---------------------------- MIN/MAX Method
; WITH cte AS
( SELECT
ItemId,
Max( UpdateDate) AS MaxDate
FROM
#PriceHistoryTable
GROUP BY
ItemId
)
SELECT
td.ItemId,
td.ItemPrice,
td.NumData2,
td.UpdateDate
FROM
#PriceHistoryTable AS td
JOIN
cte
ON td.ItemId = cte.ItemId
AND td.Updatedate = cte.MaxDate
-----------------------Cross Apply Method
SELECT
dst.ItemID,
ca.ItemPrice,
ca.NumData2,
ca.UpdateDate
FROM
(SELECT DISTINCT ItemID FROM #PriceHistoryTable) AS dst
CROSS APPLY
(SELECT TOP 1 ItemID, ItemPrice, NumData2, UpdateDate
FROM #PriceHistoryTable AS td2
WHERE td2.ItemID = dst.ItemID
ORDER BY UpdateDate DESC) AS ca
SET STATISTICS IO, TIME OFF
I've attached my 'full test' code in case you're curious. This is for an article I'll eventually do in comparisons of index mechanics while working with application methods for logging table locatability like this. It's far from complete for my final intentions but I figured you might be curious.
For a run time idea of the time this takes:
(2500 row(s) affected)
Table '#PriceHistoryTable__________________________________________________________________________________________________000000000008'. Scan count 2501, logical reads 13249, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 265 ms, elapsed time = 397 ms.
(2500 row(s) affected)
Table '#PriceHistoryTable__________________________________________________________________________________________________000000000008'. Scan count 2501, logical reads 19914, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 172 ms, elapsed time = 270 ms.
Note the Cross Apply is faster here. This has to do with data density and index optimizations, I've been banging on this particular model for a while using it as a tool to dig at the engine.
If you review the Execution plans, of primary interest is that A) The cross apply lies about the cost in batch relation... even though it has more reads. and B) The cross apply includes a second nested loop construction.
It is still more performant. In this case. These two techniques almost always beat out the row_number methods, except for particular data densities. Which one to use is under a lot of circumstances pretty much a coin flip. If you want the particulars you'll have to wait for my article. 😉
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
November 22, 2011 at 2:31 pm
If by a chance you find an example where APPLY is much less resource intensive and faster than a JOIN I will be very like to see it.
Check this out: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/6e39d29c-11df-45b5-8a7d-aab536e93acc
and this:
http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/
"While most queries which employ CROSS APPLY can be rewritten using an INNER JOIN, CROSS APPLY can yield better execution plan and better performance, since it can limit the set being joined yet before the join occurs."
However, I do agree with you that in the case of the unattending OP and the information given that a cte is probably better than a CROSS APPLY.
Jared
Jared
CE - Microsoft
November 22, 2011 at 2:33 pm
If you want the particulars you'll have to wait for my article. 😉
I can't wait!
Jared
Jared
CE - Microsoft
November 22, 2011 at 4:00 pm
Thanks guys, very interesting stuff, I am definitely going to play with these scripts provided.
If you want the particulars you'll have to wait for my article. 😉
I'll be waiting as well!
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply