Agregate join

Question

Post reply

Agregate join

ramadesai108

SSChampion

Points: 10609
More actions
November 21, 2011 at 8:45 am

#260452

I have an agregate query and I want to join a table to that query to add more fields. I do not want to join that table within that query itself because it give me multiple rows. How do I do that?
Thanks.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

JonFox SSCrazy Points: 2157 More actions · Answer 1

Use a derived table to return the results of the aggregate, then join that to your secondary table:

SELECT *

FROM ( -- The results of the aggregation create the derived table

SELECT RowID, MAX(RowDate)

FROM TblA

GROUP BY RowID

) AS drv (RowID, RowDate) INNER JOIN

TblB ON drv.RowID = TblB.RowID AND

drv.RowDate = TblB.RowDate

Jared SSC Guru Points: 61793 More actions · Answer 2

This is probably a good case for CROSS APPLY

http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx

Jared

Jared
CE - Microsoft

ramadesai108 SSChampion Points: 10609 More actions · Answer 3

Thanks JonFox, that is what I was looking for. Thanks again.

JonFox SSCrazy Points: 2157 More actions · Answer 4

p-nut (11/21/2011)
This is probably a good case for CROSS APPLY
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Jared

Well...possibly. But if the requirement is just to join aggregated data to rows in another table, I don't know that CROSS APPLYing an inline TVF provides any advantage...well, that I know of, anyway. 🙂

Alexander Suprun SSCertifiable Points: 6253 More actions · Answer 5

p-nut (11/21/2011)
This is probably a good case for CROSS APPLY
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Jared

From my experience using CROSS APPLY will force the server to do LOOP JOIN which may be not the best option. So I would suggest to use JonFox's example.

Alex Suprun

Jared SSC Guru Points: 61793 More actions · Answer 6

Alexander Suprun (11/21/2011)
p-nut (11/21/2011)
This is probably a good case for CROSS APPLY
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Jared
From my experience using CROSS APPLY will force the server to do LOOP JOIN which may be not the best option. So I would suggest to use JonFox's example.

I disagree. The execution plans in this case will probably be the same (depending on indexes and such). I like the CROSS APPLY here because it is cleaner. Of course, we don't know what the data that the OP looks like because they did not post any DDL or sample data.

Jared

Jared
CE - Microsoft

Alexander Suprun SSCertifiable Points: 6253 More actions · Answer 7

p-nut (11/22/2011)
Alexander Suprun (11/21/2011)
p-nut (11/21/2011)
This is probably a good case for CROSS APPLY
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Jared
From my experience using CROSS APPLY will force the server to do LOOP JOIN which may be not the best option. So I would suggest to use JonFox's example.
I disagree. The execution plans in this case will probably be the same (depending on indexes and such). I like the CROSS APPLY here because it is cleaner. Of course, we don't know what the data that the OP looks like because they did not post any DDL or sample data.
Jared

I don't see how APPLY can be more cleaner when simple JOIN can be used. Why make things more complicated than they really are? Even MS documentation says:

http://technet.microsoft.com/en-us/library/ms175156.aspx
The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query.

Surely it doesn't mean that you cannot use APPLY in other situations but it shows that this operator has been developed especially to invoke the functions. And if I see in someone's code APPLY instead of JOIN I'll try to find reasons for that, but possibly there are no such. I saw a few times when developers were misusing APPLY because it's more flexible and those who like to process data row-by-row definitely like it but sometimes it results in really bad query plans.

I didn't have time to find a good example for CROSS APPLY, but here is a simple one with OUTER APPLY vs LEFT JOIN:

USE AdventureWorks

GO

SET SHOWPLAN_TEXT ON

GO

SELECT OH.SalesOrderID, OH.OrderDate, OD.ProductID

FROM Sales.SalesOrderHeader OH

LEFT JOIN

(

SELECT SalesOrderID, MAX(ProductID) AS ProductID

FROM Sales.SalesOrderDetail

GROUP BY SalesOrderID

) AS OD ON OD.SalesOrderID = OH.SalesOrderID

SELECT OH.SalesOrderID, OH.OrderDate, OD.ProductID

FROM Sales.SalesOrderHeader OH

OUTER APPLY

(

SELECT MAX(ProductID) AS ProductID

FROM Sales.SalesOrderDetail SOD

WHERE SOD.SalesOrderID = OH.SalesOrderID

) AS OD

|--Merge Join(Right Outer Join, MERGE: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID])=([OH].[SalesOrderID]), RESIDUAL: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [OH].[SalesOrderID]))

|--Compute Scalar(DEFINE: ([Expr1007]=[Expr1005]))

| |--Stream Aggregate(GROUP BY: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID]) DEFINE: ([Expr1005]=MAX([AdventureWorks].[Sales].[SalesOrderDetail].[ProductID])))

| |--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderDetail].[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID]), ORDERED FORWARD)

|--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [OH]), ORDERED FORWARD)

|--Nested Loops(Left Outer Join, OUTER REFERENCES: ([OH].[SalesOrderID], [Expr1007]) WITH UNORDERED PREFETCH)

|--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [OH]))

|--Compute Scalar(DEFINE: ([Expr1005]=[Expr1004]))

|--Stream Aggregate(DEFINE: ([Expr1004]=MAX([AdventureWorks].[Sales].[SalesOrderDetail].[ProductID] as [SOD].[ProductID])))

|--Clustered Index Seek(OBJECT: ([AdventureWorks].[Sales].[SalesOrderDetail].[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] AS [SOD]), SEEK: ([SOD].[SalesOrderID]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [OH].[SalesOrderID]) ORDERED FORWARD)

As you can see there is nothing criminal in the 2nd query but server decided to do a LOOP JOIN in this case.

Alex Suprun

Jared SSC Guru Points: 61793 More actions · Answer 8

Alexander Suprun (11/22/2011)
p-nut (11/22/2011)
Alexander Suprun (11/21/2011)
p-nut (11/21/2011)
This is probably a good case for CROSS APPLY
http://weblogs.sqlteam.com/jeffs/archive/2007/10/18/sql-server-cross-apply.aspx
Jared
From my experience using CROSS APPLY will force the server to do LOOP JOIN which may be not the best option. So I would suggest to use JonFox's example.
I disagree. The execution plans in this case will probably be the same (depending on indexes and such). I like the CROSS APPLY here because it is cleaner. Of course, we don't know what the data that the OP looks like because they did not post any DDL or sample data.
Jared
I don't see how APPLY can be more cleaner when simple JOIN can be used. Why make things more complicated than they really are? Even MS documentation says:
http://technet.microsoft.com/en-us/library/ms175156.aspx
The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query.
Surely it doesn't mean that you cannot use APPLY in other situations but it shows that this operator has been developed especially to invoke the functions. And if I see in someone's code APPLY instead of JOIN I'll try to find reasons for that, but possibly there are no such. I saw a few times when developers were misusing APPLY because it's more flexible and those who like to process data row-by-row definitely like it but sometimes it results in really bad query plans.
I didn't have time to find a good example for CROSS APPLY, but here is a simple one with OUTER APPLY vs LEFT JOIN:
USE AdventureWorks
GO
SET SHOWPLAN_TEXT ON
GO
SELECT OH.SalesOrderID, OH.OrderDate, OD.ProductID
FROM Sales.SalesOrderHeader OH
LEFT JOIN
(
SELECT SalesOrderID, MAX(ProductID) AS ProductID
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
) AS OD ON OD.SalesOrderID = OH.SalesOrderID
SELECT OH.SalesOrderID, OH.OrderDate, OD.ProductID
FROM Sales.SalesOrderHeader OH
OUTER APPLY
(
SELECT MAX(ProductID) AS ProductID
FROM Sales.SalesOrderDetail SOD
WHERE SOD.SalesOrderID = OH.SalesOrderID
) AS OD
|--Merge Join(Right Outer Join, MERGE: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID])=([OH].[SalesOrderID]), RESIDUAL: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [OH].[SalesOrderID]))
|--Compute Scalar(DEFINE: ([Expr1007]=[Expr1005]))
| |--Stream Aggregate(GROUP BY: ([AdventureWorks].[Sales].[SalesOrderDetail].[SalesOrderID]) DEFINE: ([Expr1005]=MAX([AdventureWorks].[Sales].[SalesOrderDetail].[ProductID])))
| |--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderDetail].[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID]), ORDERED FORWARD)
|--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [OH]), ORDERED FORWARD)
|--Nested Loops(Left Outer Join, OUTER REFERENCES: ([OH].[SalesOrderID], [Expr1007]) WITH UNORDERED PREFETCH)
|--Clustered Index Scan(OBJECT: ([AdventureWorks].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [OH]))
|--Compute Scalar(DEFINE: ([Expr1005]=[Expr1004]))
|--Stream Aggregate(DEFINE: ([Expr1004]=MAX([AdventureWorks].[Sales].[SalesOrderDetail].[ProductID] as [SOD].[ProductID])))
|--Clustered Index Seek(OBJECT: ([AdventureWorks].[Sales].[SalesOrderDetail].[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID] AS [SOD]), SEEK: ([SOD].[SalesOrderID]=[AdventureWorks].[Sales].[SalesOrderHeader].[SalesOrderID] as [OH].[SalesOrderID]) ORDERED FORWARD)
As you can see there is nothing criminal in the 2nd query but server decided to do a LOOP JOIN in this case.

I understand what you are saying in a simple example, but we cannot appropriately help the OP without the DDL. All the OP said was that it was an aggregate query. Did not say sum, max, or any other crazy arithmetical operation. Heck, for all we know it could be a running total.

Also, CROSS APPLY was not strictly designed for functions. You cannot take Microsoft's descriptions word for word and assume purpose. There are many cases outside of functions that are better and many cases where although the results may be the same for a join versus apply where the apply is much less resource intensive and faster. Again, this depends on certain factors in the DDL, which were not given here.

As far as "cleaner," I hate nested queries in the FROM clause. It makes script very hard to understand. That's why I say cleaner, of course, you cannot sacrifice performance for cleanliness.

Jared

Jared
CE - Microsoft

Jared SSC Guru Points: 61793 More actions · Answer 9

Maybe the simplest suggestion for cleanliness and readability is a cte assuming this is a simple aggregate:

;WITH cte (SalesOrderID, ProductID) AS

(SELECT SalesOrderID, MAX(ProductID) AS ProductID

FROM Sales.SalesOrderDetail

GROUP BY SalesOrderID)

SELECT OH.SalesOrderID, OH.OrderDate, cte.ProductID

FROM Sales.SalesOrderHeader OH

INNER JOIN cte

ON cte.SalesOrderID = OH.SalesOrderID

Jared

Jared
CE - Microsoft

Alexander Suprun SSCertifiable Points: 6253 More actions · Answer 10

p-nut (11/22/2011)
I understand what you are saying in a simple example, but we cannot appropriately help the OP without the DDL. All the OP said was that it was an aggregate query. Did not say sum, max, or any other crazy arithmetical operation. Heck, for all we know it could be a running total.
Also, CROSS APPLY was not strictly designed for functions. You cannot take Microsoft's descriptions word for word and assume purpose. There are many cases outside of functions that are better and many cases where although the results may be the same for a join versus apply where the apply is much less resource intensive and faster. Again, this depends on certain factors in the DDL, which were not given here.
As far as "cleaner," I hate nested queries in the FROM clause. It makes script very hard to understand. That's why I say cleaner, of course, you cannot sacrifice performance for cleanliness.
Jared

I think the discussion is already not about the OP question 🙂 I agree we can say or suggest nothing to OP now.

If by a chance you find an example where APPLY is much less resource intensive and faster than a JOIN I will be very like to see it.

If you have to do such kind of aggregation then you cannot avoid nested queries by simply using APPLY. It should be temp table, CTE or something like that. And yes I like using CTEs in such cases as well, especially when there are more than 1 level of nesting.

Alex

Alex Suprun

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 11

An example of the equivalency of the statements on a max compare vs. a cross apply. Adjust as needed.

This is a pared down testbed I use for this example. Think of a price logging table that you want the most recent price put into the table prior to an invoice so you can run historical invoices with their correct sale prices or different adjustments as marketing tries to get products to move as required.

Source Data:

IF OBJECT_ID( 'tempdb..#Items') IS NOT NULL

DROP TABLE #Items

CREATE TABLE #Items

(ItemIDINT IDENTITY(1,1) NOT NULL,

ItemDescVARCHAR(100),

SubcategoryIDINT,

PRIMARY KEY NONCLUSTERED ( ItemDesc)

)

CREATE CLUSTERED INDEX idx_Items ON #Items ( ItemID)

INSERT INTO #Items (ItemDesc, SubcategoryID)

SELECTNEWID() AS ItemDesc, N/500

FROMtempdb..Tally

WHEREN <= 25000

IF OBJECT_ID( 'tempdb..#PriceHistoryTable') IS NOT NULL

DROP TABLE #PriceHistoryTable

CREATE TABLE #PriceHistoryTable

(AuditIDINT IDENTITY(1,1) NOT NULL,

ItemIDINT NOT NULL,

ItemPrice DECIMAL(19,4),

NumData2 INT,

UpdateDate DATETIME,

ReasonForChange VARCHAR(200),

CommentField VARCHAR(2000)

)

CREATE CLUSTERED INDEX idx_TestData

ON #PriceHistoryTable

( UpdateDate, ItemID)

CREATE NONCLUSTERED INDEX idx_TestData2 ON #PriceHistoryTable

(ItemID, UpdateDate Desc)

INSERT INTO #PriceHistoryTable (ItemID, ItemPrice, NumData2, UpdateDate, ReasonForChange, CommentField)

SELECT

t.N%2500 AS ItemID, -- Adjust the modulo to decide the data density.

(t.n + 10) * 2 / 3 AS ItemPrice, -- Simply here as a filler

t.n AS NumData2, -- Some more filler

DATEADD( yyyy, 90, DATEADD( ms, ABS( CHECKSUM(NEWID()))%1000, DATEADD( s, ABS( CHECKSUM(NEWID()))%86400, ABS( CHECKSUM(NEWID())% (365 * 20)) )) ) AS UpdateDate, -- Give me a random date

NULL AS ReasonForChange,

REPLICATE(( '@', 500) AS CommentField

FROM

tempdb..Tally AS t,

tempdb..Tally AS t2

WHERE

t.N <= 50000

AND t2.N <= 20 --Control your testing rowcount here. 2 for 100k row test, 20 for 1mill.

Testable code. Remember to run more than once to allow for cache'ing and compiles.

SET STATISTICS IO, TIME ON

---------------------------- MIN/MAX Method

; WITH cte AS

( SELECT

ItemId,

Max( UpdateDate) AS MaxDate

FROM

#PriceHistoryTable

GROUP BY

ItemId

)

SELECT

td.ItemId,

td.ItemPrice,

td.NumData2,

td.UpdateDate

FROM

#PriceHistoryTable AS td

JOIN

cte

ON td.ItemId = cte.ItemId

AND td.Updatedate = cte.MaxDate

-----------------------Cross Apply Method

SELECT

dst.ItemID,

ca.ItemPrice,

ca.NumData2,

ca.UpdateDate

FROM

(SELECT DISTINCT ItemID FROM #PriceHistoryTable) AS dst

CROSS APPLY

(SELECT TOP 1 ItemID, ItemPrice, NumData2, UpdateDate

FROM #PriceHistoryTable AS td2

WHERE td2.ItemID = dst.ItemID

ORDER BY UpdateDate DESC) AS ca

SET STATISTICS IO, TIME OFF

I've attached my 'full test' code in case you're curious. This is for an article I'll eventually do in comparisons of index mechanics while working with application methods for logging table locatability like this. It's far from complete for my final intentions but I figured you might be curious.

For a run time idea of the time this takes:

(2500 row(s) affected)

Table '#PriceHistoryTable__________________________________________________________________________________________________000000000008'. Scan count 2501, logical reads 13249, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 265 ms, elapsed time = 397 ms.

(2500 row(s) affected)

Table '#PriceHistoryTable__________________________________________________________________________________________________000000000008'. Scan count 2501, logical reads 19914, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 172 ms, elapsed time = 270 ms.

Note the Cross Apply is faster here. This has to do with data density and index optimizations, I've been banging on this particular model for a while using it as a tool to dig at the engine.

If you review the Execution plans, of primary interest is that A) The cross apply lies about the cost in batch relation... even though it has more reads. and B) The cross apply includes a second nested loop construction.

It is still more performant. In this case. These two techniques almost always beat out the row_number methods, except for particular data densities. Which one to use is under a lot of circumstances pretty much a coin flip. If you want the particulars you'll have to wait for my article. 😉

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Jared SSC Guru Points: 61793 More actions · Answer 12

If by a chance you find an example where APPLY is much less resource intensive and faster than a JOIN I will be very like to see it.

Check this out: http://social.msdn.microsoft.com/Forums/en/transactsql/thread/6e39d29c-11df-45b5-8a7d-aab536e93acc

and this:

http://explainextended.com/2009/07/16/inner-join-vs-cross-apply/

"While most queries which employ CROSS APPLY can be rewritten using an INNER JOIN, CROSS APPLY can yield better execution plan and better performance, since it can limit the set being joined yet before the join occurs."

However, I do agree with you that in the case of the unattending OP and the information given that a cte is probably better than a CROSS APPLY.

Jared

Jared
CE - Microsoft

Jared SSC Guru Points: 61793 More actions · Answer 13

If you want the particulars you'll have to wait for my article. 😉

I can't wait!

Jared

Jared
CE - Microsoft

Alexander Suprun SSCertifiable Points: 6253 More actions · Answer 14

Thanks guys, very interesting stuff, I am definitely going to play with these scripts provided.

If you want the particulars you'll have to wait for my article. 😉

I'll be waiting as well!

Alex Suprun