November 30, 2011 at 1:01 pm
I often perform reports which require joining the most recent of a many-to-one table joined back to the main. I've been using the below pattern for this type of query, but I was wondering if there is a better approach.
Any feedback would be greatly appreciated.
SELECT a.CustomerID, a.CustomerName, ..., b.CallDate, b.CallerID, b.CallDuration, ...
FROM Customers a
LEFT OUTER JOIN
CustomerCalls b
ON b.ID =
(
SELECT TOP 1 ID
FROM CustomerCalls
WHERE CustomerID = a.CustomerID
ORDER BY CallDate DESC
)
November 30, 2011 at 1:42 pm
I would probably use a cte, something like this:
;
with MostRecentCalls as (
select CustomerID, max(ID) as callID
from CustomerCalls
group by CustomerID
)
select a.CustomerID, a.CustomerName, ..., C.CallDate, C.CallerID, C.CallDuration, ...
FROM Customers a
LEFT OUTER JOIN
MostRecentCalls b on a.CustomerID = b.CustomerID
JOIN
CustomerCalls C on b.callID = C.ID
The probability of survival is inversely proportional to the angle of arrival.
November 30, 2011 at 2:25 pm
Craig has already posted some test scripts with different solutions and comparison in another topic.
Check this out: http://www.sqlservercentral.com/Forums/FindPost1210550.aspx
November 30, 2011 at 3:21 pm
The CROSS APPLY solution is pretty interesting, I haven't messed with that type of query before.
I did some timed tests using Jeff Modon's million row test table, which I modified to have a million row main table and a 50 million row sub table with pricing information.
-- Build the test data
IF OBJECT_ID('tempdb..#TempMain') IS NULL BEGIN
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO #TempMain
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2
ALTER TABLE #TempMain
ADD CONSTRAINT PK_TempMain_RowNum PRIMARY KEY CLUSTERED (RowNum)
END
IF OBJECT_ID('tempdb..#TempSub') IS NULL BEGIN
SELECT TOP 50000000
RowNum = IDENTITY(INT,1,1),
Parent = ABS(CHECKSUM(NEWID()))%1000000+1,
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME)
INTO #TempSub
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2
ALTER TABLE #TempSub
ADD CONSTRAINT PK_TempSub_RowNum PRIMARY KEY CLUSTERED (RowNum)
CREATE INDEX IX_TempSub_Parent ON #TempSub ( Parent, SomeDate ) INCLUDE ( RowNum )
END
I tried three different approaches:
Method 1 is the pattern I've been using, doing a left join from the main table and using a subquery to get the id of the sub table.
Method 2 uses a CTE to calculate the most recent date and then uses that join to both the main and sub table.
Method 3 uses the OUTER APPLY approach (as opposed to a CROSS APPLY, to ensure we have a main record when nothing exists in the sub table.
-- Throwaway variables.
DECLARE @RowNum1 int, @SomeLetters2 char(2), @SomeCSV varchar(80), @RowNum2 int, @SomeMoney money, @SomeDate datetime
-- Method 1
SET STATISTICS TIME ON
SELECT @RowNum1 = a.RowNum, @SomeLetters2 = a.SomeLetters2, @SomeCSV = a.SomeCSV, @RowNum2 = b.RowNum, @SomeMoney = b.SomeMoney, @SomeDate = b.SomeDate
FROM #TempMain a
LEFT OUTER JOIN
#TempSub b
ON b.RowNum =
(
SELECT TOP 1 RowNum
FROM #TempSub
WHERE Parent = a.RowNum
ORDER BY SomeDate DESC
)
SET STATISTICS TIME OFF
-- Method 2
SET STATISTICS TIME ON
;WITH CTE AS (
SELECT Parent, MAX(SomeDate) as MaxDate
FROM #TempSub
GROUP BY Parent
)
SELECT @RowNum1 = a.RowNum, @SomeLetters2 = a.SomeLetters2, @SomeCSV = a.SomeCSV, @RowNum2 = c.RowNum, @SomeMoney = c.SomeMoney, @SomeDate = c.SomeDate
FROM #TempMain a
LEFT OUTER JOIN
CTE b ON a.RowNum = b.Parent
LEFT OUTER JOIN
#TempSub c ON a.RowNum = c.RowNum AND b.MaxDate = c.SomeDate
SET STATISTICS TIME OFF
-- Method 3
SET STATISTICS TIME ON
SELECT @RowNum1 = a.RowNum, @SomeLetters2 = a.SomeLetters2, @SomeCSV = a.SomeCSV, @RowNum2 = b.RowNum, @SomeMoney = b.SomeMoney, @SomeDate = b.SomeDate
FROM #TempMain a
CROSS APPLY
(
SELECT TOP 1 RowNum, SomeMoney, SomeDate
FROM #TempSub
WHERE Parent = a.RowNum
ORDER BY SomeDate DESC
) b
SET STATISTICS TIME OFF
Results:
SQL Server Execution Times:
CPU time = 16005 ms, elapsed time = 1230 ms.
SQL Server Execution Times:
CPU time = 37159 ms, elapsed time = 3370 ms.
SQL Server Execution Times:
CPU time = 16285 ms, elapsed time = 1238 ms.
Method 1 is pretty solid, though the OUTER APPLY in Method 3 is almost the same.
Are there any other approaches worth giving a try? Or modification of the three queries I've tested above?
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply