February 19, 2012 at 4:05 pm
The data
IF OBJECT_ID('tempdb..#temp1') IS NOT NULL DROP TABLE #temp1;
CREATE TABLE #temp1
(
ProdID INT NOT NULL
,DayDate DATETIME NOT NULL
)
INSERT INTO #temp1 SELECT 5646546,'07/08/2009'
INSERT INTO #temp1 SELECT 5646546,'10/26/2009'
INSERT INTO #temp1 SELECT 5646546,'10/27/2010'
INSERT INTO #temp1 SELECT 5646546,'10/30/2010'
INSERT INTO #temp1 SELECT 5646546,'11/10/2010'
INSERT INTO #temp1 SELECT 5646546,'11/11/2011'
INSERT INTO #temp1 SELECT 5646546,'11/16/2011'
INSERT INTO #temp1 SELECT 5646546,'12/16/2011'
INSERT INTO #temp1 SELECT 5646546,'02/14/2012'
SELECT * FROM #temp1
ORDER BY DayDate DESC
IF OBJECT_ID('tempdb..#temp2') IS NOT NULL DROP TABLE #temp2;
CREATE TABLE #temp2
(
ProdID INT NOT NULL
,DayDate DATETIME NOT NULL
,Units INT
,Value MONEY
)
INSERT INTO #temp2 SELECT 5646546,'01/14/2011',504.00,3402.00
INSERT INTO #temp2 SELECT 5646546,'11/02/2010',564.00,3931.08
INSERT INTO #temp2 SELECT 5646546,'10/07/2010',492.00,3468.60
INSERT INTO #temp2 SELECT 5646546,'09/02/2010',288.00,2113.92
INSERT INTO #temp2 SELECT 5646546,'12/10/2009',835.00,6688.35
INSERT INTO #temp2 SELECT 5646546,'11/11/2009',252.00,2086.56
INSERT INTO #temp2 SELECT 5646546,'10/05/2009',444.00,3472.08
INSERT INTO #temp2 SELECT 5646546,'08/28/2009',444.00,3516.48
INSERT INTO #temp2 SELECT 5646546,'12/29/2008',504.00,3628.80
INSERT INTO #temp2 SELECT 5646546,'12/08/2008',228.00,1639.32
INSERT INTO #temp2 SELECT 5646546,'10/31/2008',228.00,1871.88
INSERT INTO #temp2 SELECT 5646546,'10/13/2008',384.00,3022.08
INSERT INTO #temp2 SELECT 5646546,'08/20/2008',384.00,2903.04
INSERT INTO #temp2 SELECT 5646546,'01/31/2008',147.00,1093.68
INSERT INTO #temp2 SELECT 5646546,'01/07/2008',378.00,2766.96
INSERT INTO #temp2 SELECT 5646546,'11/16/2007',406.00,2943.50
INSERT INTO #temp2 SELECT 5646546,'10/29/2007',210.00,1543.50
INSERT INTO #temp2 SELECT 5646546,'09/24/2007',406.00,2943.50
INSERT INTO #temp2 SELECT 5646546,'08/20/2007',224.00,1617.28
SELECT * FROM #temp2
ORDER BY DayDate DESC
IF OBJECT_ID('tempdb..#temp1') IS NOT NULL DROP TABLE #temp1;
IF OBJECT_ID('tempdb..#temp2') IS NOT NULL DROP TABLE #temp2;
Example, Looking at #temp1 record 5646546,'07/08/2009' I am try to find the nearest record prior to the '07/08/2009' in #temp2 by row . The result would be this one 5646546,'12/29/2008',504.00,3628.80)
So the JOIN between the #temp1 and #temp2 some how has to has to have a MAX on #temp2 DayDate
Any ideas how to do this
February 19, 2012 at 4:22 pm
I guess the OUTER APPLY approach will return the data you're looking for:
SELECT *
FROM #temp1
OUTER APPLY
(
SELECT TOP 1 *
FROM #temp2
WHERE #temp2.ProdID =#temp1.ProdID
AND #temp2.DayDate < #temp1.DayDate
ORDER BY #temp2.DayDate DESC
)t2
February 19, 2012 at 4:59 pm
Thanks will learn more on this OUTER APPLY
February 19, 2012 at 6:02 pm
I believe this solution will also work and if you compare the actual query plan to the CROSS APPLY solution, it is about 5x better.
SELECT *
FROM (
SELECT #temp1.ProdID As ProdID1
, #temp1.DayDate As DayDate1
, #temp2.ProdID As ProdID2
, #temp2.DayDate As DayDate2
, Units, Value
, ROW_NUMBER() OVER (PARTITION BY #temp1.ProdID, #temp1.DayDate ORDER BY #temp1.ProdID, #temp2.DayDate, #temp2.DayDate) As rk
FROM #temp1
INNER JOIN #temp2
ON #temp2.ProdID =#temp1.ProdID
AND #temp2.DayDate < #temp1.DayDate
) x WHERE rk = 1
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 19, 2012 at 9:47 pm
dwain.c (2/19/2012)
I believe this solution will also work
Unfortunately, it doesn't. (See the results below). That's why I always ask for test data which the OP very nicely provided on this thread.
and if you compare the actual query plan to the CROSS APPLY solution, it is about 5x better.
Heh... it does actually run about 5 times faster. But you shouldn't use the "Percent of Batch" to compare performance expectations. I can show you code that is exactly backwards for "Percent of Batch" compared to the actual performance. Also, things do tend to run faster when you haven't actually calculated the correct answer.
Here are the results of the OP's test data and your code.
ProdID1 DayDate1 ProdID2 DayDate2 Units Value rk
----------- ----------------------- ----------- ----------------------- ----------- --------------------- --------------------
5646546 2009-07-08 00:00:00.000 5646546 2007-08-20 00:00:00.000 224 1617.28 1
5646546 2009-10-26 00:00:00.000 5646546 2007-08-20 00:00:00.000 224 1617.28 1
5646546 2010-10-27 00:00:00.000 5646546 2007-08-20 00:00:00.000 224 1617.28 1
5646546 2010-10-30 00:00:00.000 5646546 2007-08-20 00:00:00.000 224 1617.28 1
5646546 2010-11-10 00:00:00.000 5646546 2007-08-20 00:00:00.000 224 1617.28 1
5646546 2011-11-11 00:00:00.000 5646546 2007-08-20 00:00:00.000 224 1617.28 1
5646546 2011-11-16 00:00:00.000 5646546 2007-08-20 00:00:00.000 224 1617.28 1
5646546 2011-12-16 00:00:00.000 5646546 2007-08-20 00:00:00.000 224 1617.28 1
5646546 2012-02-14 00:00:00.000 5646546 2007-08-20 00:00:00.000 224 1617.28 1
--Jeff Moden
Change is inevitable... Change for the better is not.
February 20, 2012 at 1:26 am
Jeff - Thanks for pointing that out. I was a bit rushed (had to run off to a meeting) so I didn't carefully check the result set.
However the concept was right. I just made two minor mistakes and the query plan cost is still 20% of the OUTER JOIN.
SELECT *
FROM (
SELECT #temp1.ProdID As ProdID1
, #temp1.DayDate As DayDate1
, #temp2.ProdID As ProdID2
, #temp2.DayDate As DayDate2
, Units, Value
, ROW_NUMBER() OVER (PARTITION BY #temp1.ProdID, #temp1.DayDate
ORDER BY #temp1.ProdID, #temp1.DayDate DESC) As rk
FROM #temp1
INNER JOIN #temp2
ON #temp2.ProdID = #temp1.ProdID AND #temp2.DayDate < #temp1.DayDate
) x WHERE rk = 1
Results (OUTER JOIN is first):
Query plan cost compared (mine is the second):
However when running with STATISTICS TIME ON/STATISTICS IO ON, it appears mine runs just a tad longer elapsed time. I think you'd need to try both with a larger data set to be sure. As Jeff notes, the query plan cost doesn't always tell the entire story. I've seen plans costing 5% vs. 95% that run slower in practice. It is possible that the difference in run times (it was small) is just noise. Possibly once SQL Server gets smarter and has the query plan cached, the second would eventually run faster. The plan cost went from 25%/5% on my first try to 27%/5% on the above try.
Apologies to the original poster if I confused things. If you have a larger data set to run it on, I'd be interested to know which worked out best in the long run.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 20, 2012 at 3:49 am
After speculating about the timing with a larger record set, I got curious.
So I ran these two queries after creating/INSERTing into the two temp tables. They basicaly duplicate the existing part number 10000 times (all records) by adding a tally to the part number.
;WITH Nbrs_3( n ) AS ( SELECT 1 UNION SELECT 0 )
,Nbrs_2( n ) AS ( SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2 )
,Nbrs_1( n ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 )
,Nbrs_0( n ) AS ( SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2 )
,Nbrs ( n ) AS ( SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2 )
INSERT INTO #temp1
SELECT ProdID + n As ProdID, DayDate FROM #temp1
CROSS APPLY (SELECT ROW_NUMBER() OVER (ORDER BY n) As n FROM Nbrs) D
WHERE n <=10000
;WITH Nbrs_3( n ) AS ( SELECT 1 UNION SELECT 0 )
,Nbrs_2( n ) AS ( SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2 )
,Nbrs_1( n ) AS ( SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2 )
,Nbrs_0( n ) AS ( SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2 )
,Nbrs ( n ) AS ( SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2 )
INSERT INTO #temp2
SELECT ProdID + n As ProdID, DayDate, Units, Value FROM #temp2
CROSS APPLY (SELECT ROW_NUMBER() OVER (ORDER BY n) As n FROM Nbrs) D
WHERE n <=10000
With
SET STATISTICS IO ON
SET STATISTICS TIME ON
I got the below results, which is a pretty strong indicator that the PARTITION version of the query is a bit faster in practice:
Table 'Worktable'. Scan count 270027, logical reads 1639697, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#temp2______________________________________________________________________________________________________________00000000002F'. Scan count 1, logical reads 776, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#temp1______________________________________________________________________________________________________________00000000002E'. Scan count 1, logical reads 234, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 3604 ms, elapsed time = 3883 ms.
(90009 row(s) affected)
Table 'Worktable'. Scan count 90009, logical reads 771571, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#temp2______________________________________________________________________________________________________________00000000002F'. Scan count 1, logical reads 776, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#temp1______________________________________________________________________________________________________________00000000002E'. Scan count 1, logical reads 234, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 1373 ms, elapsed time = 1571 ms.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 20, 2012 at 3:59 am
dwain.c (2/20/2012)
Jeff - Thanks for pointing that out. I was a bit rushed (had to run off to a meeting) so I didn't carefully check the result set.However the concept was right. I just made two minor mistakes and the query plan cost is still 20% of the OUTER JOIN.
SELECT *
FROM (
SELECT #temp1.ProdID As ProdID1
, #temp1.DayDate As DayDate1
, #temp2.ProdID As ProdID2
, #temp2.DayDate As DayDate2
, Units, Value
, ROW_NUMBER() OVER (PARTITION BY #temp1.ProdID, #temp1.DayDate
ORDER BY #temp1.ProdID, #temp1.DayDate DESC) As rk
FROM #temp1
INNER JOIN #temp2
ON #temp2.ProdID = #temp1.ProdID AND #temp2.DayDate < #temp1.DayDate
) x WHERE rk = 1
Results (OUTER JOIN is first):
Query plan cost compared (mine is the second):
However when running with STATISTICS TIME ON/STATISTICS IO ON, it appears mine runs just a tad longer elapsed time. I think you'd need to try both with a larger data set to be sure. As Jeff notes, the query plan cost doesn't always tell the entire story. I've seen plans costing 5% vs. 95% that run slower in practice. It is possible that the difference in run times (it was small) is just noise. Possibly once SQL Server gets smarter and has the query plan cached, the second would eventually run faster. The plan cost went from 25%/5% on my first try to 27%/5% on the above try.
Apologies to the original poster if I confused things. If you have a larger data set to run it on, I'd be interested to know which worked out best in the long run.
Can't be right surely?
PARTITION BY #temp1.ProdID, #temp1.DayDate
ORDER BY #temp1.ProdID, #temp1.DayDate DESC
More likely to be
PARTITION BY #temp1.ProdID, #temp1.DayDate
ORDER BY #temp2.DayDate DESC
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537February 20, 2012 at 5:01 am
Mark,
Good catch. It seems these 3 all produce the same results. Must be due to the ordering of the data coming into the table and coincidence.
PARTITION BY #temp1.ProdID, #temp1.DayDate
ORDER BY #temp1.ProdID, #temp1.DayDate DESC
PARTITION BY #temp1.ProdID, #temp1.DayDate
ORDER BY #temp1.ProdID DESC
PARTITION BY #temp1.ProdID, #temp1.DayDate
ORDER BY #temp1.ProdID, #temp2.DayDate DESC
I think the correction solution is the last one.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply