January 31, 2011 at 1:18 pm
I have a test table below that contains Item, UnitPrice and ChangeDate data. It contains two item numbers. Each item number has four records each, with four different UnitPrices and ChangeDate. What I am trying to do is pull the Item and UnitPrice with the latest ChangeDate. The record set should look like this:
Item UnitPrice ChangeDate
------- --------------- -------------------------
1023997260.070000002008-11-01 00:00:00.000
102400139.200000002008-11-01 00:00:00.000
Can anyone help me? I will be happy to explain in more detail if needed or provide more information.
Thank you.
--===== If the test table already exists, drop it
IF OBJECT_ID('TempDB..#mytable','U') IS NOT NULL DROP TABLE #mytable
--===== Create the test table with
CREATE TABLE #mytable
(
Item varchar(30),
UnitPrice decimal(8),
ChangeDate datetime
)
--===== Insert the test data into the test table
INSERT INTO #mytable
(Item, UnitPrice, ChangeDate)
SELECT '102399', 1341.19559000, '2003-09-05 00:00:00.000' UNION ALL
SELECT '102399', 1442.14580000, '2005-09-15 00:00:00.000' UNION ALL
SELECT '102399', 6849.12000000, '2006-09-19 00:00:00.000' UNION ALL
SELECT '102399', 7260.07000000, '2008-11-01 00:00:00.000' UNION ALL
SELECT '102400', 17.87387000, '2003-09-05 00:00:00.000' UNION ALL
SELECT '102400', 19.21922000, '2005-09-15 00:00:00.000' UNION ALL
SELECT '102400', 131.32000000, '2006-09-19 00:00:00.000' UNION ALL
SELECT '102400', 139.20000000, '2008-11-01 00:00:00.000'
January 31, 2011 at 1:28 pm
After having done some other tests, this method seems to be the quickest, especially if you have a solid index on Item/ChangePriceDate
SELECT
mt.Item,
mt.UnitPrice,
mt.ChangeDate
FROM
#myTable AS mt
JOIN
(select
Item,
MAX( ChangeDate) AS MaxDate
FROM
#mytable
group by
Item
) AS drv
ON
mt.ChangeDate = drv.MaxDate
AND mt.Item = drv.Item
First it finds the last of the ChangeDates. Then it restricts the data to only those rows.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
January 31, 2011 at 1:29 pm
What have you tried already and where are you running into problems? I'd recommend looking at Row_Number() or other ranking functions.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 31, 2011 at 1:42 pm
Craig Farrell (1/31/2011)
After having done some other tests, this method seems to be the quickest, especially if you have a solid index on Item/ChangePriceDate
My tests indicate the opposite. I've found that using Row_Number() is faster.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 31, 2011 at 1:42 pm
Thanks, Craig. This seems to work. My concern was that, the way you constructed it, sinced it didn't pull the price along with the max date and the item that there would be no way to connect them. I will plug this code into my sproc.
Much appreciated.
January 31, 2011 at 2:06 pm
drew.allen (1/31/2011)
Craig Farrell (1/31/2011)
After having done some other tests, this method seems to be the quickest, especially if you have a solid index on Item/ChangePriceDateMy tests indicate the opposite. I've found that using Row_Number() is faster.
Drew
Depends on data density. Wayne and I went back and forth on this a while back and we found that both ways have their benifits. It depends on both solid indexing as well as the row occurence.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
January 31, 2011 at 2:07 pm
sdownen05 (1/31/2011)
Thanks, Craig. This seems to work. My concern was that, the way you constructed it, sinced it didn't pull the price along with the max date and the item that there would be no way to connect them. I will plug this code into my sproc.Much appreciated.
As long as change date is unique you'll be fine. What it does in english is for each item find the maximum ChangeDate. Then, find the rows in the original table that only match up to the previously found list of item/maxdate, and return your columns.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 1, 2011 at 9:22 am
drew.allen (1/31/2011)
Craig Farrell (1/31/2011)
After having done some other tests, this method seems to be the quickest, especially if you have a solid index on Item/ChangePriceDateMy tests indicate the opposite. I've found that using Row_Number() is faster.
Drew
Ah... you have tests. 😉 You should post them. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
February 1, 2011 at 10:08 am
Jeff Moden (2/1/2011)
drew.allen (1/31/2011)
Craig Farrell (1/31/2011)
After having done some other tests, this method seems to be the quickest, especially if you have a solid index on Item/ChangePriceDateMy tests indicate the opposite. I've found that using Row_Number() is faster.
Drew
Ah... you have tests. 😉 You should post them. 🙂
I did post them in another thread http://www.sqlservercentral.com/Forums/FindPost1054915.aspx
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
February 1, 2011 at 11:32 am
Removed invalid results due to mistake in cache anticipation. See next post.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 1, 2011 at 12:43 pm
Alright, so, some proof it's all about the data:
Using this as a test code:
SET NOCOUNT ON
IF OBJECT_ID( 'tempdb..#TestData') IS NOT NULL
DROP TABLE #TestData
IF OBJECT_ID( 'tempdb..#Tally') IS NOT NULL
DROP TABLE #Tally
-- Slightly modified from tally article
SELECT TOP 50000
IDENTITY(INT,1,1) AS N
INTO #Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
ALTER TABLE #Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
CREATE TABLE #TestData
(GroupID INT,
NumData1INT,
NumData2INT,
UpdateDateDATETIME,
JunkData1VARCHAR(200),
JunkData2VARCHAR(2000)
)
INSERT INTO #TestData
SELECT
t.N%10 AS GroupID,-- Adjust the modulo to decide the # of groups to compare.
(t.n + 10) * 2 / 3 AS NumData1, -- Simply here as a filler
t.n AS NumData2, -- Some more filler
CAST( DATEADD( yyyy, 100, ABS( CHECKSUM(NEWID())%3650)) AS DATETIME) AS UpdateDate, -- Give me a random date
NULL AS JunkData1,
REPLICATE( '@', 2000) AS JunkData2
FROM
#Tally AS t,
#Tally AS t2
WHERE
t.N <= 50000
AND t2.N <= 20 --Control your testing rowcount here. 2 for 100k row test, 20 for 1mill.
-- Now, an intelligent index to make sure we only add to the tail, like a good prod system
CREATE CLUSTERED INDEX idx_TestData
ON #TestData
( UpdateDate, GroupID)
-- A couple of quick metrics
DECLARE @DurationTime DATETIME,
@RunTestLoop INT
SET @RunTestLoop = 1
SET @DurationTime = GETDATE()
WHILE @RunTestLoop <=5
BEGIN
Print 'TestLoop: ' + CAST( @RunTestLoop AS VARCHAR(3))
; WITH cte AS
( SELECT
groupID,
Max( UpdateDate) AS MaxDate
FROM
#TestData
GROUP BY
GroupID
)
SELECT
td.GroupID,
td.NumData1,
td.NumData2,
td.UpdateDate
FROM
#TestData AS td
JOIN
cte
ONtd.GroupID = cte.GroupID
AND td.Updatedate = cte.MaxDate
PRINT 'MaxDate Method took: ' + CAST( DATEDIFF( ms, @DurationTime, getdate()) AS VARCHAR(30)) + ' milliseconds.'
SET @DurationTime = GETDATE()
-- RowNumber() method
; WITH cte AS
( SELECT GroupID,
UpdateDate,
ROW_NUMBER() OVER ( Partition By GroupID ORDER BY UpdateDate DESC) AS RowNum
FROM
#TestData)
SELECT
td.GroupID,
td.NumData1,
td.NumData2,
td.UpdateDate
FROM
#TestData AS td
JOIN
cte
ONtd.GroupID = cte.GroupID
AND td.UpdateDate = cte.UpdateDate
where
cte.Rownum = 1
PRINT 'RowNum Method took: ' + CAST( DATEDIFF( ms, @DurationTime, getdate()) AS VARCHAR(30)) + ' milliseconds.'
SET @DurationTime = GETDATE()
SET @RunTestLoop = @RunTestLoop + 1
END
SET @RunTestLoop = 1
-- Create a descending noncluster for our purposes for Row_Number()
CREATE NONCLUSTERED INDEX idx_TestData2 ON #TestData
(GroupID, UpdateDate Desc)
-- Reset the Durationtime AFTER the index build
SET @DurationTime = GETDATE()
print ' '
print ' With Nonclustered Index'
WHILE @RunTestLoop <=5
BEGIN
Print 'TestLoop: ' + CAST( @RunTestLoop AS VARCHAR(3))
; WITH cte AS
( SELECT
groupID,
Max( UpdateDate) AS MaxDate
FROM
#TestData WITH ( INDEX(0))
GROUP BY
GroupID
)
SELECT
td.GroupID,
td.NumData1,
td.NumData2,
td.UpdateDate
FROM
#TestData AS td
JOIN
cte
ONtd.GroupID = cte.GroupID
AND td.Updatedate = cte.MaxDate
PRINT 'MaxDate Method took: ' + CAST( DATEDIFF( ms, @DurationTime, getdate()) AS VARCHAR(30)) + ' milliseconds.'
SET @DurationTime = GETDATE()
-- RowNumber() method
; WITH cte AS
( SELECT GroupID,
UpdateDate,
ROW_NUMBER() OVER ( Partition By GroupID ORDER BY UpdateDate DESC) AS RowNum
FROM
#TestData)
SELECT
td.GroupID,
td.NumData1,
td.NumData2,
td.UpdateDate
FROM
#TestData AS td
JOIN
cte
ONtd.GroupID = cte.GroupID
AND td.UpdateDate = cte.UpdateDate
where
cte.Rownum = 1
PRINT 'RowNum Method took: ' + CAST( DATEDIFF( ms, @DurationTime, getdate()) AS VARCHAR(30)) + ' milliseconds.'
SET @DurationTime = GETDATE()
SET @RunTestLoop = @RunTestLoop + 1
END
These are the run results. If we swap MaxDate and RowNum methods, you'll find first run hammers the RowNum method just as badly.
Please note that without the nonclustered index, the rownum method is very badly performing. It needs the index presorted to its purpose before it's useful enough to compete.
TestLoop: 1
MaxDate Method took: 9526 milliseconds.
RowNum Method took: 893 milliseconds.
TestLoop: 2
MaxDate Method took: 110 milliseconds.
RowNum Method took: 943 milliseconds.
TestLoop: 3
MaxDate Method took: 136 milliseconds.
RowNum Method took: 893 milliseconds.
TestLoop: 4
MaxDate Method took: 106 milliseconds.
RowNum Method took: 976 milliseconds.
TestLoop: 5
MaxDate Method took: 133 milliseconds.
RowNum Method took: 903 milliseconds.
With Nonclustered Index
TestLoop: 1
MaxDate Method took: 116 milliseconds.
RowNum Method took: 266 milliseconds.
TestLoop: 2
MaxDate Method took: 113 milliseconds.
RowNum Method took: 276 milliseconds.
TestLoop: 3
MaxDate Method took: 113 milliseconds.
RowNum Method took: 286 milliseconds.
TestLoop: 4
MaxDate Method took: 113 milliseconds.
RowNum Method took: 263 milliseconds.
TestLoop: 5
MaxDate Method took: 116 milliseconds.
RowNum Method took: 280 milliseconds.
For some additional run results, Change GroupID from %10 to %10000 to change the distribution of values the other way from a density perspective:
TestLoop: 1
MaxDate Method took: 15650 milliseconds.
RowNum Method took: 1096 milliseconds.
TestLoop: 2
MaxDate Method took: 330 milliseconds.
RowNum Method took: 986 milliseconds.
TestLoop: 3
MaxDate Method took: 336 milliseconds.
RowNum Method took: 1116 milliseconds.
TestLoop: 4
MaxDate Method took: 343 milliseconds.
RowNum Method took: 1093 milliseconds.
TestLoop: 5
MaxDate Method took: 323 milliseconds.
RowNum Method took: 1080 milliseconds.
With Nonclustered Index
TestLoop: 1
MaxDate Method took: 333 milliseconds.
RowNum Method took: 476 milliseconds.
TestLoop: 2
MaxDate Method took: 393 milliseconds.
RowNum Method took: 473 milliseconds.
TestLoop: 3
MaxDate Method took: 416 milliseconds.
RowNum Method took: 483 milliseconds.
TestLoop: 4
MaxDate Method took: 483 milliseconds.
RowNum Method took: 463 milliseconds.
TestLoop: 5
MaxDate Method took: 403 milliseconds.
RowNum Method took: 433 milliseconds.
A lot of the time here was displaying the resultset to my desktop.
Now, if we remove the JunkData2 column (change REPLICATE( '@', 2000) to NULL AS JunkData2) and get ourselves a decent rowcount/page in the clustered (keeping the %10000 distribution):
TestLoop: 1
MaxDate Method took: 716 milliseconds.
RowNum Method took: 1106 milliseconds.
TestLoop: 2
MaxDate Method took: 293 milliseconds.
RowNum Method took: 1053 milliseconds.
TestLoop: 3
MaxDate Method took: 273 milliseconds.
RowNum Method took: 993 milliseconds.
TestLoop: 4
MaxDate Method took: 256 milliseconds.
RowNum Method took: 1150 milliseconds.
TestLoop: 5
MaxDate Method took: 286 milliseconds.
RowNum Method took: 1183 milliseconds.
With Nonclustered Index
TestLoop: 1
MaxDate Method took: 230 milliseconds.
RowNum Method took: 660 milliseconds.
TestLoop: 2
MaxDate Method took: 286 milliseconds.
RowNum Method took: 660 milliseconds.
TestLoop: 3
MaxDate Method took: 280 milliseconds.
RowNum Method took: 670 milliseconds.
TestLoop: 4
MaxDate Method took: 280 milliseconds.
RowNum Method took: 673 milliseconds.
TestLoop: 5
MaxDate Method took: 276 milliseconds.
RowNum Method took: 596 milliseconds.
In very few cases, with solid indexing (especially for an audit table), will rowNumber seem to outperform MAX methodology. There are some instances where it will, but as a rule of thumb it does not.
For the sake of completeness, I also re-ran without the IndexHint on the MaxDate method after the Nonclustered index is introduced. Results from that run:
With Nonclustered Index
TestLoop: 1
MaxDate Method took: 206 milliseconds.
RowNum Method took: 610 milliseconds.
TestLoop: 2
MaxDate Method took: 306 milliseconds.
RowNum Method took: 603 milliseconds.
TestLoop: 3
MaxDate Method took: 300 milliseconds.
RowNum Method took: 576 milliseconds.
TestLoop: 4
MaxDate Method took: 300 milliseconds.
RowNum Method took: 546 milliseconds.
TestLoop: 5
MaxDate Method took: 323 milliseconds.
RowNum Method took: 576 milliseconds.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 2, 2011 at 6:32 am
MAX DATE method has one downside though - you can get duplicate output if more than one record has the same MAX DATE. This should not happen with ROW_NUMBER() method.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 2, 2011 at 10:56 am
TheSQLGuru (2/2/2011)
MAX DATE method has one downside though - you can get duplicate output if more than one record has the same MAX DATE. This should not happen with ROW_NUMBER() method.
Agreed, which is why I made the comment earlier that ChangeDate should be unique per group.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 2, 2011 at 4:46 pm
I've done some bench mark tests comparing ROW_NUMBER() to doing a CROSS APPLY with a TOP 1 type of construct. Even if the TOP 1/CROSS APPLY has a covering index to allow for a quick index seek, the ROW_NUMBER() will beat it for a small number of rows within the partition.
At some point as the number of rows increase within the partition the CROSS APPLY will start to pull away from either MAX or ROW_NUMBER() since it will seek rather than have to touch upon each row.
The absolute fastest I've seen is using the MAX in such a way in the CTE or derived table that the Optimizer itself figures out that you really want a TOP operator. I think Paul White posted something on this not too long ago. Anyway I tried it out and it's lightning fast.
Todd Fifield
February 2, 2011 at 7:05 pm
tfifield (2/2/2011)
At some point as the number of rows increase within the partition the CROSS APPLY will start to pull away from either MAX or ROW_NUMBER() since it will seek rather than have to touch upon each row.
That makes sense. If you chance upon the article again, I'd love to see the benchmarking.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply