Need the Price with the Latest Date

Question

Need the Price with the Latest Date

ducksql@gmail.com

SSCrazy

Points: 2838
More actions
January 31, 2011 at 1:18 pm

#230930

I have a test table below that contains Item, UnitPrice and ChangeDate data. It contains two item numbers. Each item number has four records each, with four different UnitPrices and ChangeDate. What I am trying to do is pull the Item and UnitPrice with the latest ChangeDate. The record set should look like this:
Item UnitPrice ChangeDate
------- --------------- -------------------------
1023997260.070000002008-11-01 00:00:00.000
102400139.200000002008-11-01 00:00:00.000
Can anyone help me? I will be happy to explain in more detail if needed or provide more information.
Thank you.
--===== If the test table already exists, drop it
IF OBJECT_ID('TempDB..#mytable','U') IS NOT NULL DROP TABLE #mytable
--===== Create the test table with
CREATE TABLE #mytable
(
Item varchar(30),
UnitPrice decimal(8),
ChangeDate datetime
)
--===== Insert the test data into the test table
INSERT INTO #mytable
(Item, UnitPrice, ChangeDate)
SELECT '102399', 1341.19559000, '2003-09-05 00:00:00.000' UNION ALL
SELECT '102399', 1442.14580000, '2005-09-15 00:00:00.000' UNION ALL
SELECT '102399', 6849.12000000, '2006-09-19 00:00:00.000' UNION ALL
SELECT '102399', 7260.07000000, '2008-11-01 00:00:00.000' UNION ALL
SELECT '102400', 17.87387000, '2003-09-05 00:00:00.000' UNION ALL
SELECT '102400', 19.21922000, '2005-09-15 00:00:00.000' UNION ALL
SELECT '102400', 131.32000000, '2006-09-19 00:00:00.000' UNION ALL
SELECT '102400', 139.20000000, '2008-11-01 00:00:00.000'

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 1

After having done some other tests, this method seems to be the quickest, especially if you have a solid index on Item/ChangePriceDate

SELECT

mt.Item,

mt.UnitPrice,

mt.ChangeDate

FROM

#myTable AS mt

JOIN

(select

Item,

MAX( ChangeDate) AS MaxDate

FROM

#mytable

group by

Item

) AS drv

ON

mt.ChangeDate = drv.MaxDate

AND mt.Item = drv.Item

First it finds the last of the ChangeDates. Then it restricts the data to only those rows.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

drew.allen SSC Guru Points: 77014 More actions · Answer 2

What have you tried already and where are you running into problems? I'd recommend looking at Row_Number() or other ranking functions.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

drew.allen SSC Guru Points: 77014 More actions · Answer 3

Craig Farrell (1/31/2011)
After having done some other tests, this method seems to be the quickest, especially if you have a solid index on Item/ChangePriceDate

My tests indicate the opposite. I've found that using Row_Number() is faster.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

ducksql@gmail.com SSCrazy Points: 2838 More actions · Answer 4

Thanks, Craig. This seems to work. My concern was that, the way you constructed it, sinced it didn't pull the price along with the max date and the item that there would be no way to connect them. I will plug this code into my sproc.

Much appreciated.

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 5

drew.allen (1/31/2011)
Craig Farrell (1/31/2011)
After having done some other tests, this method seems to be the quickest, especially if you have a solid index on Item/ChangePriceDate
My tests indicate the opposite. I've found that using Row_Number() is faster.
Drew

Depends on data density. Wayne and I went back and forth on this a while back and we found that both ways have their benifits. It depends on both solid indexing as well as the row occurence.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 6

sdownen05 (1/31/2011)
Thanks, Craig. This seems to work. My concern was that, the way you constructed it, sinced it didn't pull the price along with the max date and the item that there would be no way to connect them. I will plug this code into my sproc.
Much appreciated.

As long as change date is unique you'll be fine. What it does in english is for each item find the maximum ChangeDate. Then, find the rows in the original table that only match up to the previously found list of item/maxdate, and return your columns.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

drew.allen (1/31/2011)
Craig Farrell (1/31/2011)
After having done some other tests, this method seems to be the quickest, especially if you have a solid index on Item/ChangePriceDate
My tests indicate the opposite. I've found that using Row_Number() is faster.
Drew

Ah... you have tests. 😉 You should post them. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

drew.allen SSC Guru Points: 77014 More actions · Answer 8

Jeff Moden (2/1/2011)
drew.allen (1/31/2011)
Craig Farrell (1/31/2011)
After having done some other tests, this method seems to be the quickest, especially if you have a solid index on Item/ChangePriceDate
My tests indicate the opposite. I've found that using Row_Number() is faster.
Drew
Ah... you have tests. 😉 You should post them. 🙂

I did post them in another thread http://www.sqlservercentral.com/Forums/FindPost1054915.aspx

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 9

Removed invalid results due to mistake in cache anticipation. See next post.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 10

Alright, so, some proof it's all about the data:

Using this as a test code:

SET NOCOUNT ON

IF OBJECT_ID( 'tempdb..#TestData') IS NOT NULL

DROP TABLE #TestData

IF OBJECT_ID( 'tempdb..#Tally') IS NOT NULL

DROP TABLE #Tally

-- Slightly modified from tally article

SELECT TOP 50000

IDENTITY(INT,1,1) AS N

INTO #Tally

FROM Master.dbo.SysColumns sc1,

Master.dbo.SysColumns sc2

ALTER TABLE #Tally

ADD CONSTRAINT PK_Tally_N

PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

CREATE TABLE #TestData

(GroupID INT,

NumData1INT,

NumData2INT,

UpdateDateDATETIME,

JunkData1VARCHAR(200),

JunkData2VARCHAR(2000)

)

INSERT INTO #TestData

SELECT

t.N%10 AS GroupID,-- Adjust the modulo to decide the # of groups to compare.

(t.n + 10) * 2 / 3 AS NumData1, -- Simply here as a filler

t.n AS NumData2, -- Some more filler

CAST( DATEADD( yyyy, 100, ABS( CHECKSUM(NEWID())%3650)) AS DATETIME) AS UpdateDate, -- Give me a random date

NULL AS JunkData1,

REPLICATE( '@', 2000) AS JunkData2

FROM

#Tally AS t,

#Tally AS t2

WHERE

t.N <= 50000

AND t2.N <= 20 --Control your testing rowcount here. 2 for 100k row test, 20 for 1mill.

-- Now, an intelligent index to make sure we only add to the tail, like a good prod system

CREATE CLUSTERED INDEX idx_TestData

ON #TestData

( UpdateDate, GroupID)

-- A couple of quick metrics

DECLARE @DurationTime DATETIME,

@RunTestLoop INT

SET @RunTestLoop = 1

SET @DurationTime = GETDATE()

WHILE @RunTestLoop <=5

BEGIN

Print 'TestLoop: ' + CAST( @RunTestLoop AS VARCHAR(3))

; WITH cte AS

( SELECT

groupID,

Max( UpdateDate) AS MaxDate

FROM

#TestData

GROUP BY

GroupID

)

SELECT

td.GroupID,

td.NumData1,

td.NumData2,

td.UpdateDate

FROM

#TestData AS td

JOIN

cte

ONtd.GroupID = cte.GroupID

AND td.Updatedate = cte.MaxDate

PRINT 'MaxDate Method took: ' + CAST( DATEDIFF( ms, @DurationTime, getdate()) AS VARCHAR(30)) + ' milliseconds.'

SET @DurationTime = GETDATE()

-- RowNumber() method

; WITH cte AS

( SELECT GroupID,

UpdateDate,

ROW_NUMBER() OVER ( Partition By GroupID ORDER BY UpdateDate DESC) AS RowNum

FROM

#TestData)

SELECT

td.GroupID,

td.NumData1,

td.NumData2,

td.UpdateDate

FROM

#TestData AS td

JOIN

cte

ONtd.GroupID = cte.GroupID

AND td.UpdateDate = cte.UpdateDate

where

cte.Rownum = 1

PRINT 'RowNum Method took: ' + CAST( DATEDIFF( ms, @DurationTime, getdate()) AS VARCHAR(30)) + ' milliseconds.'

SET @DurationTime = GETDATE()

SET @RunTestLoop = @RunTestLoop + 1

END

SET @RunTestLoop = 1

-- Create a descending noncluster for our purposes for Row_Number()

CREATE NONCLUSTERED INDEX idx_TestData2 ON #TestData

(GroupID, UpdateDate Desc)

-- Reset the Durationtime AFTER the index build

SET @DurationTime = GETDATE()

print ' '

print ' With Nonclustered Index'

WHILE @RunTestLoop <=5

BEGIN

Print 'TestLoop: ' + CAST( @RunTestLoop AS VARCHAR(3))

; WITH cte AS

( SELECT

groupID,

Max( UpdateDate) AS MaxDate

FROM

#TestData WITH ( INDEX(0))

GROUP BY

GroupID

)

SELECT

td.GroupID,

td.NumData1,

td.NumData2,

td.UpdateDate

FROM

#TestData AS td

JOIN

cte

ONtd.GroupID = cte.GroupID

AND td.Updatedate = cte.MaxDate

PRINT 'MaxDate Method took: ' + CAST( DATEDIFF( ms, @DurationTime, getdate()) AS VARCHAR(30)) + ' milliseconds.'

SET @DurationTime = GETDATE()

-- RowNumber() method

; WITH cte AS

( SELECT GroupID,

UpdateDate,

ROW_NUMBER() OVER ( Partition By GroupID ORDER BY UpdateDate DESC) AS RowNum

FROM

#TestData)

SELECT

td.GroupID,

td.NumData1,

td.NumData2,

td.UpdateDate

FROM

#TestData AS td

JOIN

cte

ONtd.GroupID = cte.GroupID

AND td.UpdateDate = cte.UpdateDate

where

cte.Rownum = 1

PRINT 'RowNum Method took: ' + CAST( DATEDIFF( ms, @DurationTime, getdate()) AS VARCHAR(30)) + ' milliseconds.'

SET @DurationTime = GETDATE()

SET @RunTestLoop = @RunTestLoop + 1

END

These are the run results. If we swap MaxDate and RowNum methods, you'll find first run hammers the RowNum method just as badly.

Please note that without the nonclustered index, the rownum method is very badly performing. It needs the index presorted to its purpose before it's useful enough to compete.

TestLoop: 1

MaxDate Method took: 9526 milliseconds.

RowNum Method took: 893 milliseconds.

TestLoop: 2

MaxDate Method took: 110 milliseconds.

RowNum Method took: 943 milliseconds.

TestLoop: 3

MaxDate Method took: 136 milliseconds.

RowNum Method took: 893 milliseconds.

TestLoop: 4

MaxDate Method took: 106 milliseconds.

RowNum Method took: 976 milliseconds.

TestLoop: 5

MaxDate Method took: 133 milliseconds.

RowNum Method took: 903 milliseconds.

With Nonclustered Index

TestLoop: 1

MaxDate Method took: 116 milliseconds.

RowNum Method took: 266 milliseconds.

TestLoop: 2

MaxDate Method took: 113 milliseconds.

RowNum Method took: 276 milliseconds.

TestLoop: 3

MaxDate Method took: 113 milliseconds.

RowNum Method took: 286 milliseconds.

TestLoop: 4

MaxDate Method took: 113 milliseconds.

RowNum Method took: 263 milliseconds.

TestLoop: 5

MaxDate Method took: 116 milliseconds.

RowNum Method took: 280 milliseconds.

For some additional run results, Change GroupID from %10 to %10000 to change the distribution of values the other way from a density perspective:

TestLoop: 1

MaxDate Method took: 15650 milliseconds.

RowNum Method took: 1096 milliseconds.

TestLoop: 2

MaxDate Method took: 330 milliseconds.

RowNum Method took: 986 milliseconds.

TestLoop: 3

MaxDate Method took: 336 milliseconds.

RowNum Method took: 1116 milliseconds.

TestLoop: 4

MaxDate Method took: 343 milliseconds.

RowNum Method took: 1093 milliseconds.

TestLoop: 5

MaxDate Method took: 323 milliseconds.

RowNum Method took: 1080 milliseconds.

With Nonclustered Index

TestLoop: 1

MaxDate Method took: 333 milliseconds.

RowNum Method took: 476 milliseconds.

TestLoop: 2

MaxDate Method took: 393 milliseconds.

RowNum Method took: 473 milliseconds.

TestLoop: 3

MaxDate Method took: 416 milliseconds.

RowNum Method took: 483 milliseconds.

TestLoop: 4

MaxDate Method took: 483 milliseconds.

RowNum Method took: 463 milliseconds.

TestLoop: 5

MaxDate Method took: 403 milliseconds.

RowNum Method took: 433 milliseconds.

A lot of the time here was displaying the resultset to my desktop.

Now, if we remove the JunkData2 column (change REPLICATE( '@', 2000) to NULL AS JunkData2) and get ourselves a decent rowcount/page in the clustered (keeping the %10000 distribution):

TestLoop: 1

MaxDate Method took: 716 milliseconds.

RowNum Method took: 1106 milliseconds.

TestLoop: 2

MaxDate Method took: 293 milliseconds.

RowNum Method took: 1053 milliseconds.

TestLoop: 3

MaxDate Method took: 273 milliseconds.

RowNum Method took: 993 milliseconds.

TestLoop: 4

MaxDate Method took: 256 milliseconds.

RowNum Method took: 1150 milliseconds.

TestLoop: 5

MaxDate Method took: 286 milliseconds.

RowNum Method took: 1183 milliseconds.

With Nonclustered Index

TestLoop: 1

MaxDate Method took: 230 milliseconds.

RowNum Method took: 660 milliseconds.

TestLoop: 2

MaxDate Method took: 286 milliseconds.

RowNum Method took: 660 milliseconds.

TestLoop: 3

MaxDate Method took: 280 milliseconds.

RowNum Method took: 670 milliseconds.

TestLoop: 4

MaxDate Method took: 280 milliseconds.

RowNum Method took: 673 milliseconds.

TestLoop: 5

MaxDate Method took: 276 milliseconds.

RowNum Method took: 596 milliseconds.

In very few cases, with solid indexing (especially for an audit table), will rowNumber seem to outperform MAX methodology. There are some instances where it will, but as a rule of thumb it does not.

For the sake of completeness, I also re-ran without the IndexHint on the MaxDate method after the Nonclustered index is introduced. Results from that run:

With Nonclustered Index

TestLoop: 1

MaxDate Method took: 206 milliseconds.

RowNum Method took: 610 milliseconds.

TestLoop: 2

MaxDate Method took: 306 milliseconds.

RowNum Method took: 603 milliseconds.

TestLoop: 3

MaxDate Method took: 300 milliseconds.

RowNum Method took: 576 milliseconds.

TestLoop: 4

MaxDate Method took: 300 milliseconds.

RowNum Method took: 546 milliseconds.

TestLoop: 5

MaxDate Method took: 323 milliseconds.

RowNum Method took: 576 milliseconds.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 11

MAX DATE method has one downside though - you can get duplicate output if more than one record has the same MAX DATE. This should not happen with ROW_NUMBER() method.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 12

TheSQLGuru (2/2/2011)
MAX DATE method has one downside though - you can get duplicate output if more than one record has the same MAX DATE. This should not happen with ROW_NUMBER() method.

Agreed, which is why I made the comment earlier that ChangeDate should be unique per group.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

tfifield SSCrazy Eights Points: 9655 More actions · Answer 13

I've done some bench mark tests comparing ROW_NUMBER() to doing a CROSS APPLY with a TOP 1 type of construct. Even if the TOP 1/CROSS APPLY has a covering index to allow for a quick index seek, the ROW_NUMBER() will beat it for a small number of rows within the partition.

At some point as the number of rows increase within the partition the CROSS APPLY will start to pull away from either MAX or ROW_NUMBER() since it will seek rather than have to touch upon each row.

The absolute fastest I've seen is using the MAX in such a way in the CTE or derived table that the Optimizer itself figures out that you really want a TOP operator. I think Paul White posted something on this not too long ago. Anyway I tried it out and it's lightning fast.

Todd Fifield

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 14

tfifield (2/2/2011)
At some point as the number of rows increase within the partition the CROSS APPLY will start to pull away from either MAX or ROW_NUMBER() since it will seek rather than have to touch upon each row.

That makes sense. If you chance upon the article again, I'd love to see the benchmarking.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA