best practice for returning when matching a MAX()

Question

best practice for returning when matching a MAX()

Oblio Leitch

SSCrazy

Points: 2137
More actions
August 1, 2007 at 10:25 am

#174666

I want to return an different column for each row having a MAX(). There doesn't seem to be an intuitive way to do this. For example:
select topicID, lastPosterID
where postDate=MAX(postDate)
the problem comes in grouping. SQL wants you to group by everything in the select list, and you can't just grab the other data on that row. This forces you to join back, like so:
select topicID, lastPosterID
from tbl
join (
select topicID, max(postDate)
group by topicID
) "match" on topicID=topicID and postDate=postDate
Is this how it *has to* be, or is there a better way? Can MAX() OVER() be leveraged somehow?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Aaron West SSC Eights! Points: 808 More actions · Answer 1

There's a method of converting everything to strings, then finding the MAX, then separating out substrings and casting back as necessary. It can be rather cumbersome for multiple columns, however. It should look something like this, I think:

SELECT topicID, lastPosterID =

SUBSTRING (

MAX( CONVERT(CHAR(23), postDate, 121)

+ CAST(lastPosterID AS CHAR(10))

), 24, 10)

FROM tbl

GROUP BY topicID

Tell me if it works for you. I forgot the name of the guy who told me this technique (in a $99 day seminar in 1995 about aggregating along trees...he has a book on that subject).

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 2

SELECT d.TopicID,

d.LastPosterID,

d.PostDate

FROM (

SELECT TopicID,

LastPosterID,

PostDate,

ROW_NUMBER() OVER (PARTITION BY TopicID ORDER BY PostDate DESC) AS RecID

FROM Tbl

) AS d

WHERE d.RecID = 1

ORDER BY d.TopicID

N 56°04'39.16"
E 12°55'05.25"

Oblio Leitch SSCrazy Points: 2137 More actions · Answer 3

Aaron, that's an interesting approach. Basically, you're appending the data you want to use to the end of the field you're sorting on. Once sorted, you simply break off the piece you need. I wonder how you'd extend it to include a third field and how to sort in different directions. Sometimes "old-school" shouldn't be discounted.

Peter, yeah, that's one way. I can't help thinking that it should be even simpler, though. I can't believe that I'm the only one to need this from time to time.

Aaron West SSC Eights! Points: 808 More actions · Answer 4

For multiple fields, you could retrieve the concatenated chunk in a subquery, then break it into pieces in the parent query.

Instead of Peter's technique, I think you could use TOP 1 somehow...

Results 1 - 10 of about 218,000,000 for top 1 for each group

#2 http://expertanswercenter.techtarget.com/eac/knowledgebaseAnswer/0,295199,sid63_gci980115,00.html

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 5

select topicID, lastPosterID
from tbl
where postDate = (select max(postDate) from tbl)

The problem is that, unless you truncate the dates when entering them into the table, you will still only return the last entry made. If you want to see every entry made on the same day as the last entry made:

select topicID, lastPosterID
from tbl
where DateDiff( dd, postDate, (select max(postDate) from tbl)) = 0

This could be a killer for efficiency. If the postDate field is not indexed, it would end up doing do two complete table scans. If it is indexed, it would still do one.

If this is a large table and this is a query you make often, there are a couple of possibilities.

truncate the date when you insert it:

insert into tbl (..., postDate, ...)
values(..., DateAdd( DateDiff( dd, @PostDate, 0), 0), ...)

or, if you want to keep the time value, create another field to hold the truncated value:
```
insert into tbl (..., postDate, postDay, ...)
values(..., @PostDate, DateAdd( DateDiff( dd, @PostDate, 0), 0), ...)
```
Then you can substitute postDay for postDate in my first query above.

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms

Oblio Leitch SSCrazy Points: 2137 More actions · Answer 6

Thanks for your reply, Tomm. But you seem to be missing the point of the GROUP BY -- the idea is if I have, say 5 topics, I want the latest one in each. The essence of your query (and most other suggestions) is a subquery, essentially doing the lookup twice. I was wondering if there's a more efficient way.

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 7

Maybe you could post some sample data and a sample of what you want the output to look like.

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms

Joe Clifford SSCrazy Eights Points: 8739 More actions · Answer 8

Try this, you're looking for a derived table:

select

topicid, lastposterid, postdate

from tbl

inner join (

select topicid, max(postdate) as maxpost from tbl group by topicid) lastpost on lastpost.topicid = tbl.topicid and tbl.postdate = lastpost.maxpost

Joe

dbajunior SSCrazy Points: 2307 More actions · Answer 9

Not exactly a model of efficiency, but here goes:

select

topicID, lastPosterID, postdate

from tbl

where convert(char(10),postDate,101) in

(select max(convert(char(10),postDate,101)) from tbl)

Bob Fazio SSChampion Points: 10794 More actions · Answer 10

Very interesting approach, so I put it to the test to see how efficient it really was. Sorry to say, it wasn't.

I did the following:

Table DDL: -- FYI the table has roughtly 200k rows.

CREATE TABLE [tblApplicationLog](

[FromApplication] [varchar](50) NOT NULL,

[EntryTime] [datetime] NOT NULL,

...

CREATE CLUSTERED INDEX [IX_tblApplicationLog] ON [tblApplicationLog]

(

[FromApplication] ASC,

[WhenEntered] ASC

)

--Standard Approach to the problem

select * from

tblApplicationLog A

Inner Join ( select FromApplication,max(EntryTime) MET from tblApplicationLog

group by FromApplication

) B

ON B.FromApplication = A.FromApplication

and B.MET = A.EntryTime

-- Your Approach

SELECT * FROM (

SELECT ROW_NUMBER() OVER (PARTITION BY FromApplication ORDER BY EntryTime DESC) RN,

* FROM tblApplicationLog) B

WHERE B.RN=1

ORDER BY B.FromApplication

-- Alternate approach (random rows returned from another post-- This I really liked).

SELECT * FROM (

SELECT ROW_NUMBER() OVER (PARTITION BY FromApplication ORDER BY NEWID() DESC) RN,

* FROM tblApplicationLog) B

WHERE B.RN=1

ORDER BY B.FromApplication

Results.

Roughly 375ms each for the two row_number() methods, and only 6ms for the standard approach.

Another issue, is the actual query returns 21 rows (some rows with the same EntryTime) but your approach only returns 17

I still may find a use for that technique, and it is a good one to have in my bag of tricks, but the best way to do a max, don't think so.