August 1, 2007 at 10:25 am
I want to return an different column for each row having a MAX(). There doesn't seem to be an intuitive way to do this. For example:
select topicID, lastPosterID
where postDate=MAX(postDate)
the problem comes in grouping. SQL wants you to group by everything in the select list, and you can't just grab the other data on that row. This forces you to join back, like so:
select topicID, lastPosterID
from tbl
join (
select topicID, max(postDate)
group by topicID
) "match" on topicID=topicID and postDate=postDate
Is this how it *has to* be, or is there a better way? Can MAX() OVER() be leveraged somehow?
August 2, 2007 at 12:04 am
There's a method of converting everything to strings, then finding the MAX, then separating out substrings and casting back as necessary. It can be rather cumbersome for multiple columns, however. It should look something like this, I think:
SELECT topicID, lastPosterID =
SUBSTRING (
MAX( CONVERT(CHAR(23), postDate, 121)
+ CAST(lastPosterID AS CHAR(10))
), 24, 10)
FROM tbl
GROUP BY topicID
Tell me if it works for you. I forgot the name of the guy who told me this technique (in a $99 day seminar in 1995 about aggregating along trees...he has a book on that subject).
August 2, 2007 at 1:16 am
SELECT d.TopicID,
d.LastPosterID,
d.PostDate
FROM (
SELECT TopicID,
LastPosterID,
PostDate,
ROW_NUMBER() OVER (PARTITION BY TopicID ORDER BY PostDate DESC) AS RecID
FROM Tbl
) AS d
WHERE d.RecID = 1
ORDER BY d.TopicID
N 56°04'39.16"
E 12°55'05.25"
August 2, 2007 at 5:28 am
Aaron, that's an interesting approach. Basically, you're appending the data you want to use to the end of the field you're sorting on. Once sorted, you simply break off the piece you need. I wonder how you'd extend it to include a third field and how to sort in different directions. Sometimes "old-school" shouldn't be discounted.
Peter, yeah, that's one way. I can't help thinking that it should be even simpler, though. I can't believe that I'm the only one to need this from time to time.
August 2, 2007 at 7:06 am
For multiple fields, you could retrieve the concatenated chunk in a subquery, then break it into pieces in the parent query.
Instead of Peter's technique, I think you could use TOP 1 somehow...
Results 1 - 10 of about 218,000,000 for top 1 for each group
#2 http://expertanswercenter.techtarget.com/eac/knowledgebaseAnswer/0,295199,sid63_gci980115,00.html
August 2, 2007 at 12:45 pm
select topicID, lastPosterID from tbl where postDate = (select max(postDate) from tbl)
The problem is that, unless you truncate the dates when entering them into the table, you will still only return the last entry made. If you want to see every entry made on the same day as the last entry made:
select topicID, lastPosterID from tbl where DateDiff( dd, postDate, (select max(postDate) from tbl)) = 0
This could be a killer for efficiency. If the postDate field is not indexed, it would end up doing do two complete table scans. If it is indexed, it would still do one.
If this is a large table and this is a query you make often, there are a couple of possibilities.
insert into tbl (..., postDate, ...) values(..., DateAdd( DateDiff( dd, @PostDate, 0), 0), ...)
insert into tbl (..., postDate, postDay, ...) values(..., @PostDate, DateAdd( DateDiff( dd, @PostDate, 0), 0), ...)
Then you can substitute postDay for postDate in my first query above.
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
August 3, 2007 at 8:51 am
Thanks for your reply, Tomm. But you seem to be missing the point of the GROUP BY -- the idea is if I have, say 5 topics, I want the latest one in each. The essence of your query (and most other suggestions) is a subquery, essentially doing the lookup twice. I was wondering if there's a more efficient way.
August 3, 2007 at 3:34 pm
Maybe you could post some sample data and a sample of what you want the output to look like.
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
August 6, 2007 at 11:56 pm
Try this, you're looking for a derived table:
select
topicid, lastposterid, postdate
from tbl
inner join (
select topicid, max(postdate) as maxpost from tbl group by topicid) lastpost on lastpost.topicid = tbl.topicid and tbl.postdate = lastpost.maxpost
Joe
August 7, 2007 at 4:09 pm
Not exactly a model of efficiency, but here goes:
select
topicID, lastPosterID, postdate
from tbl
where convert(char(10),postDate,101) in
(select max(convert(char(10),postDate,101)) from tbl)
August 9, 2007 at 8:37 am
Very interesting approach, so I put it to the test to see how efficient it really was. Sorry to say, it wasn't.
I did the following:
Table DDL: -- FYI the table has roughtly 200k rows.
CREATE TABLE [tblApplicationLog](
[FromApplication] [varchar](50) NOT NULL,
[EntryTime] [datetime] NOT NULL,
...
CREATE CLUSTERED INDEX [IX_tblApplicationLog] ON [tblApplicationLog]
(
[FromApplication] ASC,
[WhenEntered] ASC
)
--Standard Approach to the problem
select * from
tblApplicationLog A
Inner Join ( select FromApplication,max(EntryTime) MET from tblApplicationLog
group by FromApplication
) B
ON B.FromApplication = A.FromApplication
and B.MET = A.EntryTime
-- Your Approach
SELECT * FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY FromApplication ORDER BY EntryTime DESC) RN,
* FROM tblApplicationLog) B
WHERE B.RN=1
ORDER BY B.FromApplication
-- Alternate approach (random rows returned from another post-- This I really liked).
SELECT * FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY FromApplication ORDER BY NEWID() DESC) RN,
* FROM tblApplicationLog) B
WHERE B.RN=1
ORDER BY B.FromApplication
Results.
Roughly 375ms each for the two row_number() methods, and only 6ms for the standard approach.
Another issue, is the actual query returns 21 rows (some rows with the same EntryTime) but your approach only returns 17
I still may find a use for that technique, and it is a good one to have in my bag of tricks, but the best way to do a max, don't think so.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply