Get related fields from record containing MAX()

Question

Get related fields from record containing MAX()

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply

pdanes SSCrazy Eights Points: 8497 More actions · Answer 1

So far, in my experience, the solution which has performed quickest for this kind of operation is the self join with a proper covering index, it just always seems to perform quicker than the windowed function, largely becaus of the sort operation required when doing it that way.

Thanks for the tip. The self-join is indeed fast - zero seconds reported elapsed time, with a source dataset of nearly 100,000 records, joined to several smaller auxiliary tables. I haven't tried the OVER clause on my real data yet, too much other stuff in the queue at the moment, but it looks like a good project to fiddle with over the weekend. Even if it's slower in the end, I still want to get acquainted with it - it looks quite useful.

Matthew Darwin SSCertifiable Points: 5513 More actions · Answer 2

If no ORDER BY clause is included in the ROW_NUMBER() function, the row numbers generated are arbitrary. I should also say this will apply if you order using a non unique column, the row numbers will be arbitrarily assigned over the window of that group.

It's also worth noting that you can use MAX() with an OVER clause too.

The ROW_NUMBER() solution performs best if you include the exact covering index required for that solution, so in this case:-

CREATE INDEX idx_T

ON #T (ColA, ColB desc)

INCLUDE (ColC)

But as I say, in real world functions, I've usually found that the self join approach works best, especially as datasets increase in size. This is largely due to the sort operation required to generate the ROW_NUMBER if no covering index is present.

Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

pdanes SSCrazy Eights Points: 8497 More actions · Answer 3

Matthew Darwin (5/11/2012)
If no ORDER BY clause is included in the ROW_NUMBER() function, the row numbers generated are arbitrary. I should also say this will apply if you order using a non unique column, the row numbers will be arbitrarily assigned over the window of that group.
It's also worth noting that you can use MAX() with an OVER clause too.
The ROW_NUMBER() solution performs best if you include the exact covering index required for that solution, so in this case:-
CREATE INDEX idx_T
ON #T (ColA, ColB desc)
INCLUDE (ColC)
But as I say, in real world functions, I've usually found that the self join approach works best, especially as datasets increase in size. This is largely due to the sort operation required to generate the ROW_NUMBER if no covering index is present.

Ah, I see. Thank you for the explanations and the sample. I'll add it to my goodie box and see what kind of results I get on my actual data. Seems I have my free time taken care of for this weekend.:-)

Matthew Darwin SSCertifiable Points: 5513 More actions · Answer 4

If you get the chance, I'd recommend checking out Itzik Ben-Gan's "Inside Microsoft SQL Server 2008: T-SQL Querying" book, it has some really good examples of the benefits of using the OVER clause against traditional grouping, and when it seems to work best of all.

Another possible solution, that from testing seems to work best when you're returning a very small result set, is using CROSS APPLY:-

SELECT t1.ColA

, t1.ColB

, t1.ColC

FROM #T t1

CROSS APPLY (SELECT MAX(ColB) MaxColB

FROM #T t2

WHERE t1.ColA = t2.ColA

) t3

WHERE t1.ColB = t3.MaxColB

Over any decent sized dataset this really doesn't perform well though, due to the way that the MAX has to be calculated seperately for each row input.

Have a play about with them and check out the query plans and you'll see pretty quickly which method seems to work best for you.

Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 5

You can read this post to get to know about PARTITION BY and OVER clauses.