Get related fields from record containing MAX()

  • So far, in my experience, the solution which has performed quickest for this kind of operation is the self join with a proper covering index, it just always seems to perform quicker than the windowed function, largely becaus of the sort operation required when doing it that way.

    Thanks for the tip. The self-join is indeed fast - zero seconds reported elapsed time, with a source dataset of nearly 100,000 records, joined to several smaller auxiliary tables. I haven't tried the OVER clause on my real data yet, too much other stuff in the queue at the moment, but it looks like a good project to fiddle with over the weekend. Even if it's slower in the end, I still want to get acquainted with it - it looks quite useful.

  • If no ORDER BY clause is included in the ROW_NUMBER() function, the row numbers generated are arbitrary. I should also say this will apply if you order using a non unique column, the row numbers will be arbitrarily assigned over the window of that group.

    It's also worth noting that you can use MAX() with an OVER clause too.

    The ROW_NUMBER() solution performs best if you include the exact covering index required for that solution, so in this case:-

    CREATE INDEX idx_T

    ON #T (ColA, ColB desc)

    INCLUDE (ColC)

    But as I say, in real world functions, I've usually found that the self join approach works best, especially as datasets increase in size. This is largely due to the sort operation required to generate the ROW_NUMBER if no covering index is present.

    Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

  • Matthew Darwin (5/11/2012)


    If no ORDER BY clause is included in the ROW_NUMBER() function, the row numbers generated are arbitrary. I should also say this will apply if you order using a non unique column, the row numbers will be arbitrarily assigned over the window of that group.

    It's also worth noting that you can use MAX() with an OVER clause too.

    The ROW_NUMBER() solution performs best if you include the exact covering index required for that solution, so in this case:-

    CREATE INDEX idx_T

    ON #T (ColA, ColB desc)

    INCLUDE (ColC)

    But as I say, in real world functions, I've usually found that the self join approach works best, especially as datasets increase in size. This is largely due to the sort operation required to generate the ROW_NUMBER if no covering index is present.

    Ah, I see. Thank you for the explanations and the sample. I'll add it to my goodie box and see what kind of results I get on my actual data. Seems I have my free time taken care of for this weekend.:-)

  • If you get the chance, I'd recommend checking out Itzik Ben-Gan's "Inside Microsoft SQL Server 2008: T-SQL Querying" book, it has some really good examples of the benefits of using the OVER clause against traditional grouping, and when it seems to work best of all.

    Another possible solution, that from testing seems to work best when you're returning a very small result set, is using CROSS APPLY:-

    SELECT t1.ColA

    , t1.ColB

    , t1.ColC

    FROM #T t1

    CROSS APPLY (SELECT MAX(ColB) MaxColB

    FROM #T t2

    WHERE t1.ColA = t2.ColA

    ) t3

    WHERE t1.ColB = t3.MaxColB

    Over any decent sized dataset this really doesn't perform well though, due to the way that the MAX has to be calculated seperately for each row input.

    Have a play about with them and check out the query plans and you'll see pretty quickly which method seems to work best for you.

    Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

  • You can read this post to get to know about PARTITION BY and OVER clauses.

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply