OVER(PARTITION BY...) vs. GROUP BY

  • create table mytable (policyid int, trandate datetime, amount money)

    insert mytable (policyid, trandate, amount)

    values (1, '01/02/08', 12.00)

    insert mytable (policyid, trandate, amount)

    values (1, '01/03/08', 15.00)

    insert mytable (policyid, trandate, amount)

    values (1, '01/04/08', 10.00)

    insert mytable (policyid, trandate, amount)

    values (2, '01/02/08', 12.00)

    insert mytable (policyid, trandate, amount)

    values (2, '01/03/08', 11.00)

    select sum(amount) over(partition by policyid),

    max(trandate) over(partition by policyid)

    from mytable

    select sum(amount),

    max(trandate),

    policyid

    from mytable

    group by policyid

    Why when I use GROUP BY it returns two unique records, exactly as I expect,

    and when I use OVER(PARTITION BY), it repeats the records as many times as there are records in the table?

    I prefer using OVER in this case because I have a lot of other items on the select list (which I didn't put into my sample in order to make it simpler). There are two aggregates as you see on the SELECT as well as about 10 other row values.

    I need to get one record from the table for each policyid. I only want to see the records with the maximum trandate.

    Please advise.

    Thank you!

  • Because a GROUP BY totals the set of data and OVER summarizes the data over the set. It's just the way it works. 😉

    Here are a couple of options for you

    ;With totals (policyId, TotalAmount, MaxTranDate) As

    (Select policyId

    ,sum(amount)

    ,max(trandate)

    From mytable

    Group By policyId)

    Select *

    From mytable m

    Join totals t On t.policyId = m.policyId

    And t.MaxTranDate = m.trandate;

    ;With totals (PolicyId, TotalAmount, MaxTranDate, rn) As

    (Select PolicyId

    ,sum(amount) over(Partition By policyid)

    ,max(trandate) over(Partition By policyid)

    ,row_number() over(Partition By policyid Order By trandate desc)

    From mytable)

    Select * From totals Where rn = 1;

    Oh, and replace the '*' with the actual columns 🙂

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • You'd be well advised to check performance as well if you plan on starting to use windowed aggregates a lot, or on large sets. Sad to say, but they tend to perform rather badly, much worse in fact than going through and setting up a derived table query with your aggregates in it (somehow 2005 doesn't understand how to optimize its own built-in windowed aggregate....).

    So - very convenient for the writing and the concept, not so hot at all on the perf side of the house.

    As of right now, I tend to restrict my use of OVER to only those time when I am pulling a LOT of stats based on divergent groupings (where the number of derived tables needed tends to overwhelm everything else), and even then - I tend to preaggregate before I feed it into the OVER clauses.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thank you!

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply