Aggregation reloaded vs. RBAR

  • Say I have a table employee (employeeID, name, dateOfBirth, departmentID) and want to retrieve the youngest employee per department. I'd do something like

    select

    e1.employeeID,

    e1.name,

    e1.dateOfBirth,

    e1.departmentID

    from

    employee e1 join

    (

    select departmentID, max(dateOfBirth) maxDate

    from employee

    group by departmentID

    ) e2

    on e1.departmentID=e2.departmentID and e1.dateOfBirth=e2.maxDate

    This is a pattern I meet quite often. I'd rather expect SQL to offer some syntax like

    select

    employeeID,

    name,

    dateOfBirth,

    departmentID

    from

    employee

    where dateOfBirth = max(dateOfBirth) grouped by departmentID

    Wouldn't that be a useful SQL extension?

    If not: Is that the way you would code it or do you recommend some other pattern?

  • Since you're in 2005, you can actually use the new windowed version of max() to do just that. It looks something like this:

    select

    employeeID,

    name,

    dateOfBirth,

    departmentID

    from

    employee

    where dateOfBirth = max(dateOfBirth) OVER (Partition by departmentID)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thank you for your reply.

    That's what I would have hoped for, but I get an error saying that windowed functions are valid only in select and order by clauses.

    I think that the use case for windowed functions is creating different groupings/partitions and therefore retrieving different subtotals for different partitions.

    Anyway: I tried

    select

    employeeID,

    name,

    dateOfBirth

    departmentID

    from

    (

    select

    employeeID,

    name,

    dateOfBirth,

    departmentID,

    max(dateOfBirth) over (Partition by departmentID) maxdate

    ) e

    where maxdate=dateOfBirth

    but the performance is worse (at least with my index configuration):

    The execution plan of the JOIN version shows parallelism, sort, segementation, top while the PARTITION BY version shows something I would have expected to occur in the JOIN version:

    Spooling the data twice then the calculation of the aggregate, joining the streams and so on.

    But on the other hand it is easier to read and maintain.

  • yup - forgot all about that little restriction...

    And yes - it can sometimes turn out to be (quite a bit) less efficient than the older-style derived table. Looks to me to be a rather massive optimizer screw-up (since it should always be possible for the optimizer to reoptimize as a derived table when appropriate), but there's not much to be done about it now.....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply