Aggregation reloaded vs. RBAR

Question

Aggregation reloaded vs. RBAR

Alexander G.

SSCrazy

Points: 2705
More actions
May 20, 2008 at 7:44 am

#380593

Say I have a table employee (employeeID, name, dateOfBirth, departmentID) and want to retrieve the youngest employee per department. I'd do something like
select
e1.employeeID,
e1.name,
e1.dateOfBirth,
e1.departmentID
from
employee e1 join
(
select departmentID, max(dateOfBirth) maxDate
from employee
group by departmentID
) e2
on e1.departmentID=e2.departmentID and e1.dateOfBirth=e2.maxDate
This is a pattern I meet quite often. I'd rather expect SQL to offer some syntax like
select
employeeID,
name,
dateOfBirth,
departmentID
from
employee
where dateOfBirth = max(dateOfBirth) grouped by departmentID
Wouldn't that be a useful SQL extension?
If not: Is that the way you would code it or do you recommend some other pattern?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 1

Since you're in 2005, you can actually use the new windowed version of max() to do just that. It looks something like this:

select

employeeID,

name,

dateOfBirth,

departmentID

from

employee

where dateOfBirth = max(dateOfBirth) OVER (Partition by departmentID)

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Alexander G. SSCrazy Points: 2705 More actions · Answer 2

Thank you for your reply.

That's what I would have hoped for, but I get an error saying that windowed functions are valid only in select and order by clauses.

I think that the use case for windowed functions is creating different groupings/partitions and therefore retrieving different subtotals for different partitions.

Anyway: I tried

select

employeeID,

name,

dateOfBirth

departmentID

from

(

select

employeeID,

name,

dateOfBirth,

departmentID,

max(dateOfBirth) over (Partition by departmentID) maxdate

) e

where maxdate=dateOfBirth

but the performance is worse (at least with my index configuration):

The execution plan of the JOIN version shows parallelism, sort, segementation, top while the PARTITION BY version shows something I would have expected to occur in the JOIN version:

Spooling the data twice then the calculation of the aggregate, joining the streams and so on.

But on the other hand it is easier to read and maintain.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 3

yup - forgot all about that little restriction...

And yes - it can sometimes turn out to be (quite a bit) less efficient than the older-style derived table. Looks to me to be a rather massive optimizer screw-up (since it should always be possible for the optimizer to reoptimize as a derived table when appropriate), but there's not much to be done about it now.....

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?