May 20, 2008 at 7:44 am
Say I have a table employee (employeeID, name, dateOfBirth, departmentID) and want to retrieve the youngest employee per department. I'd do something like
select
e1.employeeID,
e1.name,
e1.dateOfBirth,
e1.departmentID
from
employee e1 join
(
select departmentID, max(dateOfBirth) maxDate
from employee
group by departmentID
) e2
on e1.departmentID=e2.departmentID and e1.dateOfBirth=e2.maxDate
This is a pattern I meet quite often. I'd rather expect SQL to offer some syntax like
select
employeeID,
name,
dateOfBirth,
departmentID
from
employee
where dateOfBirth = max(dateOfBirth) grouped by departmentID
Wouldn't that be a useful SQL extension?
If not: Is that the way you would code it or do you recommend some other pattern?
May 20, 2008 at 7:47 am
Since you're in 2005, you can actually use the new windowed version of max() to do just that. It looks something like this:
select
employeeID,
name,
dateOfBirth,
departmentID
from
employee
where dateOfBirth = max(dateOfBirth) OVER (Partition by departmentID)
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 20, 2008 at 9:29 am
Thank you for your reply.
That's what I would have hoped for, but I get an error saying that windowed functions are valid only in select and order by clauses.
I think that the use case for windowed functions is creating different groupings/partitions and therefore retrieving different subtotals for different partitions.
Anyway: I tried
select
employeeID,
name,
dateOfBirth
departmentID
from
(
select
employeeID,
name,
dateOfBirth,
departmentID,
max(dateOfBirth) over (Partition by departmentID) maxdate
) e
where maxdate=dateOfBirth
but the performance is worse (at least with my index configuration):
The execution plan of the JOIN version shows parallelism, sort, segementation, top while the PARTITION BY version shows something I would have expected to occur in the JOIN version:
Spooling the data twice then the calculation of the aggregate, joining the streams and so on.
But on the other hand it is easier to read and maintain.
May 20, 2008 at 9:53 am
yup - forgot all about that little restriction...
And yes - it can sometimes turn out to be (quite a bit) less efficient than the older-style derived table. Looks to me to be a rather massive optimizer screw-up (since it should always be possible for the optimizer to reoptimize as a derived table when appropriate), but there's not much to be done about it now.....
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply