January 15, 2008 at 10:40 pm
My developers have a question regarding the performance of a query they're trying to write.
For example, they'd like to retrieve the employeeid from a salaries table where they have the max(salary).
Obviously you could do it such that:
select top 1 employeeid
from salaries where salary = (select max(salary) from salaries).
Shouldn't the query optimizer be aware of some sort of row id, or a PK value within the table that the aggregate function is applied to?
Is there some sort of inbuilt function or fancy way of doing this as it would remove a performance hit of the subselect? Similar to the WHERE CURRENT OF clause in cursor processing.
January 16, 2008 at 12:26 am
January 16, 2008 at 10:21 pm
Thanks for that, but I think I need to qualify it a bit more. Here's more of an example.
I have the following table:
emp with EmpId, EmpName, Salary, DivisionId
with the following data:
[font="Courier New"]EmpId___EmpName___Salary___DivisionId
22______Joe_______25000____1
24______Alex______30000____2
27______John______48000____1
30______Mark______29000____3
33______Alice_____37000____1
41______Bob_______21000____2
42______Betty_____29000____3
45______Frank_____38000____2[/font]
I want to find the max salary for each division within the emp table. That is solved by a simple group by query.
SELECT max(Salary) AS Salary, DivisionId
FROM emp
GROUP BY DivisionId
returns:
[font="Courier New"]Salary___DivisionId
48000____1
38000____2
29000____3[/font]
However I want to be able to expose the other fields of the emp table without affecting the result. I.e. Show me the EmpId and EmpName fields associated to those records that have the max salaries.
This can easily be done using a subquery, or as shown here, a derived table.
SELECT maxsal.DivisionId, maxsal.Salary, emp.EmpId, emp.EmpName
FROM emp
INNER JOIN (
SELECT max(Salary) AS Salary, e.DivisionId
FROM emp e
GROUP BY e.DivisionId) maxsal
ON emp.Salary = maxsal.Salary
AND emp.DivisionId = maxsal.DivisionId
returns:
[font="Courier New"]DivisionId____Salary___EmpId___EmpName
1_____________48000____27______John
2_____________38000____45______Frank
3_____________29000____30______Mark
3_____________29000____42______Betty[/font]
I am looking to avoid performing the join, in order to improve performance.
My question is... Surely the query optimizer knows of a way to identify the rows which satisfy the max aggregate function and allow me to access the other fields in these corresponding rows.
This would be the similar behavior to the WHERE CURRENT OF clause in cursor processing.
Let it be known that this example I have shown is effectively cut out of a highly complex query.
January 17, 2008 at 4:19 am
I think that the derived table you illustrate is the best, safest, and most efficient method. I'd be really pleased to be proved wrong..
Best wishes,
Phil Factor
January 17, 2008 at 5:44 am
You could at least index the department and salary columns to help speed up the join.
In SQL 2k5 you may be able to make it more efficient using some of the OVER syntax available (I haven't given it much thought) but your way is the most common way I've seen - just that many forget to index the field which, if you have a LOT of data, could slow down the join. Also, an index on department then salary with salary in descending order may make things easy for the query optimiser as it only need go to the first index entry for each department. I haven't checked that but it would be nice to be right about it 😀
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply