Accessing other columns as part of aggregate query

  • My developers have a question regarding the performance of a query they're trying to write.

    For example, they'd like to retrieve the employeeid from a salaries table where they have the max(salary).

    Obviously you could do it such that:

    select top 1 employeeid

    from salaries where salary = (select max(salary) from salaries).

    Shouldn't the query optimizer be aware of some sort of row id, or a PK value within the table that the aggregate function is applied to?

    Is there some sort of inbuilt function or fancy way of doing this as it would remove a performance hit of the subselect? Similar to the WHERE CURRENT OF clause in cursor processing.

  • [Code]SELECT TOP 1 EMPLOYEEID

    FROM SALARIES

    ORDER BY SALARY DESC[/Code]


    Regards,

    goodguy

    Experience is a bad teacher whose exams precede its lessons

  • Thanks for that, but I think I need to qualify it a bit more. Here's more of an example.

    I have the following table:

    emp with EmpId, EmpName, Salary, DivisionId

    with the following data:

    [font="Courier New"]EmpId___EmpName___Salary___DivisionId

    22______Joe_______25000____1

    24______Alex______30000____2

    27______John______48000____1

    30______Mark______29000____3

    33______Alice_____37000____1

    41______Bob_______21000____2

    42______Betty_____29000____3

    45______Frank_____38000____2[/font]

    I want to find the max salary for each division within the emp table. That is solved by a simple group by query.

    SELECT max(Salary) AS Salary, DivisionId

    FROM emp

    GROUP BY DivisionId

    returns:

    [font="Courier New"]Salary___DivisionId

    48000____1

    38000____2

    29000____3[/font]

    However I want to be able to expose the other fields of the emp table without affecting the result. I.e. Show me the EmpId and EmpName fields associated to those records that have the max salaries.

    This can easily be done using a subquery, or as shown here, a derived table.

    SELECT maxsal.DivisionId, maxsal.Salary, emp.EmpId, emp.EmpName

    FROM emp

    INNER JOIN (

    SELECT max(Salary) AS Salary, e.DivisionId

    FROM emp e

    GROUP BY e.DivisionId) maxsal

    ON emp.Salary = maxsal.Salary

    AND emp.DivisionId = maxsal.DivisionId

    returns:

    [font="Courier New"]DivisionId____Salary___EmpId___EmpName

    1_____________48000____27______John

    2_____________38000____45______Frank

    3_____________29000____30______Mark

    3_____________29000____42______Betty[/font]

    I am looking to avoid performing the join, in order to improve performance.

    My question is... Surely the query optimizer knows of a way to identify the rows which satisfy the max aggregate function and allow me to access the other fields in these corresponding rows.

    This would be the similar behavior to the WHERE CURRENT OF clause in cursor processing.

    Let it be known that this example I have shown is effectively cut out of a highly complex query.

  • I think that the derived table you illustrate is the best, safest, and most efficient method. I'd be really pleased to be proved wrong..

    Best wishes,
    Phil Factor

  • You could at least index the department and salary columns to help speed up the join.

    In SQL 2k5 you may be able to make it more efficient using some of the OVER syntax available (I haven't given it much thought) but your way is the most common way I've seen - just that many forget to index the field which, if you have a LOT of data, could slow down the join. Also, an index on department then salary with salary in descending order may make things easy for the query optimiser as it only need go to the first index entry for each department. I haven't checked that but it would be nice to be right about it 😀

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply