Using PERCENTILE_DISC in the same query as other aggregates

  • I have a situation where I would like to use both PERCENTILE_DISC and other aggregates in the same function. The problem I'm running into is that I get and error about the column in the WITHIN GROUP clause not being in the GROUP BY clause. The only solution I've come up with is to remove the GROUP BY, add an OVER to every aggregate in the query, and then add a distinct to get rid of the duplicates.

    Is this really the best option or is there a better solution to this? I recognize that the scenario below is very simple and maybe the percentile calculation could be refactored out into a CTE, but in the real world this is a complex query and it would be simpler and more maintainable to just keep the OVER clauses on the aggregates if that's the only other solution.


    DROP TABLE IF EXISTS #employee;

    CREATE TABLE #employee(
      id INT,
      dept INT,
      salary decimal( 12, 2 )
    );

    INSERT INTO #employee VALUES
    ( 1, 1, 50000.0 ),
    ( 2, 1, 51000.0 ),
    ( 3, 1, 52000.0 ),
    ( 4, 1, 53000.0 ),
    ( 5, 1, 54000.0 ),
    ( 6, 2, 55000.0 ),
    ( 7, 2, 56000.0 ),
    ( 8, 2, 57000.0 ),
    ( 9, 2, 58000.0 ),
    ( 10, 2, 60000.0 );

    -- doesn't work
    -- Column '#employee.salary' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
    SELECT
      Department = Dept,
      EmployeeCount = COUNT(*),
      MedianSalary = PERCENTILE_DISC(0.5) WITHIN GROUP( ORDER BY salary ) OVER(PARTITION BY dept)
    FROM #employee e
    GROUP BY dept

    -- works, but gives duplicates
    SELECT
      Department = Dept,
      EmployeeCount = COUNT(*) OVER(PARTITION BY dept),
      MedianSalary = PERCENTILE_DISC(0.5) WITHIN GROUP( ORDER BY salary ) OVER(PARTITION BY dept)
    FROM #employee e

    -- requires distinct to eliminate duplcates
    SELECT DISTINCT
      Department = Dept,
      EmployeeCount = COUNT(*) OVER(PARTITION BY dept),
      MedianSalary = PERCENTILE_DISC(0.5) WITHIN GROUP( ORDER BY salary ) OVER(PARTITION BY dept)
    FROM #employee e

  • Whether or not it's the best choice is going to be heavily dependent on just how much other complexity is going on.   What you've shown here isn't complex enough on it's own to necessarily justify an alternative.   Are we talking about tables with hundreds of millions of rows?   There's way too much simplification here to be much help...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • A fair point about the complexity determining what is the best solution. I can try to give you more clarity on that, but ultimately it would probably be easier for me to just pick from the available alternatives. To that end, do have any alternatives besides the two I originally mentioned (separate CTE's to calculate the median and adding an OVER/DISTINCT to everything)?

    To provide a little more detail, there are multiple tables involved, some of which will have millions or 10's of millions of rows, but not 100's of millions. I need to be able to calculate multiple other aggregates as well as median, preferably in the same query. If anything else would be helpful, let me know.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply