How Can I Write A Query That Sums the Number of Occurrences Of a Value and Groups Them Into Ranges

  • I need a query that sums the number of occurrences of a value and group them into ranges.  For example, in AdventureWorks the employee table, I can run the following query

    select organizationlevel, count(organizationlevel) from humanresources.employee

    group by organizationlevel

    organizationlevel   count(organizationlevel)
    NULL                                     0
    1                                            6
    2                                            27
    3                                            66
    4                                           190

    What I want to do is return the following output on the column organization level.  You see I want a sum of the values with the number of occurrences between 1 and 30 (there are 2 of those (level 1 and level 2), and a sum of the values with the number of occurrences between 31 and 70 (there is 1, level 3) and so on.

    range

    between 1 and 30    2

    between 31 and 70   1

    > 70                1


    I need this query to run on columns when I am optimizing storedprocedures that may result in parameter sniffing. (some values may result in a good execution plan while other get a terrible execution plan.  Can anyone help with this?

  • ceciliarenebaker - Friday, June 23, 2017 3:57 PM

    I need a query that sums the number of occurrences of a value and group them into ranges.  For example, in AdventureWorks the employee table, I can run the following query

    select organizationlevel, count(organizationlevel) from humanresources.employee

    group by organizationlevel

    organizationlevel   count(organizationlevel)
    NULL                                     0
    1                                            6
    2                                            27
    3                                            66
    4                                           190

    What I want to do is return the following output on the column organization level.  You see I want a sum of the values with the number of occurrences between 1 and 30 (there are 2 of those (level 1 and level 2), and a sum of the values with the number of occurrences between 31 and 70 (there is 1, level 3) and so on.

    range

    between 1 and 30    2

    between 31 and 70   1

    > 70                1


    I need this query to run on columns when I am optimizing storedprocedures that may result in parameter sniffing. (some values may result in a good execution plan while other get a terrible execution plan.  Can anyone help with this?

    I see what you're saying.  Here's some DDL to create table and some test data to mimic your HumanResources.Employees table.

    IF OBJECT_ID('dbo.Employees', 'u') IS NOT NULL DROP TABLE dbo.Employees;
    CREATE TABLE dbo.Employees (
    ID Integer not null identity (1, 1),
    constraint Employees_PK primary key (ID),
    OrganizationLevel Integer);

    WITH cteLevelsAndCounts AS (
    SELECT Level, Number
        FROM (VALUES(1, 6),
                    (2, 27),
                    (3, 66),
                    (4, 190)
             ) x (Level, Number)
    )
    INSERT INTO dbo.Employees(OrganizationLevel)
    SELECT Level
        FROM cteLevelsAndCounts d
          CROSS APPLY util.dbo.TallyN(d.Number);


    Now we can query the counts and bucket them into a range.

    WITH cteRanges AS (
    SELECT Mn, Mx
        FROM (VALUES(1, 30),
                  (31, 70),
                  (71, 9999)
             ) x (Mn, Mx)
    ),
    cteData AS (
    SELECT OrganizationLevel, RC = COUNT(*)
        FROM dbo.Employees
        GROUP BY OrganizationLevel
    )
    SELECT r.Mn, r.Mx, COUNT(d.RC), SUM(d.RC)
      FROM cteData d
        CROSS APPLY cteRanges r
      WHERE d.RC >= r.Mn
        AND d.RC < r.Mx
      GROUP BY r.Mn, r.Mx
      ORDER BY r.Mn;

    Because you already have your table, you'll want to change the cteData query to read from your own table.
    The cteRanges CTE could be a physical table if you want to use the set of ranges in multiple places and want a centralized list of values.
    The cteData does your counts.  The outer query reads the employee counts and buckets each count into one of the ranges.
    You said you wanted the sum of values in each range, but the expected data showed the counts.  So, I figured I'd include both to be sure.

    You should be able to use the technique on almost any query in the cteData CTE.  I hope this helps.

  • Thank you so much for your help.

  • ceciliarenebaker - Tuesday, June 27, 2017 1:09 PM

    Thank you so much for your help.

    I'm glad it worked for you.  Thanks for the feedback.

  • Dang! Ed beat me to it. 

    I would add - for optimal performance you want this index in place: 

    CREATE INDEX nc_poc_employees_level ON dbo.employees(OrganizationLevel);

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B - Tuesday, June 27, 2017 2:45 PM

    Dang! Ed beat me to it. 

    I would add - for optimal performance you want this index in place: 

    CREATE INDEX nc_poc_employees_level ON dbo.employees(OrganizationLevel);

    If that's the low cardinality that I think it will be, it would hurt INSERT performance possibly by a lot.  Fortunately, it appears to be on an employee table which isn't likely to need to take a lot of inserts, especially in a batch.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Tuesday, June 27, 2017 7:15 PM

    Alan.B - Tuesday, June 27, 2017 2:45 PM

    Dang! Ed beat me to it. 

    I would add - for optimal performance you want this index in place: 

    CREATE INDEX nc_poc_employees_level ON dbo.employees(OrganizationLevel);

    If that's the low cardinality that I think it will be, it would hurt INSERT performance possibly by a lot.  Fortunately, it appears to be on an employee table which isn't likely to need to take a lot of inserts, especially in a batch.

    Agreed. I often forget to mention the impact on data modifications when adding a new index. I'm spoiled and live in the data warehouse world where we don't have to think much about inserts, updates and deletes 😉

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply