How Can I Write A Query That Sums the Number of Occurrences Of a Value and Groups Them Into Ranges

Question

How Can I Write A Query That Sums the Number of Occurrences Of a Value and Groups Them Into Ranges

ceciliarenebaker

Old Hand

Points: 380
More actions
June 23, 2017 at 3:57 pm

#402213

I need a query that sums the number of occurrences of a value and group them into ranges. For example, in AdventureWorks the employee table, I can run the following query
select organizationlevel, count(organizationlevel) from humanresources.employee
group by organizationlevel
organizationlevel   count(organizationlevel)
NULL                                     0
1                                            6
2                                            27
3                                            66
4                                           190
What I want to do is return the following output on the column organization level. You see I want a sum of the values with the number of occurrences between 1 and 30 (there are 2 of those (level 1 and level 2), and a sum of the values with the number of occurrences between 31 and 70 (there is 1, level 3) and so on.
range
between 1 and 30    2
between 31 and 70   1
> 70                1

I need this query to run on columns when I am optimizing storedprocedures that may result in parameter sniffing. (some values may result in a good execution plan while other get a terrible execution plan. Can anyone help with this?

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Ed Wagner SSC Guru Points: 287026 More actions · Answer 1

ceciliarenebaker - Friday, June 23, 2017 3:57 PM
I need a query that sums the number of occurrences of a value and group them into ranges. For example, in AdventureWorks the employee table, I can run the following query
select organizationlevel, count(organizationlevel) from humanresources.employee
group by organizationlevel
organizationlevel   count(organizationlevel)
NULL                                     0
1                                            6
2                                            27
3                                            66
4                                           190
What I want to do is return the following output on the column organization level. You see I want a sum of the values with the number of occurrences between 1 and 30 (there are 2 of those (level 1 and level 2), and a sum of the values with the number of occurrences between 31 and 70 (there is 1, level 3) and so on.
range
between 1 and 30    2
between 31 and 70   1
> 70                1

I need this query to run on columns when I am optimizing storedprocedures that may result in parameter sniffing. (some values may result in a good execution plan while other get a terrible execution plan. Can anyone help with this?

I see what you're saying. Here's some DDL to create table and some test data to mimic your HumanResources.Employees table.

IF OBJECT_ID('dbo.Employees', 'u') IS NOT NULL DROP TABLE dbo.Employees; CREATE TABLE dbo.Employees ( ID Integer not null identity (1, 1), constraint Employees_PK primary key (ID), OrganizationLevel Integer);

WITH cteLevelsAndCounts AS ( SELECT Level, Number FROM (VALUES(1, 6), (2, 27), (3, 66), (4, 190) ) x (Level, Number) ) INSERT INTO dbo.Employees(OrganizationLevel) SELECT Level FROM cteLevelsAndCounts d CROSS APPLY util.dbo.TallyN(d.Number);

Now we can query the counts and bucket them into a range.

WITH cteRanges AS ( SELECT Mn, Mx FROM (VALUES(1, 30), (31, 70), (71, 9999) ) x (Mn, Mx) ), cteData AS ( SELECT OrganizationLevel, RC = COUNT(*) FROM dbo.Employees GROUP BY OrganizationLevel ) SELECT r.Mn, r.Mx, COUNT(d.RC), SUM(d.RC) FROM cteData d CROSS APPLY cteRanges r WHERE d.RC >= r.Mn AND d.RC < r.Mx GROUP BY r.Mn, r.Mx ORDER BY r.Mn;

Because you already have your table, you'll want to change the cteData query to read from your own table.
The cteRanges CTE could be a physical table if you want to use the set of ranges in multiple places and want a centralized list of values.
The cteData does your counts. The outer query reads the employee counts and buckets each count into one of the ranges.
You said you wanted the sum of values in each range, but the expected data showed the counts. So, I figured I'd include both to be sure.

You should be able to use the technique on almost any query in the cteData CTE. I hope this helps.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

ceciliarenebaker Old Hand Points: 380 More actions · Answer 2

ceciliarenebaker

Old Hand

Points: 380

June 27, 2017 at 1:09 pm

#1948433

Thank you so much for your help.

Ed Wagner SSC Guru Points: 287026 More actions · Answer 3

ceciliarenebaker - Tuesday, June 27, 2017 1:09 PM
Thank you so much for your help.

I'm glad it worked for you. Thanks for the feedback.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Alan Burstein SSC Guru Points: 61141 More actions · Answer 4

Dang! Ed beat me to it.

I would add - for optimal performance you want this index in place:

CREATE INDEX nc_poc_employees_level ON dbo.employees(OrganizationLevel);

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Jeff Moden SSC Guru Points: 1004577 More actions · Answer 5

Alan.B - Tuesday, June 27, 2017 2:45 PM
Dang! Ed beat me to it.
I would add - for optimal performance you want this index in place:
CREATE INDEX nc_poc_employees_level ON dbo.employees(OrganizationLevel);

If that's the low cardinality that I think it will be, it would hurt INSERT performance possibly by a lot. Fortunately, it appears to be on an employee table which isn't likely to need to take a lot of inserts, especially in a batch.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Alan Burstein SSC Guru Points: 61141 More actions · Answer 6

Jeff Moden - Tuesday, June 27, 2017 7:15 PM
Alan.B - Tuesday, June 27, 2017 2:45 PM
Dang! Ed beat me to it.
I would add - for optimal performance you want this index in place:
CREATE INDEX nc_poc_employees_level ON dbo.employees(OrganizationLevel);
If that's the low cardinality that I think it will be, it would hurt INSERT performance possibly by a lot. Fortunately, it appears to be on an employee table which isn't likely to need to take a lot of inserts, especially in a batch.

Agreed. I often forget to mention the impact on data modifications when adding a new index. I'm spoiled and live in the data warehouse world where we don't have to think much about inserts, updates and deletes

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

How Can I Write A Query That Sums the Number of Occurrences Of a Value and Groups Them Into Ranges

Cookies on SQLServerCentral