Query to group and count data elements from a single column

Question

Query to group and count data elements from a single column

paulgq

SSC Enthusiast

Points: 140
More actions
March 6, 2019 at 12:15 am

#370944

In records 1, 2 and 3 (plus 3000 more rows), we have the following sample data from table SAMPLE_DATA, COL01:
The brown cat jumped over the fence
The red fox dug under the fence
The white rabbit ran past the fence
I need to list the elements of each Col1 value and compare them to the rest of the elements from other Col1 records.
A SQL statement that produces the following output listed in order from most occurring to least occurring, then alphabetically
The number is the count of how many times the element appears in the data set and the text is the element.
I only need the top 1000 rows. Capitals and lower case should be treated as the same letter.
(6) the
(3) fence
(1) brown
(1) cat
(1) dug
(1) fox
(1) jumped
(1) over
(1) past
(1) rabbit
(1) ran
(1) red
(1) under
(1) white
Any suggestions would be very helpful. Thank you.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182920 More actions · Answer 1

paulgq - Wednesday, March 6, 2019 12:15 AM
In Col1, records 1, 2 and 3 we have the following data:
The brown cat jumped over the fence
The red fox dug under the fence
The white rabbit ran past the fence
I need to list the elements of each Col1 value and compare them to the rest of the elements from other Col1 records.
A SQL statement that produces the following output listed in order from most occurring to least occurring, then alphabetically
The number is the count of how many times the element appears in the data set and the text is the element.
I only need the top 1000 rows. Capitals and lower case should be treated as the same letter.
(6) the
(3) fence
(1) brown
(1) cat
(1) dug
(1) fox
(1) jumped
(1) over
(1) past
(1) rabbit
(1) ran
(1) red
(1) under
(1) white
Any suggestions would be very helpful. Thank you.

Here is one way of doing this
😎

USE TEEST; GO SET NOCOUNT ON;

;WITH SAMPLE_DATA(COL01) AS ( SELECT 'The brown cat jumped over the fence' UNION ALL SELECT 'The red fox dug under the fence' UNION ALL SELECT 'The white rabbit ran past the fence' ) SELECT WRD.[value] AS WORD ,COUNT(*) AS WCNT FROM SAMPLE_DATA SD CROSS APPLY string_split(SD.COL01,CHAR(32)) WRD GROUP BY WRD.[value] ORDER BY WCNT DESC;

Output
WORD WCNT ------- ----- The 6 fence 3 fox 1 jumped 1 over 1 past 1 rabbit 1 ran 1 red 1 under 1 white 1 brown 1 cat 1

paulgq SSC Enthusiast Points: 140 More actions · Answer 2

that seems elegant. but I'm confused about this part

;WITH SAMPLE_DATA(COL01) AS
(
SELECT 'The brown cat jumped over the fence' UNION ALL
SELECT 'The red fox dug under the fence' UNION ALL
SELECT 'The white rabbit ran past the fence'
)

There are hundreds of rows that have all different values.
How do I select all of the rows from SAMPLE_DATA COL01 ?

Thanks,

Eirikur Eiriksson SSC Guru Points: 182920 More actions · Answer 3

paulgq - Wednesday, March 6, 2019 1:45 AM
that seems elegant. but I'm confused about this part
;WITH SAMPLE_DATA(COL01) AS
(
SELECT 'The brown cat jumped over the fence' UNION ALL
SELECT 'The red fox dug under the fence' UNION ALL
SELECT 'The white rabbit ran past the fence'
)
There are hundreds of rows that have all different values.
How do I select all of the rows from SAMPLE_DATA COL01 ?
Thanks,

The SAMPLE_DATA CTE is simply there to provide sample data for the method, you will simply skip that and point to the table/column you are going to use.
😎

What version of SQL Server are you using?
If you are on SQL Server < 2012 then use this function.
If you are on SQL Server 2012 or 2014 then use this function.
For SQL Server > 2014, use the string_split function.

paulgq SSC Enthusiast Points: 140 More actions · Answer 4

ok, think that I understand. my statement will be

SELECT WRD.[value] AS WORD, COUNT(*) AS WCNT
FROM SAMPLE_DATA
CROSS APPLY string_split(SAMPLE_DATA.COL01,CHAR(32)) WRD
GROUP BY WRD.[value]
ORDER BY WCNT DESC;

I am using MS Access so I am unsure if the above syntax will work correctly or not.