How to write query on column with list of delimited categories

Question

Post reply

How to write query on column with list of delimited categories

halifaxdal

SSCoach

Points: 19741

February 13, 2020 at 9:57 pm

Go to Answer

#3725773

I have to deal with a table with schema designed as below:

CREATE TABLE [dbo].[UseCase]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[UseCase] [text] NOT NULL,
[Comment] [nvarchar](40) NOT NULL,
[Categories] [nvarchar](255) NULL
)

GO

INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 1', 'This is comment 1 for use case 1', 'Cat1|cat3|cat9')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 2', 'This is comment 1 for use case 2', 'Cat2|cat9')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 3', 'This is comment 1 for use case 3', 'Cat1|cat3')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 4', 'This is comment 1 for use case 4', 'Cat1|cat4')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 5', 'This is comment 1 for use case 5', 'Cat5')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 6', 'This is comment 1 for use case 6', 'Cat6|cat8')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 7', 'This is comment 1 for use case 7', 'Cat3|cat8')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 8', 'This is comment 1 for use case 8', 'Cat1|cat3|cat9')

Note the Categories column is a string with pipe | as delimiter.

Can anyone share your thought on how to write a query to get the stats like how many use cases of each category?

Cat1 4
Cat2 1
Cat3 4
Cat4 1
Cat5 1
Cat6 1
Cat7 0
Cat8 2
Cat9 9

Thank you very much.

This topic was modified 4 years, 9 months ago by halifaxdal. Reason: This question is for SQL2008R2

Steve Collins

SSCrazy

Points: 2155
More actions
February 14, 2020 at 2:54 pm
Answer
#3725930
pietlinden wrote:
That was cheating-level easy... except you need a helper function, Jeff Moden's DelimitedSplit8K function. It's here https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-“csv-splitter”-function%5B/quote%5D
The OP is splitting nvarchar(255) and Jeff's other splitter 'DelimitedSplitN4K' targets nvarchar. Also no subquery needed here.
```
select
  ds.Item,
  count(*) string_count
from
  dbo.UseCase u
 cross apply
  dbo.DelimitedSplitN4K(u.[Categories], N'|') ds
where
  ds.Item<> ''
group by
  ds.Item;
```
- This reply was modified 4 years, 9 months ago by Steve Collins.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63500 More actions · Answer 1

That was cheating-level easy... except you need a helper function, Jeff Moden's DelimitedSplit8K function. It's here https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-“csv-splitter”-function

Here's the SQL:

SELECT x.Item AS Category
 , COUNT(*) AS UseCaseCount
FROM 
(
 SELECT ID
  , Categories
  , ds.itemNumber
  , ds.Item
 FROM dbo.UseCase
 CROSS APPLY Teest.dbo.DelimitedSplit8K(Categories,'|') ds
) x
GROUP BY x.Item;

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 2

Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead

again - sorry jeff, but splitting strings is not what SQL was designed for.

Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂

MVDBA

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

MVDBA (Mike Vessey) wrote:

Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead
again - sorry jeff, but splitting strings is not what SQL was designed for.
Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂

How might you implement this "TVP" solution, Mike?

I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Steve Collins SSCrazy Points: 2155 More actions · Answer 4

Not sure what TVP has to do with this. Could you provide a solution?

So I realize the OP is working with 2008R2 here. Also in 2016 the string_split() function was added. However, the string_split function only accepts single character delimiters. What if someone needs more than that? Ha, here's a Json splitter that accepts delimiters of up to 3 characters (it's adjustable in the tvf). Instead of splitting the string it does search/replace of the delimiters with json markup and queries the nested arrays. 1 row was added to the test set to include extra commas and delimiters to test with. Also the TEXT column type (which is to be avoided) in the sample table was changed to nvarchar(4000). No idea how efficient this is compared to alternatives.

drop table if exists dbo.Test_UseCase;
go
create table dbo.Test_UseCase(
  Id int identity(1,1) not null,
  UseCase nvarchar(4000) not null,
  Comment nvarchar(40) not null,
  Categories nvarchar(255) null);
go

INSERT dbo.Test_UseCase(UseCase, Comment, Categories) VALUES
('This is use case 1', 'This is comment 1 for use case 1', 'Cat1|||cat3|||cat9'),
('This is use case 2', 'This is comment 1 for use case 2', 'Cat2|||cat9'),
('This is use case 3', 'This is comment 1 for use case 3', 'Cat1|||cat3'),
('This is use case 4', 'This is comment 1 for use case 4', 'Cat1|||cat4'),
('This is use case 5', 'This is comment 1 for use case 5', 'Cat5'),
('This is use case 6', 'This is comment 1 for use case 6', 'Cat6|||cat8'),
('This is use case 7', 'This is comment 1 for use case 7', 'Cat3|||cat8'),
('This is use case 8', 'This is comment 1 for use case 8', 'Cat1|||cat3|||cat9'),
('This , is use | case 9', 'This , is comment 1,,, for ||use case 9', 'Cat99|||cat99|||cat99');
go

drop function if exists dbo.json_splitter;
go
create function dbo.json_splitter(
  @string            nvarchar(max),
  @delimiter       nvarchar(3))
returns table as 
return
with
json_cte(nested_json) as(
    select concat(N'{ "nested_json" : [{ "split_string":"', replace(@string, @delimiter, N'"},{"split_string":"'), N'"}]}'))
select * 
from
  json_cte jc
 cross apply
  openjson(jc.nested_json, N'strict $.nested_json')  with (split_string nvarchar(max));
go

select
  js.split_string,
  count(*) string_count
from
  Test_UseCase tu
 cross apply
  dbo.json_splitter(tu.[Categories], N'|||') js
group by
  js.split_string;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 5

ChrisM@Work wrote:

MVDBA (Mike Vessey) wrote:
Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead
again - sorry jeff, but splitting strings is not what SQL was designed for.
Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂
How might you implement this "TVP" solution, Mike?
I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.

maybe not rbar, but string by agonising delimited string

with a TVP you can pass in a table - nice and simple.... if something is building up a delimited list then de-delimiting it (if that is a word) then that is overhead at both ends.. just send an array/icollection/table, nice fast and easy

MVDBA

pietlinden SSC Guru Points: 63500 More actions · Answer 6

Sorry, missed the NVARCHAR. (The answer is in the same article, though!)

Jeff Moden SSC Guru Points: 1004312 More actions · Answer 7

MVDBA (Mike Vessey) wrote:

ChrisM@Work wrote:
MVDBA (Mike Vessey) wrote:
Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead
again - sorry jeff, but splitting strings is not what SQL was designed for.
Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂
How might you implement this "TVP" solution, Mike?
I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.
maybe not rbar, but string by agonising delimited string
with a TVP you can pass in a table - nice and simple.... if something is building up a delimited list then de-delimiting it (if that is a word) then that is overhead at both ends.. just send an array/icollection/table, nice fast and easy

Mike,

You have an interesting idea but can you setup a coded example with test data, please? I'd love to test such a different idea.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 8

Jeff Moden wrote:

MVDBA (Mike Vessey) wrote:
ChrisM@Work wrote:
MVDBA (Mike Vessey) wrote:
Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead
again - sorry jeff, but splitting strings is not what SQL was designed for.
Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂
How might you implement this "TVP" solution, Mike?
I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.
maybe not rbar, but string by agonising delimited string
with a TVP you can pass in a table - nice and simple.... if something is building up a delimited list then de-delimiting it (if that is a word) then that is overhead at both ends.. just send an array/icollection/table, nice fast and easy
Mike,
You have an interesting idea but can you setup a coded example with test data, please? I'd love to test such a different idea.

I like a challenge 🙂 let me get to lunchtime and i'll code something up

MVDBA