How to write query on column with list of delimited categories

  • I have to deal with a table with schema designed as below:

    CREATE TABLE [dbo].[UseCase]
    (
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [UseCase] [text] NOT NULL,
    [Comment] [nvarchar](40) NOT NULL,
    [Categories] [nvarchar](255) NULL
    )

    GO

    INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
    VALUES ('This is use case 1', 'This is comment 1 for use case 1', 'Cat1|cat3|cat9')
    INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
    VALUES ('This is use case 2', 'This is comment 1 for use case 2', 'Cat2|cat9')
    INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
    VALUES ('This is use case 3', 'This is comment 1 for use case 3', 'Cat1|cat3')
    INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
    VALUES ('This is use case 4', 'This is comment 1 for use case 4', 'Cat1|cat4')
    INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
    VALUES ('This is use case 5', 'This is comment 1 for use case 5', 'Cat5')
    INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
    VALUES ('This is use case 6', 'This is comment 1 for use case 6', 'Cat6|cat8')
    INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
    VALUES ('This is use case 7', 'This is comment 1 for use case 7', 'Cat3|cat8')
    INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
    VALUES ('This is use case 8', 'This is comment 1 for use case 8', 'Cat1|cat3|cat9')

    Note the Categories column is a string with pipe | as delimiter.

    Can anyone share your thought on how to write a query to get the stats like how many use cases of each category?

    Cat1 4
    Cat2 1
    Cat3 4
    Cat4 1
    Cat5 1
    Cat6 1
    Cat7 0
    Cat8 2
    Cat9 9

    Thank you very much.

    • This topic was modified 4 years, 9 months ago by  halifaxdal. Reason: This question is for SQL2008R2
  • That was cheating-level easy... except you need a helper function, Jeff Moden's DelimitedSplit8K function. It's here https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-“csv-splitter”-function

    Here's the SQL:

    SELECT x.Item AS Category
     , COUNT(*) AS UseCaseCount
    FROM
    (
     SELECT ID
      , Categories
      , ds.itemNumber
      , ds.Item
     FROM dbo.UseCase
     CROSS APPLY Teest.dbo.DelimitedSplit8K(Categories,'|') ds
    ) x
    GROUP BY x.Item;
  • Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead

    again - sorry jeff, but splitting strings is not what SQL was designed for.

    Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂

    MVDBA

  • MVDBA (Mike Vessey) wrote:

    Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead

    again - sorry jeff, but splitting strings is not what SQL was designed for.

    Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂

     

    How might you implement this "TVP" solution, Mike?

    I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Not sure what TVP has to do with this.  Could you provide a solution?

    So I realize the OP is working with 2008R2 here.  Also in 2016 the string_split() function was added.  However, the string_split function only accepts single character delimiters.  What if someone needs more than that?  Ha, here's a Json splitter that accepts delimiters of up to 3 characters (it's adjustable in the tvf).  Instead of splitting the string it does search/replace of the delimiters with json markup and queries the nested arrays.  1 row was added to the test set to include extra commas and delimiters to test with.  Also the TEXT column type (which is to be avoided) in the sample table was changed to nvarchar(4000).  No idea how efficient this is compared to alternatives.

    drop table if exists dbo.Test_UseCase;
    go
    create table dbo.Test_UseCase(
    Id int identity(1,1) not null,
    UseCase nvarchar(4000) not null,
    Comment nvarchar(40) not null,
    Categories nvarchar(255) null);
    go

    INSERT dbo.Test_UseCase(UseCase, Comment, Categories) VALUES
    ('This is use case 1', 'This is comment 1 for use case 1', 'Cat1|||cat3|||cat9'),
    ('This is use case 2', 'This is comment 1 for use case 2', 'Cat2|||cat9'),
    ('This is use case 3', 'This is comment 1 for use case 3', 'Cat1|||cat3'),
    ('This is use case 4', 'This is comment 1 for use case 4', 'Cat1|||cat4'),
    ('This is use case 5', 'This is comment 1 for use case 5', 'Cat5'),
    ('This is use case 6', 'This is comment 1 for use case 6', 'Cat6|||cat8'),
    ('This is use case 7', 'This is comment 1 for use case 7', 'Cat3|||cat8'),
    ('This is use case 8', 'This is comment 1 for use case 8', 'Cat1|||cat3|||cat9'),
    ('This , is use | case 9', 'This , is comment 1,,, for ||use case 9', 'Cat99|||cat99|||cat99');
    go

    drop function if exists dbo.json_splitter;
    go
    create function dbo.json_splitter(
    @string nvarchar(max),
    @delimiter nvarchar(3))
    returns table as
    return
    with
    json_cte(nested_json) as(
    select concat(N'{ "nested_json" : [{ "split_string":"', replace(@string, @delimiter, N'"},{"split_string":"'), N'"}]}'))
    select *
    from
    json_cte jc
    cross apply
    openjson(jc.nested_json, N'strict $.nested_json') with (split_string nvarchar(max));
    go

    select
    js.split_string,
    count(*) string_count
    from
    Test_UseCase tu
    cross apply
    dbo.json_splitter(tu.[Categories], N'|||') js
    group by
    js.split_string;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • ChrisM@Work wrote:

    MVDBA (Mike Vessey) wrote:

    Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead

    again - sorry jeff, but splitting strings is not what SQL was designed for.

    Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂

    How might you implement this "TVP" solution, Mike?

    I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.

    maybe not rbar, but string by agonising delimited string

    with a TVP you can pass in a table - nice and simple.... if something is building up a delimited list then de-delimiting it (if that is a word) then that is overhead at both ends.. just send an array/icollection/table, nice fast and easy

    MVDBA

  • pietlinden wrote:

    That was cheating-level easy... except you need a helper function, Jeff Moden's DelimitedSplit8K function. It's here https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-“csv-splitter”-function%5B/quote%5D

    The OP is splitting nvarchar(255) and Jeff's other splitter 'DelimitedSplitN4K' targets nvarchar.  Also no subquery needed here.

    select
    ds.Item,
    count(*) string_count
    from
    dbo.UseCase u
    cross apply
    dbo.DelimitedSplitN4K(u.[Categories], N'|') ds
    where
    ds.Item<> ''
    group by
    ds.Item;

    • This reply was modified 4 years, 9 months ago by  Steve Collins.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Sorry, missed the NVARCHAR. (The answer is in the same article, though!)

  • MVDBA (Mike Vessey) wrote:

    ChrisM@Work wrote:

    MVDBA (Mike Vessey) wrote:

    Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead

    again - sorry jeff, but splitting strings is not what SQL was designed for.

    Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂

    How might you implement this "TVP" solution, Mike?

    I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.

    maybe not rbar, but string by agonising delimited string

    with a TVP you can pass in a table - nice and simple.... if something is building up a delimited list then de-delimiting it (if that is a word) then that is overhead at both ends.. just send an array/icollection/table, nice fast and easy

    Mike,

    You have an interesting idea but can you setup a coded example with test data, please?  I'd love to test such a different idea.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    MVDBA (Mike Vessey) wrote:

    ChrisM@Work wrote:

    MVDBA (Mike Vessey) wrote:

    Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead

    again - sorry jeff, but splitting strings is not what SQL was designed for.

    Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂

    How might you implement this "TVP" solution, Mike?

    I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.

    maybe not rbar, but string by agonising delimited string

    with a TVP you can pass in a table - nice and simple.... if something is building up a delimited list then de-delimiting it (if that is a word) then that is overhead at both ends.. just send an array/icollection/table, nice fast and easy

    Mike,

    You have an interesting idea but can you setup a coded example with test data, please?  I'd love to test such a different idea.

    I like a challenge 🙂 let me get to lunchtime and i'll code something up

    MVDBA

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply