I have to deal with a table with schema designed as below:
CREATE TABLE [dbo].[UseCase]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[UseCase] [text] NOT NULL,
[Comment] [nvarchar](40) NOT NULL,
[Categories] [nvarchar](255) NULL
)
GO
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 1', 'This is comment 1 for use case 1', 'Cat1|cat3|cat9')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 2', 'This is comment 1 for use case 2', 'Cat2|cat9')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 3', 'This is comment 1 for use case 3', 'Cat1|cat3')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 4', 'This is comment 1 for use case 4', 'Cat1|cat4')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 5', 'This is comment 1 for use case 5', 'Cat5')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 6', 'This is comment 1 for use case 6', 'Cat6|cat8')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 7', 'This is comment 1 for use case 7', 'Cat3|cat8')
INSERT INTO [dbo].[UseCase] ([UseCase], [Comment], [Categories])
VALUES ('This is use case 8', 'This is comment 1 for use case 8', 'Cat1|cat3|cat9')
Note the Categories column is a string with pipe | as delimiter.
Can anyone share your thought on how to write a query to get the stats like how many use cases of each category?
Cat1 4
Cat2 1
Cat3 4
Cat4 1
Cat5 1
Cat6 1
Cat7 0
Cat8 2
Cat9 9
Thank you very much.
February 13, 2020 at 11:11 pm
That was cheating-level easy... except you need a helper function, Jeff Moden's DelimitedSplit8K function. It's here https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-“csv-splitter”-function
Here's the SQL:
SELECT x.Item AS Category
, COUNT(*) AS UseCaseCount
FROM
(
SELECT ID
, Categories
, ds.itemNumber
, ds.Item
FROM dbo.UseCase
CROSS APPLY Teest.dbo.DelimitedSplit8K(Categories,'|') ds
) x
GROUP BY x.Item;
February 14, 2020 at 11:08 am
Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead
again - sorry jeff, but splitting strings is not what SQL was designed for.
Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂
MVDBA
February 14, 2020 at 1:13 pm
Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead
again - sorry jeff, but splitting strings is not what SQL was designed for.
Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂
How might you implement this "TVP" solution, Mike?
I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
February 14, 2020 at 1:16 pm
Not sure what TVP has to do with this. Could you provide a solution?
So I realize the OP is working with 2008R2 here. Also in 2016 the string_split() function was added. However, the string_split function only accepts single character delimiters. What if someone needs more than that? Ha, here's a Json splitter that accepts delimiters of up to 3 characters (it's adjustable in the tvf). Instead of splitting the string it does search/replace of the delimiters with json markup and queries the nested arrays. 1 row was added to the test set to include extra commas and delimiters to test with. Also the TEXT column type (which is to be avoided) in the sample table was changed to nvarchar(4000). No idea how efficient this is compared to alternatives.
drop table if exists dbo.Test_UseCase;
go
create table dbo.Test_UseCase(
Id int identity(1,1) not null,
UseCase nvarchar(4000) not null,
Comment nvarchar(40) not null,
Categories nvarchar(255) null);
go
INSERT dbo.Test_UseCase(UseCase, Comment, Categories) VALUES
('This is use case 1', 'This is comment 1 for use case 1', 'Cat1|||cat3|||cat9'),
('This is use case 2', 'This is comment 1 for use case 2', 'Cat2|||cat9'),
('This is use case 3', 'This is comment 1 for use case 3', 'Cat1|||cat3'),
('This is use case 4', 'This is comment 1 for use case 4', 'Cat1|||cat4'),
('This is use case 5', 'This is comment 1 for use case 5', 'Cat5'),
('This is use case 6', 'This is comment 1 for use case 6', 'Cat6|||cat8'),
('This is use case 7', 'This is comment 1 for use case 7', 'Cat3|||cat8'),
('This is use case 8', 'This is comment 1 for use case 8', 'Cat1|||cat3|||cat9'),
('This , is use | case 9', 'This , is comment 1,,, for ||use case 9', 'Cat99|||cat99|||cat99');
go
drop function if exists dbo.json_splitter;
go
create function dbo.json_splitter(
@string nvarchar(max),
@delimiter nvarchar(3))
returns table as
return
with
json_cte(nested_json) as(
select concat(N'{ "nested_json" : [{ "split_string":"', replace(@string, @delimiter, N'"},{"split_string":"'), N'"}]}'))
select *
from
json_cte jc
cross apply
openjson(jc.nested_json, N'strict $.nested_json') with (split_string nvarchar(max));
go
select
js.split_string,
count(*) string_count
from
Test_UseCase tu
cross apply
dbo.json_splitter(tu.[Categories], N'|||') js
group by
js.split_string;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
February 14, 2020 at 1:49 pm
MVDBA (Mike Vessey) wrote:Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead
again - sorry jeff, but splitting strings is not what SQL was designed for.
Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂
How might you implement this "TVP" solution, Mike?
I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.
maybe not rbar, but string by agonising delimited string
with a TVP you can pass in a table - nice and simple.... if something is building up a delimited list then de-delimiting it (if that is a word) then that is overhead at both ends.. just send an array/icollection/table, nice fast and easy
MVDBA
That was cheating-level easy... except you need a helper function, Jeff Moden's DelimitedSplit8K function. It's here https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-“csv-splitter”-function%5B/quote%5D
The OP is splitting nvarchar(255) and Jeff's other splitter 'DelimitedSplitN4K' targets nvarchar. Also no subquery needed here.
select
ds.Item,
count(*) string_count
from
dbo.UseCase u
cross apply
dbo.DelimitedSplitN4K(u.[Categories], N'|') ds
where
ds.Item<> ''
group by
ds.Item;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
February 15, 2020 at 8:09 pm
Sorry, missed the NVARCHAR. (The answer is in the same article, though!)
February 16, 2020 at 12:23 am
ChrisM@Work wrote:MVDBA (Mike Vessey) wrote:Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead
again - sorry jeff, but splitting strings is not what SQL was designed for.
Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂
How might you implement this "TVP" solution, Mike?
I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.
maybe not rbar, but string by agonising delimited string
with a TVP you can pass in a table - nice and simple.... if something is building up a delimited list then de-delimiting it (if that is a word) then that is overhead at both ends.. just send an array/icollection/table, nice fast and easy
Mike,
You have an interesting idea but can you setup a coded example with test data, please? I'd love to test such a different idea.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 17, 2020 at 9:25 am
MVDBA (Mike Vessey) wrote:ChrisM@Work wrote:MVDBA (Mike Vessey) wrote:Sorry to jeff, but i'm going to make light of the split functionality, it ends up in a function that goes RBAR - Use a TVP instead
again - sorry jeff, but splitting strings is not what SQL was designed for.
Grant keeps nagging me about using profiler, I figure that I might start nagging you about TVP 🙂
How might you implement this "TVP" solution, Mike?
I'm quite familiar with the DelimitedSplit8K function, I certainly wouldn't describe it as RBAR because it's a set-based solution. There's no programmatically stepping through of rows which is how we identify a process as RBAR.
maybe not rbar, but string by agonising delimited string
with a TVP you can pass in a table - nice and simple.... if something is building up a delimited list then de-delimiting it (if that is a word) then that is overhead at both ends.. just send an array/icollection/table, nice fast and easy
Mike,
You have an interesting idea but can you setup a coded example with test data, please? I'd love to test such a different idea.
I like a challenge 🙂 let me get to lunchtime and i'll code something up
MVDBA
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply