June 5, 2024 at 11:01 pm
I've used this function before and I'm sorry I do not recall it's origination. (DelimitedSplit8K)
It works as expected however there is a comma in the data that I am trying to parse. I've poste here the situation that I am trying to resolve and would greatly appreciate any help this accomplish the task of parsing this data out so that column/field T5 = BLUEF NA
Thanks.
DECLARE @string NVARCHAR(MAX) = '"T1","T2","T3","T4","T5" "000055555XYZ",00001,"","","BLUEF, NA"'
SELECT columnN=@string
SELECT @string= REPLACE(REPLACE(REPLACE(@string,char(10),'|'),char(13),'|'),' ','|')
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT 0)) As RN,
ItemNumber_A = A.ItemNumber
,Item_A = A.Item
,ItemNumber_B = B.ItemNumber
,Item_B = B.Item
FROM dbo.DelimitedSplit8K(@string, '|') AS A
CROSS APPLY dbo.DelimitedSplit8K(A.Item, ',') AS B
June 6, 2024 at 12:50 am
One way could be to use the same replacement technique as with the line breaks. Replace the "," and ", and ," with |. Then replace any remaining double quotes with an empty string. Then split on the pipes. This assumes the replaced character combinations don't appear elsewhere in the string. If that's not the case then perhaps you could add an escape sequence for the double quotes. The strings are Unicode, NVARCHAR, so the appropriate splitter is dbo.DelimitedSplitN4K
DECLARE @string NVARCHAR(MAX) = N'"T1","T2","T3","T4","T5" "000055555XYZ",00001,"","","BLUEF, NA"';
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) As RN
,ItemNumber_A = A.ItemNumber, Item_A = A.Item
,ItemNumber_B = B.ItemNumber, Item_B = B.Item
FROM dbo.DelimitedSplitN4K(REPLACE(REPLACE(REPLACE(@string,char(10),N'|'),char(13),N'|'),N' ',N'|'), '|') AS A
CROSS APPLY dbo.DelimitedSplitN4K(REPLACE(REPLACE(REPLACE(REPLACE(A.Item,N'","',N'|'),N',"',N'|'), N'",', N'|'), N'"', N''), N'|') AS B;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
June 6, 2024 at 4:59 pm
I did originally post this question under my personal account and this is from my work account.
I did think about using a second replace statement but could not devise the proper algorithm. It took me awhile to find the function you recommended. But once I did and tested it, that does not work. It actually provides almost the same results.
I should add that the sample data provided is just a sample and BLUEF, NA could be any string or numeric value containing 1 or more commas such as 1,000,000.00.
June 6, 2024 at 5:00 pm
This was removed by the editor as SPAM
June 6, 2024 at 5:49 pm
When I had to do something like this, I created a custom function that would replace commas if, and only if, they were enclosed in double quotes. Using a function prior to using DelimitedSplit will significantly reduce the performance of that function. You need to be aware of that if you have a (very) large number of rows to process.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 6, 2024 at 6:13 pm
There will not be a large number of rows only a hundred or so at a time. Do you have that function that you might care to share it? I was thinking of something along that same line but fail to figure out how to identify in my mind what is between a set of double quotes when there are so many double quotes in the string. Ant insight as to how to accomplish that would be appreciated.
June 6, 2024 at 7:03 pm
This gets close to what you want. It uses an ILTV function for the tally table. It essentially uses a running total modulo 2 to determine whether a certain value is currently quoted.
DECLARE @string NVARCHAR(MAX) = '"T1","T2","T3","T4","T5"
"000055555XYZ",00001,"","","BLUEF, NA"';
SET @string = REPLACE(@string, CHAR(13) + CHAR(10), ',');
WITH Quoted_Flags AS
(
SELECT t.n, c.c, CAST(SUM(CASE WHEN c.c = '"' THEN 1 ELSE 0 END) OVER(ORDER BY t.n ROWS UNBOUNDED PRECEDING) % 2 AS BIT) AS Quoted_Flag
FROM dbo.Tally(LEN(@string), 1) AS t
CROSS APPLY (VALUES(SUBSTRING(@string, t.n, 1))) AS c(c)
UNION
SELECT 0, ',', 0
)
SELECT REPLACE(REPLACE(SUBSTRING(@string, qf.n + 1, LEAD(qf.n, 1, LEN(@string) + 1) OVER(ORDER BY qf.n) - qf.n - 1), ',', ''), '"', '')
FROM Quoted_Flags AS qf
WHERE qf.c = ','
AND qf.Quoted_Flag = 0
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 6, 2024 at 7:39 pm
Tally table, ok I am vaguely familiar, but really clueless. A tally table is used in the DelimitedSplit8K function and I am guessing that this tally table would need to be constructed on modulo 2. But that is just me guessing. can you provide more about dbo.tally() please
June 7, 2024 at 11:31 am
I did originally post this question under my personal account and this is from my work account.
I did think about using a second replace statement but could not devise the proper algorithm. It took me awhile to find the function you recommended. But once I did and tested it, that does not work. It actually provides almost the same results.
I should add that the sample data provided is just a sample and BLUEF, NA could be any string or numeric value containing 1 or more commas such as 1,000,000.00.
Looking at your output it's clear you didn't run the code as it was provided
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
June 7, 2024 at 1:47 pm
This works - for the given example:
DECLARE @string NVARCHAR(MAX) = N'"T1","T2","T3","T4","T5"
"000055555XYZ",00001,"","","BLUEF, NA"';
PRINT @string
SET @string = replace(replace(replace(replace(replace(@string, char(13)+char(10), ','), '","', '|'), '",', '|'), ',"', '|'), '"', '')
PRINT @string
SELECT * FROM string_split(@string, '|') ss
Example with a table:
DECLARE @Strings TABLE (String nvarchar(MAX));
INSERT INTO @Strings (String)
VALUES (N'"T1","T2","T3","T4","T5"
"000055555XYZ",00001,"","","BLUEF, NA"')
, (N'"A1","A2","A3","A4","A5"
"000066666XYZ",00002,"","","1,000,000"')
, (N'"B1","B2","B3","B4","B5"
"000077777XYZ",00003,"","","NO COMMAS HERE"');
SELECT *
FROM @Strings s
CROSS APPLY string_split(replace(replace(replace(replace(replace(s.String, char(13)+char(10), ','), '","', '|'), '",', '|'), ',"', '|'), '"', ''), '|')
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
June 7, 2024 at 1:53 pm
I've used this function before and I'm sorry I do not recall it's origination. (DelimitedSplit8K)
Reaping the benefits of the Window functions in T-SQL – SQLServerCentral
😎
Look at the second half of the article.
June 7, 2024 at 4:07 pm
Sorry Steve the output I provided was only what a partial. I did run exactly what was provided and here is a screen shot of everything including what I found for the DelimitedSplit4k function. This was run in tempdb on SQL 2022, but I don't think that matters.
June 7, 2024 at 5:34 pm
The problem in Steve's solution is that is doesn't account for possible CR/LF data in the last field. In your data - there is a CR\LF after the comma in the last field.
So - what you have is "BLUEF," + CHAR(13) + CHAR(10) + "NA" - which displays as "BLUEF, NA" where that space is actually the CR\LF characters.
Depending on your data - you may need to modify the replace statements to include separate replacements for CHAR(13), CHAR(10) and CHAR(13)+CHAR(10).
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
June 7, 2024 at 6:58 pm
Thank you all so much. I will need to do more than just "Look at the second half of the article". I have read it twice and will read it again and again until I believe I understand it.
June 7, 2024 at 7:49 pm
In case you want it, here's a function to prep the string as we discussed:
DECLARE @string nvarchar(max);
SET @string = '"A1","A2","A3","A4","A5"
"000066666XYZ",00002,"","","1,000,000"';
SET @string = REPLACE(@string, CHAR(13) + CHAR(10), ',');
SELECT @string, dbo.prepare_string_for_split(@string, DEFAULT, DEFAULT, DEFAULT);
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO
CREATE OR ALTER FUNCTION dbo.prepare_string_for_split
(
@string nvarchar(max),
@split_char nchar(1) = N',',
@control_char nchar(1) = N'"',
@replacement_char nchar(1) = N'^'
)
RETURNS nvarchar(max) WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
DECLARE @byte1 int;
DECLARE @byte2 int;
DECLARE @string_len int;
SET @string_len = LEN(@string);
SET @byte1 = PATINDEX(N'%' + @control_char + N'%', @string);
IF @byte1 = 0
SET @byte1 = @string_len;
WHILE @byte1 < @string_len
BEGIN
SET @byte2 = @byte1 + 1;
WHILE @byte2 <= @string_len AND SUBSTRING(@string, @byte2, 1) NOT IN (@split_char, @control_char)
SET @byte2 = @byte2 + 1;
IF @byte2 >= @string_len
BREAK;
WHILE SUBSTRING(@string, @byte2, 1) = @split_char
BEGIN
SET @string = STUFF(@string, @byte2, 1, @replacement_char);
WHILE @byte2 <= @string_len AND SUBSTRING(@string, @byte2, 1) NOT IN (@split_char, @control_char)
SET @byte2 = @byte2 + 1;
END /*WHILE*/ SET @byte1 = @byte2 + 1;
WHILE @byte1 <= @string_len AND SUBSTRING(@string, @byte1, 1) NOT IN (@control_char)
SET @byte1 = @byte1 + 1;
END /*WHILE*/RETURN @string;
END /*FUNCTION*/GO
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply