Parse Double Quoted comma separated data with comma

  • I've used this function before and I'm sorry I do not recall it's origination.  (DelimitedSplit8K)

    It works as expected however there is a comma in the data that I am trying to parse.  I've poste here the situation that I am trying to resolve and would greatly appreciate any help this accomplish the task of parsing this data out so that column/field T5 = BLUEF NA

    Thanks.

    DECLARE @string NVARCHAR(MAX) = '"T1","T2","T3","T4","T5"  "000055555XYZ",00001,"","","BLUEF, NA"'

    SELECT columnN=@string

    SELECT @string= REPLACE(REPLACE(REPLACE(@string,char(10),'|'),char(13),'|'),'  ','|')

    SELECT

    ROW_NUMBER() OVER(ORDER BY (SELECT 0)) As RN,

    ItemNumber_A = A.ItemNumber

    ,Item_A                                = A.Item

    ,ItemNumber_B = B.ItemNumber

    ,Item_B                                = B.Item

    FROM dbo.DelimitedSplit8K(@string, '|') AS A

    CROSS APPLY dbo.DelimitedSplit8K(A.Item, ',') AS B

     

  • One way could be to use the same replacement technique as with the line breaks.  Replace the "," and ", and ," with |.  Then replace any remaining double quotes with an empty string.  Then split on the pipes.  This assumes the replaced character combinations don't appear elsewhere in the string.  If that's not the case then perhaps you could add an escape sequence for the double quotes.  The strings are Unicode, NVARCHAR, so the appropriate splitter is dbo.DelimitedSplitN4K

    DECLARE @string NVARCHAR(MAX) = N'"T1","T2","T3","T4","T5"  "000055555XYZ",00001,"","","BLUEF, NA"';

    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 0)) As RN
    ,ItemNumber_A = A.ItemNumber, Item_A = A.Item
    ,ItemNumber_B = B.ItemNumber, Item_B = B.Item
    FROM dbo.DelimitedSplitN4K(REPLACE(REPLACE(REPLACE(@string,char(10),N'|'),char(13),N'|'),N' ',N'|'), '|') AS A
    CROSS APPLY dbo.DelimitedSplitN4K(REPLACE(REPLACE(REPLACE(REPLACE(A.Item,N'","',N'|'),N',"',N'|'), N'",', N'|'), N'"', N''), N'|') AS B;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • I did originally post this question under my personal account and this is from my work account.

    I did think about using a second replace statement but could not devise the proper algorithm.  It took me awhile to find the function you recommended.  But once I did and tested it, that does not work.  It actually provides almost the same results.   20240606.SQLQuery

    I should add that the sample data provided is just a sample and BLUEF, NA could be any string or numeric value containing 1 or more commas such as 1,000,000.00.

     

  • This was removed by the editor as SPAM

  • When I had to do something like this, I created a custom function that would replace commas if, and only if, they were enclosed in double quotes.  Using a function prior to using DelimitedSplit will significantly reduce the performance of that function.  You need to be aware of that if you have a (very) large number of rows to process.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • There will not be a large number of rows only a hundred or so at a time.  Do you have that function that you might care to share it?  I was thinking of something along that same line but fail to figure out how to identify in my mind what is between a set of double quotes when there are so many double quotes in the string.  Ant insight as to how to accomplish that would be appreciated.

  • This gets close to what you want.  It uses an ILTV function for the tally table.  It essentially uses a running total modulo 2 to determine whether a certain value is currently quoted.

    DECLARE @string NVARCHAR(MAX) = '"T1","T2","T3","T4","T5"
    "000055555XYZ",00001,"","","BLUEF, NA"';

    SET @string = REPLACE(@string, CHAR(13) + CHAR(10), ',');

    WITH Quoted_Flags AS
    (
    SELECT t.n, c.c, CAST(SUM(CASE WHEN c.c = '"' THEN 1 ELSE 0 END) OVER(ORDER BY t.n ROWS UNBOUNDED PRECEDING) % 2 AS BIT) AS Quoted_Flag
    FROM dbo.Tally(LEN(@string), 1) AS t
    CROSS APPLY (VALUES(SUBSTRING(@string, t.n, 1))) AS c(c)
    UNION
    SELECT 0, ',', 0
    )
    SELECT REPLACE(REPLACE(SUBSTRING(@string, qf.n + 1, LEAD(qf.n, 1, LEN(@string) + 1) OVER(ORDER BY qf.n) - qf.n - 1), ',', ''), '"', '')
    FROM Quoted_Flags AS qf
    WHERE qf.c = ','
    AND qf.Quoted_Flag = 0

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Tally table, ok I am vaguely familiar, but really clueless.  A tally table is used in the DelimitedSplit8K function and I am guessing that this tally table would need to be constructed on modulo 2.  But that is just me guessing. can you provide more about dbo.tally() please

  • Budd wrote:

    I did originally post this question under my personal account and this is from my work account.

    I did think about using a second replace statement but could not devise the proper algorithm.  It took me awhile to find the function you recommended.  But once I did and tested it, that does not work.  It actually provides almost the same results.   20240606.SQLQuery

    I should add that the sample data provided is just a sample and BLUEF, NA could be any string or numeric value containing 1 or more commas such as 1,000,000.00.

    Looking at your output it's clear you didn't run the code as it was provided

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • This works - for the given example:

    DECLARE @string NVARCHAR(MAX) = N'"T1","T2","T3","T4","T5"
    "000055555XYZ",00001,"","","BLUEF, NA"';

    PRINT @string
    SET @string = replace(replace(replace(replace(replace(@string, char(13)+char(10), ','), '","', '|'), '",', '|'), ',"', '|'), '"', '')
    PRINT @string
    SELECT * FROM string_split(@string, '|') ss

    Example with a table:

    DECLARE @Strings TABLE (String nvarchar(MAX));
    INSERT INTO @Strings (String)
    VALUES (N'"T1","T2","T3","T4","T5"
    "000055555XYZ",00001,"","","BLUEF, NA"')
    , (N'"A1","A2","A3","A4","A5"
    "000066666XYZ",00002,"","","1,000,000"')
    , (N'"B1","B2","B3","B4","B5"
    "000077777XYZ",00003,"","","NO COMMAS HERE"');

    SELECT *
    FROM @Strings s
    CROSS APPLY string_split(replace(replace(replace(replace(replace(s.String, char(13)+char(10), ','), '","', '|'), '",', '|'), ',"', '|'), '"', ''), '|')

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • drbob wrote:

    I've used this function before and I'm sorry I do not recall it's origination.  (DelimitedSplit8K) 

    Reaping the benefits of the Window functions in T-SQL – SQLServerCentral

    😎

    Look at the second half of the article.

  • Sorry Steve the output I provided was only what a partial.  I did run exactly what was provided and here is a screen shot of everything including what I found for the DelimitedSplit4k function.   This was run in tempdb on SQL 2022, but I don't think that matters.

    20240607.SQLQuery

  • The problem in Steve's solution is that is doesn't account for possible CR/LF data in the last field.  In your data - there is a CR\LF after the comma in the last field.

    So - what you have is "BLUEF," + CHAR(13) + CHAR(10) + "NA" - which displays as "BLUEF, NA" where that space is actually the CR\LF characters.

    Depending on your data - you may need to modify the replace statements to include separate replacements for CHAR(13), CHAR(10) and CHAR(13)+CHAR(10).

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thank you all so much.  I will need to do more than just "Look at the second half of the article".  I have read it twice and will read it again and again until I believe I understand it.

  • In case you want it, here's a function to prep the string as we discussed:

    DECLARE @string nvarchar(max);

    SET @string = '"A1","A2","A3","A4","A5"

    "000066666XYZ",00002,"","","1,000,000"';

    SET @string = REPLACE(@string, CHAR(13) + CHAR(10), ',');

    SELECT @string, dbo.prepare_string_for_split(@string, DEFAULT, DEFAULT, DEFAULT);

    SET ANSI_NULLS ON;
    SET QUOTED_IDENTIFIER ON;
    GO
    CREATE OR ALTER FUNCTION dbo.prepare_string_for_split
    (
    @string nvarchar(max),
    @split_char nchar(1) = N',',
    @control_char nchar(1) = N'"',
    @replacement_char nchar(1) = N'^'
    )
    RETURNS nvarchar(max) WITH RETURNS NULL ON NULL INPUT
    AS
    BEGIN
    DECLARE @byte1 int;
    DECLARE @byte2 int;
    DECLARE @string_len int;
    SET @string_len = LEN(@string);
    SET @byte1 = PATINDEX(N'%' + @control_char + N'%', @string);
    IF @byte1 = 0
    SET @byte1 = @string_len;

    WHILE @byte1 < @string_len
    BEGIN
    SET @byte2 = @byte1 + 1;
    WHILE @byte2 <= @string_len AND SUBSTRING(@string, @byte2, 1) NOT IN (@split_char, @control_char)
    SET @byte2 = @byte2 + 1;
    IF @byte2 >= @string_len
    BREAK;
    WHILE SUBSTRING(@string, @byte2, 1) = @split_char
    BEGIN
    SET @string = STUFF(@string, @byte2, 1, @replacement_char);
    WHILE @byte2 <= @string_len AND SUBSTRING(@string, @byte2, 1) NOT IN (@split_char, @control_char)
    SET @byte2 = @byte2 + 1;
    END /*WHILE*/ SET @byte1 = @byte2 + 1;
    WHILE @byte1 <= @string_len AND SUBSTRING(@string, @byte1, 1) NOT IN (@control_char)
    SET @byte1 = @byte1 + 1;
    END /*WHILE*/RETURN @string;
    END /*FUNCTION*/GO

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply