Count delimiters in field containing CSV

  • A table has a column that holds the contents of a CSV file, with text qualified by double quotes. I need to find out how many delimiters each row contains so I can make each row have the same number by padding with commas as necessary - the results will be saved as CSV and then imported into a new table.

    The issues:

    Not all commas in the string are delimiters - some are part of the qualified text

    The tags aren't known ahead of time and don't always follow a set pattern, other than having an = after them and being preceded by a comma (if they aren't the first tag in the string)

    Qualified text can also contain =.

    I can't simply replace the commas or equals signs with an empty string and compare against the original string length, because they are not always delimiters - sometimes they are part of the qualified text - and doing so would later split text that belongs in one column across multiple columns. Here is an example:

    declare @temp table (id int, string varchar(100))

    insert into @temp

    values(1,'aa=Hello, ab=Goodbye, "ac=Farewell, goodbye", "ad=1""=""1"')

    ,(2,'xy=abc, zy0=cba,,')

    select id

    ,string + REPLICATE(',',isnull(MAX(len(string) - len(REPLACE(string,',',''))) over()-(len(string) - len(REPLACE(string,',',''))),0))

    from @temp
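
    For example, a raw character count treats the quoted comma in row 1 as a delimiter too, which is exactly what throws the padding off. A quick sketch against the sample table above shows the mismatch:

    -- Sketch only: raw comma counts per row; row 1 reports 4 although only 3 of its
    -- commas are real delimiters (one sits inside "ac=Farewell, goodbye")
    select id, len(string) - len(REPLACE(string,',','')) as raw_comma_count
    from @temp
    -- id = 1 -> 4, id = 2 -> 3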

    How can I count the number of tags in each string?

    Eventually I want to end up with a table that has three columns - id, tag, text - so I'm not worried about which column each tag + text pair ends up in at the moment.
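
    Roughly, for the sample data that target would look like this (assuming the doubled quotes unwrap to a single quote, and ignoring the empty trailing fields in row 2 - the table name is just for illustration):

    declare @target table (id int, tag varchar(20), [text] varchar(100))
    insert into @target
    values (1,'aa','Hello'), (1,'ab','Goodbye')
    ,(1,'ac','Farewell, goodbye'), (1,'ad','1"="1')
    ,(2,'xy','abc'), (2,'zy0','cba')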

    Thanks

    SM

    The solution below is based on the DelimitedSplit8K function referenced in my signature, extended with an additional output column:

    Position = CHARINDEX(@pDelimiter,@pString,s.N1),

    This makes it possible to check whether the number of occurrences of '"' in front of each comma is odd or even (to determine whether the comma is within quotation marks or not); there is a worked check of this test after the query below.
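
    If you are using the variant of the splitter whose final SELECT reads directly from cteStart s, the tail of the function would look roughly like this with the extra column in place (a sketch - check it against the copy of DelimitedSplit8K you actually use):

    -- Sketch of the modified tail of DelimitedSplit8K (cteStart variant assumed)
    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
           Item       = SUBSTRING(@pString, s.N1,
                          ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0) - s.N1, 8000)),
           Position   = CHARINDEX(@pDelimiter,@pString,s.N1) -- position of the delimiter ending this item (0 for the last item)
      FROM cteStart s;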

    Also, I added a max number to define the target number of commas per string.

    declare @temp table (id int, string varchar(100))

    insert into @temp

    values (1,'aa=Hello, ab=Goodbye, "ac=Farewell, goodbye", "ad=1""=""1"')

    , (2,'xy=abc, zy0=cba,,')

    DECLARE @max INT -- maximum number of commas

    SET @max = 5;

    WITH cte AS

    (

    SELECT

    id,
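
    -- count a comma as a real delimiter only if the number of double quotes
    -- to its left is even, i.e. the comma is not sitting inside a quoted value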

    SUM(

    CASE WHEN (position -DATALENGTH(REPLACE(LEFT(string,position),'"','')))%2 = 0

    AND position >0

    THEN 1

    ELSE 0

    END) cnt

    FROM @temp t1

    CROSS APPLY [dbo].[DelimitedSplit8K](string,',')

    GROUP BY id

    )

    SELECT t.id, string + REPLICATE(',',@max - cnt)

    FROM cte

    INNER JOIN @temp t ON t.id=cte.id
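
    As a quick sanity check of the parity test, the same expression can be evaluated for two hand-picked comma positions in row 1 (positions 35 and 45 of the sample string; sketch only):

    -- position 35 is the comma inside "ac=Farewell, goodbye" (1 quote to its left -> odd -> ignore),
    -- position 45 is the comma right after that closing quote (2 quotes to its left -> even -> count)
    DECLARE @s varchar(100) = 'aa=Hello, ab=Goodbye, "ac=Farewell, goodbye", "ad=1""=""1"';
    SELECT v.pos,
           v.pos - DATALENGTH(REPLACE(LEFT(@s,v.pos),'"','')) AS quotes_to_the_left,
           CASE WHEN (v.pos - DATALENGTH(REPLACE(LEFT(@s,v.pos),'"','')))%2 = 0
                THEN 'real delimiter' ELSE 'inside quotes' END AS verdict
    FROM (VALUES (35),(45)) v(pos);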



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Here's another way. For any comma, count the number of double quotes preceding it. If there is an even number, count the comma; otherwise ignore it. (A small sketch of the intermediate rows follows the query.)

    WITH CTE AS (

    SELECT t.id,

    t.string,

    SUBSTRING(t.string,n.number,1) AS ch,
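
    -- rnDiff = (row number among all ',' and '"' characters in the string)
    --        - (row number among characters equal to this one);
    -- for a comma this is the number of double quotes that precede it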

    ROW_NUMBER() OVER(PARTITION BY t.id ORDER BY n.number) -

    ROW_NUMBER() OVER(PARTITION BY t.id, SUBSTRING(t.string,n.number,1) ORDER BY n.number) AS rnDiff

    FROM @temp t

    INNER JOIN master.dbo.spt_values n ON n.number BETWEEN 1 AND LEN(t.string)

    AND n.type='P'

    AND SUBSTRING(t.string,n.number,1) IN (',','"'))

    SELECT id,string + REPLICATE(',', MAX(COUNT(*)) OVER() - COUNT(*)),

    COUNT(*)+1 AS NumberOfTags,

    MAX(COUNT(*)) OVER()+1 AS MaxNumberOfTags

    FROM CTE

    WHERE ch=','

    AND rnDiff % 2 = 0

    GROUP BY id,string

    ORDER BY id;
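
    To see what rnDiff is doing, you can list the CTE's intermediate rows for one string (illustrative only - note that master.dbo.spt_values with type='P' only covers numbers 0 to 2047, which is plenty for a varchar(100) column):

    -- the ',' and '"' characters of row 1 with their rnDiff values;
    -- the comma at position 35 gets rnDiff = 1 (odd -> inside quotes, ignored),
    -- the comma at position 45 gets rnDiff = 2 (even -> real delimiter, counted)
    SELECT n.number AS pos,
           SUBSTRING(t.string,n.number,1) AS ch,
           ROW_NUMBER() OVER(PARTITION BY t.id ORDER BY n.number) -
           ROW_NUMBER() OVER(PARTITION BY t.id, SUBSTRING(t.string,n.number,1) ORDER BY n.number) AS rnDiff
    FROM @temp t
    INNER JOIN master.dbo.spt_values n ON n.number BETWEEN 1 AND LEN(t.string)
           AND n.type='P'
           AND SUBSTRING(t.string,n.number,1) IN (',','"')
    WHERE t.id = 1
    ORDER BY n.number;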

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Both good solutions - thanks!
