February 22, 2012 at 11:45 pm
A table has a column that holds the contents of a csv file, text identified with double quotes. I need to find out how many delimiters each row contains so I can make each row have the same number by padding with commas as necessary - results will be saved as csv and then imported into a new table.
The issues:
Not all commas in the string are delimiters - some are identified as text
The tags aren't known ahead of time and don't always follow a set pattern, other than having = after them and being preceded by a comma (if they aren't the first in the string)
Identified text can also contain =.
I can't replace the commas or equals with an empty string and compare with the original string length as they are not always all delimiters - sometimes they are identified text - and will later split text that should be in the same column across multiple columns. Here is an example:
declare @temp table (id int, string varchar(100))
insert into @temp
values(1,'aa=Hello, ab=Goodbye, "ac=Farewell, goodbye", "ad=1""=""1"')
,(2,'xy=abc, zy0=cba,,')
selectid
,string + REPLICATE(',',isnull(MAX(len(string) - len(REPLACE(string,',',''))) over()-(len(string) - len(REPLACE(string,',',''))),0))
from @temp
How can I count the number of tags in each string?
Eventually I want to end up with a table that has three columns - id, tag, text so I'm not worried about which column each tag + text ends up in at the moment.
Thanks
SM
February 23, 2012 at 5:14 am
The solution below is based on the DelimitedSplit8K fuction referenced in my signature expanded by an additional output column
Position = CHARINDEX(@pDelimiter,@pString,s.N1),
This allows to check for the number of occurence of '"' to be either odd or even (to determine whether the comma is within quotation marks or not).
Also, I added a max number to define the target number of commas per string.
declare @temp table (id int, string varchar(100))
insert into @temp
values (1,'aa=Hello, ab=Goodbye, "ac=Farewell, goodbye", "ad=1""=""1"')
, (2,'xy=abc, zy0=cba,,')
DECLARE @max-2 INT -- max number of comma
SET @max-2 =5;
WITH cte AS
(
SELECT
id,
SUM(
CASE WHEN (position -DATALENGTH(REPLACE(LEFT(string,position),'"','')))%2 = 0
AND position >0
THEN 1
ELSE 0
END) cnt
FROM @temp t1
CROSS APPLY [dbo].[DelimitedSplit8K](string,',')
GROUP BY id
)
SELECT t.id, string + REPLICATE(',',@max - cnt)
FROM cte
INNER JOIN @temp t ON t.id=cte.id
February 23, 2012 at 6:47 am
Here's another way. For any comma, count the number of double quotes preceeding it. If there is an even number count it, otherwise ignore.
WITH CTE AS (
SELECT t.id,
t.string,
SUBSTRING(t.string,n.number,1) AS ch,
ROW_NUMBER() OVER(ORDER BY n.number) -
ROW_NUMBER() OVER(PARTITION BY SUBSTRING(t.string,n.number,1) ORDER BY n.number) AS rnDiff
FROM @temp t
INNER JOIN master.dbo.spt_values n ON n.number BETWEEN 1 AND LEN(t.string)
AND n.type='P'
AND SUBSTRING(t.string,n.number,1) IN (',','"'))
SELECT id,string + REPLICATE(',', MAX(COUNT(*)) OVER() - COUNT(*)),
COUNT(*)+1 AS NumberOfTags,
MAX(COUNT(*)) OVER()+1 AS MaxNumberOfTags
FROM CTE
WHERE ch=','
AND rnDiff % 2 = 0
GROUP BY id,string
ORDER BY id;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537February 23, 2012 at 10:11 am
Both good solutions - thanks!
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply