December 11, 2012 at 12:53 am
Hello all - I am having hard time to split an input string into multicolumn - multirows.
Task - Create a stored procedure that reads an input string with pre-defined field and row terminators; splits the string into multicolumn - multirows; and inserts records into a table.
The table structure is predefined:
create table #temp (field1 varchar(200), field2 varchar(200));
Here is an example input string: (the input string is of variable length; the fields are variable lengths as well)
30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|
";" = field terminator
"|" = row terminator
The stored proc should insert the values in this fashion:
field1 field2
3038469
3138470
3238471
3338472
3438473
3538474
3638475
3738476
3838477
Storing the input string as a text file is not an option. Life would have been much better if I could save the text file and use bulk insert!!
Any help is greatly appreciated. Thanks!!
December 11, 2012 at 1:54 am
This will do something like that.
It caters for any number of fields - just amend the final select for the number required
declare @s-2 varchar(8000)
select @s-2 = '30;38469;1|31;38470;1|32;38471;1|33;38472;1|34;38473;1|35;38474;1|36;38475;1|37;38476;1|38;38477;1|'
select @s-2 = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'
declare @rowterm varchar(1) = '|'
declare @fieldterm varchar(1) = ';'
;with cte as
(
select rowstrt = 1, rowend = charindex(@rowterm,@s)-1, seq = 1
union all
select rowstrt = rowend+2, rowend = charindex(@rowterm,@s,rowend+2)-1, seq = seq + 1
from cte where charindex(@rowterm,@s,rowend+2)<>0
) ,
cte2 as
(
select s = substring(@s, rowstrt,rowend-rowstrt+1), seq
from cte
) ,
cte3 as
(
select fldstrt = 1, fldend = charindex(@fieldterm,s)-1, seq, s, fldseq = 1 from cte2
union all
select fldstrt = fldend+2,
fldend = case when charindex(@fieldterm,s,fldend+2) <> 0 then charindex(@fieldterm,s,fldend+2)-1 else len(s) end,
seq, s, fldseq = fldseq+1
from cte3 where fldend < len(s)
)
select s1.s, s2.s, s3.s
from
(select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 1) s1
join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 2) s2 on s1.seq = s2.seq
left join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 3) s3 on s1.seq = s3.seq
Cursors never.
DTS - only when needed and never to control.
December 11, 2012 at 2:32 am
My hope is that Jeff Moden won't take um-bridge to my ripping apart of his code. . . 😀
I've based this answer on the 8K splitter[/url], with some modifications to make it split on multiple deliminators.
IF EXISTS (SELECT 1
FROM sysobjects
WHERE id = OBJECT_ID('SSC_Multi_Split') AND OBJECTPROPERTY(id, 'IsProcedure') = 1)
BEGIN
DROP PROCEDURE SSC_Multi_Split;
END
GO
CREATE PROCEDURE SSC_Multi_Split (@string VARCHAR(8000), @rowDeliminater CHAR(1), @colDeliminater CHAR(1))
AS
BEGIN
IF object_id('tempdb..#temporaryResultHolder') IS NOT NULL
BEGIN
DROP TABLE #temporaryResultHolder;
END;
WITH CTE1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
CTE2(N) AS (SELECT 1 FROM CTE1 X CROSS JOIN CTE1 Y),
CTE3(N) AS (SELECT 1 FROM CTE2 X CROSS JOIN CTE2 Y),
CTE4(N) AS (SELECT 1 FROM CTE3 X CROSS JOIN CTE3 Y),
CTE5(N) AS (SELECT 1 FROM CTE4 X CROSS JOIN CTE4 Y),
CTE6(N) AS (SELECT 1 FROM CTE5 X CROSS JOIN CTE5 Y),
TALLY(N) AS (SELECT TOP (ISNULL(DATALENGTH(@string),0))
ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM CTE6),
RowStart(start) AS (SELECT 1 UNION ALL
SELECT N+1
FROM TALLY
WHERE SUBSTRING(@string,N,1) = @rowDeliminater
),
RowLen(start,size) AS (SELECT start,
ISNULL(NULLIF(CHARINDEX(@rowDeliminater,@string,start),0)-start,ISNULL(DATALENGTH(@string),8000))
FROM RowStart
),
RowData(rowNumber,data) AS (SELECT ROW_NUMBER() OVER(ORDER BY start),
SUBSTRING(@string, start, size)
FROM RowLen
),
ColumnStart(start, rowNumber) AS (SELECT b.N, b.rowNumber
FROM (SELECT N+1, rowNumber
FROM TALLY
CROSS APPLY (SELECT rowNumber,data
FROM RowData) b
WHERE SUBSTRING(data,N,1) = @colDeliminater
)a(N, rowNumber)
CROSS APPLY (SELECT 1, rowNumber UNION ALL SELECT N, rowNumber)b(N, rowNumber)
),
ColumnLen(rowNumber,start,size,data) AS (SELECT b.rowNumber, start,
ISNULL(NULLIF(CHARINDEX(@colDeliminater,data,start),0)-start,ISNULL(DATALENGTH(data),8000)),
data
FROM ColumnStart a
CROSS APPLY (SELECT rowNumber,data
FROM RowData
WHERE a.rowNumber = rowNumber) b
),
allData(rowNumber,columnNumber,data) AS (SELECT rowNumber, ROW_NUMBER() OVER(PARTITION BY rowNumber ORDER BY start),
SUBSTRING(data, start, size)
FROM ColumnLen
)
SELECT rowNumber,columnNumber,data
INTO #temporaryResultHolder
FROM allData;
DECLARE @SQL NVARCHAR(MAX);
WITH CTE1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
CTE2(N) AS (SELECT 1 FROM CTE1 X CROSS JOIN CTE1 Y),
CTE3(N) AS (SELECT 1 FROM CTE2 X CROSS JOIN CTE2 Y),
CTE4(N) AS (SELECT 1 FROM CTE3 X CROSS JOIN CTE3 Y),
CTE5(N) AS (SELECT 1 FROM CTE4 X CROSS JOIN CTE4 Y),
CTE6(N) AS (SELECT 1 FROM CTE5 X CROSS JOIN CTE5 Y),
TALLY(N) AS (SELECT TOP (ISNULL(DATALENGTH(@string),0))
ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM CTE6)
SELECT @SQL = STUFF((SELECT ','+CHAR(13)+CHAR(10)+'MAX(CASE WHEN columnNumber = '+CAST(N AS VARCHAR(20))+' THEN data ELSE '+CHAR(39)+CHAR(39)+' END) AS field'+CAST(N AS VARCHAR(20))
FROM TALLY
CROSS APPLY (SELECT TOP 1 columnNumber
FROM #temporaryResultHolder
ORDER BY columnNumber DESC) b
WHERE columnNumber >= N
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'),1,3,'');
SELECT @SQL = 'SELECT '+@SQL+CHAR(13)+CHAR(10)+'FROM #temporaryResultHolder'+CHAR(13)+CHAR(10)+'GROUP BY rowNumber'+CHAR(13)+CHAR(10)+'ORDER BY rowNumber';
EXECUTE sp_executeSQL @SQL;
END
GO
To test: -
EXECUTE SSC_Multi_Split '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|','|',';';
Which returns: -
field1 field2
------ -------
30 38469
31 38470
32 38471
33 38472
34 38473
35 38474
36 38475
37 38476
38 38477
December 11, 2012 at 2:46 am
Thanks much, nigelrivett .. this is a great help! I could understand the code. Really appreciate it.
December 11, 2012 at 2:47 am
Thanks Cadavre. This little too complicated for a newbie. But I'll go through the code and understand the beauty of it. Appreciate your quick reply.
December 11, 2012 at 4:09 am
Just some food for thought:
declare @TestString varchar(8000) = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|';
select
max(case ds2.ItemNumber when 1 then ds2.Item else '' end) as field1,
max(case ds2.ItemNumber when 2 then ds2.Item else '' end) as field2
from
dbo.DelimitedSplit8K(@TestString,'|') ds1
cross apply dbo.DelimitedSplit8K(ds1.Item,';') ds2
where
ds1.Item <> ''
group by
ds1.ItemNumber;
Uses Jeff's DelimitedSplit8K function as is.
You will find the code for the dbo.DelimitedSplit8K function here: http://www.sqlservercentral.com/articles/Tally+Table/72993/.
December 11, 2012 at 8:28 am
Lynn Pettis (12/11/2012)
Just some food for thought:
declare @TestString varchar(8000) = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|';
select
max(case ds2.ItemNumber when 1 then ds2.Item else '' end) as field1,
max(case ds2.ItemNumber when 2 then ds2.Item else '' end) as field2
from
dbo.DelimitedSplit8K(@TestString,'|') ds1
cross apply dbo.DelimitedSplit8K(ds1.Item,';') ds2
where
ds1.Item <> ''
group by
ds1.ItemNumber;
Uses Jeff's DelimitedSplit8K function as is.
You will find the code for the dbo.DelimitedSplit8K function here: http://www.sqlservercentral.com/articles/Tally+Table/72993/.
You're right, that's a much better idea. To make it work for the OPs requirements, I'd stick it in a sproc to generate the result-set dynamically so that the number of "field" doesn't need to be known before-hand.
So first, we need Jeff's splitter: -
IF EXISTS (SELECT 1
FROM sysobjects
WHERE id = OBJECT_ID('DelimitedSplit8K'))
BEGIN
DROP FUNCTION [dbo].[DelimitedSplit8K];
END
GO
CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
GO
Then we need the calling sproc: -
IF EXISTS (SELECT 1
FROM sysobjects
WHERE id = OBJECT_ID('SSC_Multi_Split'))
BEGIN
DROP PROCEDURE SSC_Multi_Split;
END
GO
CREATE PROCEDURE SSC_Multi_Split (@string VARCHAR(8000), @rowDeliminater CHAR(1), @colDeliminater CHAR(1))
AS
BEGIN
IF object_id('tempdb..#temporaryResultHolder') IS NOT NULL
BEGIN
DROP TABLE #temporaryResultHolder;
END;
SELECT ds1.ItemNumber AS rowNumber, ds2.ItemNumber AS columnNumber, ds2.Item AS data
INTO #temporaryResultHolder
FROM dbo.DelimitedSplit8K(@string, @rowDeliminater) ds1
CROSS APPLY dbo.DelimitedSplit8K(ds1.Item, @colDeliminater) ds2
WHERE ds1.Item <> '';
DECLARE @SQL NVARCHAR(MAX);
WITH CTE1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
CTE2(N) AS (SELECT 1 FROM CTE1 X CROSS JOIN CTE1 Y),
CTE3(N) AS (SELECT 1 FROM CTE2 X CROSS JOIN CTE2 Y),
CTE4(N) AS (SELECT 1 FROM CTE3 X CROSS JOIN CTE3 Y),
CTE5(N) AS (SELECT 1 FROM CTE4 X CROSS JOIN CTE4 Y),
CTE6(N) AS (SELECT 1 FROM CTE5 X CROSS JOIN CTE5 Y),
TALLY(N) AS (SELECT TOP (ISNULL(DATALENGTH(@string),0))
ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM CTE6)
SELECT @SQL = STUFF((SELECT ','+CHAR(13)+CHAR(10)+'MAX(CASE WHEN columnNumber = '+CAST(N AS VARCHAR(20))+' THEN data ELSE '+CHAR(39)+CHAR(39)+' END) AS field'+CAST(N AS VARCHAR(20))
FROM TALLY
CROSS APPLY (SELECT TOP 1 columnNumber
FROM #temporaryResultHolder
ORDER BY columnNumber DESC) b
WHERE columnNumber >= N
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'),1,3,'');
SELECT @SQL = 'SELECT '+@SQL+CHAR(13)+CHAR(10)+'FROM #temporaryResultHolder'+CHAR(13)+CHAR(10)+'GROUP BY rowNumber'+CHAR(13)+CHAR(10)+'ORDER BY rowNumber';
EXECUTE sp_executeSQL @SQL;
END
GO
And we execute as follows: -
EXECUTE SSC_Multi_Split '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|','|',';';
December 11, 2012 at 8:59 am
Thanks Cadavre. This is cool stuff.
Can your code take two row terminators? pipe (|) and line break ()?
Example data:
field1;field2|
30;38469|
31;38469|
32;38469|
33;38469|
34;38469|
I didn't notice the line break in the original data. Very sorry.
The intent is that my user copies example data from excel and enters into a multiline text box in a webpage. He then hits a button that would transfer the input string (example data) to a stored proc that reads the data and splits into multiple column - multiple rows.
Thanks for all your help.
December 11, 2012 at 9:20 am
murthyvs (12/11/2012)
Thanks Cadavre. This is cool stuff.Can your code take two row terminators? pipe (|) and line break ()?
Example data:
field1;field2|
30;38469|
31;38469|
32;38469|
33;38469|
34;38469|
I didn't notice the line break in the original data. Very sorry.
The intent is that my user copies example data from excel and enters into a multiline text box in a webpage. He then hits a button that would transfer the input string (example data) to a stored proc that reads the data and splits into multiple column - multiple rows.
Thanks for all your help.
Why not just remove the line breaks?
DECLARE @exampleData VARCHAR(8000) = '30;38469|
31;38469|
32;38469|
33;38469|
34;38469|';
SET @exampleData = REPLACE(REPLACE(@exampleData,CHAR(13),''),CHAR(10),'');
EXECUTE SSC_Multi_Split @exampleData,'|',';'
December 11, 2012 at 11:47 am
Awesome .. thanks!!
December 11, 2012 at 6:51 pm
Here's another way:
DECLARE @MyString VARCHAR(8000) =
'30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'
SELECT col1=MAX(CASE ItemNumber%4 WHEN 1 THEN Item END)
,col2=MAX(CASE ItemNumber%4 WHEN 3 THEN Item END)
FROM PatternSplitCM(@MyString, '[0-9]')
WHERE [Matched] = 1
GROUP BY ItemNumber/4
PatternSplitCM is described here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/
Not sure it will be faster than using DelimitedSplit8K but since you're calling that function twice, it might give it a run for its money.
Since this is posted in a SQL 2005 forum, you'll need to note that the Tally Table CTE used in PatternSplitCM in the article requires SQL 2008. However it is easily replaced by an Itzik Ben-Gan style CTE tally table like the one that appears (I think) in PatternSplitQU (also in the article).
Edit: Revised (simplified) my vector of attack slightly.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 14, 2012 at 6:00 pm
Hi - the solution works as expected. Is there anyway to increase the size of the input string to 100,000. Its only limited to 8,000.
Thanks much!
December 14, 2012 at 8:31 pm
Jeff has said (he can correct me if I'm wrong) that DelimitedSplit8K is optimized for VARCHAR(8000) so change it to VARCHAR(MAX) with that in mind.
I think although I don't have rigid testing results that PatternSplitCM is not significantly affected if you change to VARCHAR(MAX).
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
December 14, 2012 at 8:38 pm
Yes, I have tried it. I have observed performance degradation. Query runs much slower with VARCHAR(MAX).
December 14, 2012 at 8:49 pm
murthyvs (12/14/2012)
Hi - the solution works as expected. Is there anyway to increase the size of the input string to 100,000. Its only limited to 8,000.Thanks much!
Unless your example is just a striking coincidence, I've seen such a thing before and I'd like to make a totally alternate suggestion if what I suspect is actually true.
It would appear that both fields in the string are sequential in their own right. If that is true (as it was with what I've seen before), then it will save you a huge amount of aggravation, the server a huge amount of processing time and I/O, your network an unnecessary blurb of 100K bytes for each usage, and the loss of feeling in your left arm when everyone jumps on you for resource usage, if you were to send just a starting value for each field and the number of field-pairs and let SQL very quickly generate even millions of pairs more quickly than you could ever transmit them.
For example, your example has fields that start with 30 and 38469 respectively and then both increment by 1 for 9 pairs of fields. Using the method I proposed above, you would pass only those 3 parameters over the pipe and then let SQL Server have at it. That would be capable of generating millions of rows if you needed it an it would happen a whole lot quicker than trying to send a million such field pairs over the pipe. For example, here code that does what I said o a million rows. I believe you'll find it to be remarkably quick.
--===== These would be parameters in a stored procedure
DECLARE @F1Start INT,
@F2Start INT,
@Pairs INT
;
SELECT @F1Start = 30,
@F2Start = 38469,
@Pairs = 1000000
;
--===== This would do the deed as previously described.
WITH
cteTally AS
(
SELECT TOP (@Pairs)
N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
)
SELECT Field1 = @F1Start + N,
Field2 = @F2Start + N
INTO #Temp
FROM cteTally
;
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 57 total)
You must be logged in to reply to this topic. Login to reply