Split input string into multicolumn - multirows

Question

Split input string into multicolumn - multirows

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 1

dwain.c (12/14/2012)
Jeff has said (he can correct me if I'm wrong) that DelimitedSplit8K is optimized for VARCHAR(8000) so change it to VARCHAR(MAX) with that in mind.
I think although I don't have rigid testing results that PatternSplitCM is not significantly affected if you change to VARCHAR(MAX).

It is, indeed, optimized for 8K. You also have to remember that VARCHAR(MAX) doesn't like to be joined to even by a Tally Table and will cause at least a 2 to 1 slowdown just by changing the function variables from VARCHAR(8000) to VARCHAR(MAX) even if the data stays under 8K.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

murthyvs SSCommitted Points: 1799 More actions · Answer 2

Hi Jeff - Sorry those things are not pairs/dont increment by one. I just made that example in excel.

murthyvs SSCommitted Points: 1799 More actions · Answer 3

Yes, DelimitedSplit8K is miraculously optimized for VARCHAR(8000). It runs very fast!

murthyvs SSCommitted Points: 1799 More actions · Answer 4

Hi - Quick question. Do you know why VARCHAR(MAX) doesnt work in this query?

It returns the following error:

Msg 240, Level 16, State 1, Line 110

Types don't match between the anchor and the recursive part in column "rowstrt" of recursive query "cte".

Thanks!

nigelrivett (12/11/2012)
This will do something like that.
It caters for any number of fields - just amend the final select for the number required
declare @s-2 varchar(8000)
select @s-2 = '30;38469;1|31;38470;1|32;38471;1|33;38472;1|34;38473;1|35;38474;1|36;38475;1|37;38476;1|38;38477;1|'
select @s-2 = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'
declare @rowterm varchar(1) = '|'
declare @fieldterm varchar(1) = ';'
;with cte as
(
select rowstrt = 1, rowend = charindex(@rowterm,@s)-1, seq = 1
union all
select rowstrt = rowend+2, rowend = charindex(@rowterm,@s,rowend+2)-1, seq = seq + 1
from cte where charindex(@rowterm,@s,rowend+2)<>0
) ,
cte2 as
(
select s = substring(@s, rowstrt,rowend-rowstrt+1), seq
from cte
) ,
cte3 as
(
select fldstrt = 1, fldend = charindex(@fieldterm,s)-1, seq, s, fldseq = 1 from cte2
union all
select fldstrt = fldend+2,
fldend = case when charindex(@fieldterm,s,fldend+2) <> 0 then charindex(@fieldterm,s,fldend+2)-1 else len(s) end,
seq, s, fldseq = fldseq+1
from cte3 where fldend < len(s)
)
select s1.s, s2.s, s3.s
from
(select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 1) s1
join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 2) s2 on s1.seq = s2.seq
left join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 3) s3 on s1.seq = s3.seq

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 5

Yes... both variables and any/all string literals must be defined as VARCHAR(MAX) including the ones that hold the single character delimiters. It's one of the great joys of recursive CTE's. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 6

murthyvs (12/14/2012)
Hi Jeff - Sorry those things are not pairs/dont increment by one. I just made that example in excel.

Like I said... remarkable coincidence. Thanks for the feedback.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

murthyvs SSCommitted Points: 1799 More actions · Answer 7

dwain.c (12/11/2012)
Here's another way:
DECLARE @MyString VARCHAR(8000) =
'30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'
SELECT col1=MAX(CASE ItemNumber%4 WHEN 1 THEN Item END)
,col2=MAX(CASE ItemNumber%4 WHEN 3 THEN Item END)
FROM PatternSplitCM(@MyString, '[0-9]')
WHERE [Matched] = 1
GROUP BY ItemNumber/4
PatternSplitCM is described here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/
Not sure it will be faster than using DelimitedSplit8K but since you're calling that function twice, it might give it a run for its money.
Since this is posted in a SQL 2005 forum, you'll need to note that the Tally Table CTE used in PatternSplitCM in the article requires SQL 2008. However it is easily replaced by an Itzik Ben-Gan style CTE tally table like the one that appears (I think) in PatternSplitQU (also in the article).
Edit: Revised (simplified) my vector of attack slightly.

Hi - Just exploring some VARCHAR(MAX) options. Will patternsplitcm work for this dataset?

dataset1:

A123W1;EESLGKNR912|

A12W2;EESLGKNR913|

A3W3;EESLGR914|

123W4;12EESLGKNR915|

123W5;EESL12NR916|

A123W;E31LGKNR917|

A123W7;EE18|

A123W8;1234|

1238;1234|

dataset2:

A123W1EESLGKNR912

A12W2EESLGKNR913

A3W3EESLGR914

123W412EESLGKNR915

123W5EESL12NR916

A123WE31LGKNR917

A123W7EE18

A123W81234

12381234

Thanks

mister.magoo SSC-Forever Points: 47174 More actions · Answer 8

Here's a way that uses XML, which you could generate in the web application before passing the data to SQL, or even just convert to XML and pump straight into a SQL table all within the web app...

Anyway, this is the SQL version:

-- set up some sample input data

declare @input varchar(max);

set @input = '30;38469|

31;38470|

32;38471|

33;38472|

34;38473|

35;38474|

36;38475|

37;38476|

38;38477|'

-- convert to xml

-- of course, this could be done in your web app and may be quicker.

declare @xml xml;

SET @xml =

CONVERT(xml,

'<row><col>'+

REPLACE(

@input,

CHAR(13),

''

),

CHAR(10),

''

),

'|',

'</col></row><row><col>'

),

';',

'</col><col>'

)

+'</col></row>'

);

-- and select the values out as rows/columns

select

nd.value('(col/text())[1]','varchar(200)') as field1,

nd.value('(col/text())[2]','varchar(200)') as field2

from @xml.nodes('row') as x(nd)

where not nd.value('(col/text())[1]','varchar(200)') is null

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Dwain Camps SSC Guru Points: 86978 More actions · Answer 9

murthyvs (12/14/2012)
dwain.c (12/11/2012)
Here's another way:
DECLARE @MyString VARCHAR(8000) =
'30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'
SELECT col1=MAX(CASE ItemNumber%4 WHEN 1 THEN Item END)
,col2=MAX(CASE ItemNumber%4 WHEN 3 THEN Item END)
FROM PatternSplitCM(@MyString, '[0-9]')
WHERE [Matched] = 1
GROUP BY ItemNumber/4
PatternSplitCM is described here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/
Not sure it will be faster than using DelimitedSplit8K but since you're calling that function twice, it might give it a run for its money.
Since this is posted in a SQL 2005 forum, you'll need to note that the Tally Table CTE used in PatternSplitCM in the article requires SQL 2008. However it is easily replaced by an Itzik Ben-Gan style CTE tally table like the one that appears (I think) in PatternSplitQU (also in the article).
Edit: Revised (simplified) my vector of attack slightly.
Hi - Just exploring some VARCHAR(MAX) options. Will patternsplitcm work for this dataset?
dataset1:
A123W1;EESLGKNR912|
A12W2;EESLGKNR913|
A3W3;EESLGR914|
123W4;12EESLGKNR915|
123W5;EESL12NR916|
A123W;E31LGKNR917|
A123W7;EE18|
A123W8;1234|
1238;1234|
dataset2:
A123W1EESLGKNR912
A12W2EESLGKNR913
A3W3EESLGR914
123W412EESLGKNR915
123W5EESL12NR916
A123WE31LGKNR917
A123W7EE18
A123W81234
12381234
Thanks

For dataset1 the answer is yes. Use pattern: '[0-9A-Z]' or if the database is case sensitive: '[0-9A-Za-z]'

For dataset2 it is possible the same pattern will yield the results you seek, except it is unclear what the row split character is. CHAR(10) or CHAR(15) perhaps?

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

murthyvs SSCommitted Points: 1799 More actions · Answer 10

Hi - this a nice method. didnt know this before.

I have created this function and using it in a stored proc..

CREATE FUNCTION [dbo].[FnXmlSplit](@input_string VARCHAR(max))

RETURNS @temptable TABLE (

field1 VARCHAR(max)

,field2 VARCHAR(max))

AS

BEGIN

DECLARE @xml XML;

SET @xml = CONVERT(XML, '<row><col>'

+ REPLACE(REPLACE(@input_string, CHAR(13), '</col></row><row><col>'), CHAR(9), '</col><col>' )

+ '</col></row>');

INSERT @temptable

select

Ltrim(Rtrim(replace(replace(field1,char(10),''),char(9),''))) as field1, Ltrim(Rtrim(replace(replace(field2,char(10),''),char(9),''))) as field2

from (

SELECT nd.value('(col/text())[1]', 'varchar(200)') AS field1

,nd.value('(col/text())[2]', 'varchar(200)') AS field2

FROM @xml.nodes('row') AS x(nd)

WHERE NOT nd.value('(col/text())[1]', 'varchar(200)') IS NULL) as a;

RETURN;

END;

Function call in the stored proc: select * from dbo.FnXmlSplit (@input_string).

For some reason, this method when used inside a stored proc, it runs slow. Any idea why? Thanks!

mister.magoo (12/15/2012)
Here's a way that uses XML, which you could generate in the web application before passing the data to SQL, or even just convert to XML and pump straight into a SQL table all within the web app...
Anyway, this is the SQL version:
-- set up some sample input data
declare @input varchar(max);
set @input = '30;38469|
31;38470|
32;38471|
33;38472|
34;38473|
35;38474|
36;38475|
37;38476|
38;38477|'
-- convert to xml
-- of course, this could be done in your web app and may be quicker.
declare @xml xml;
SET @xml =
CONVERT(xml,
'<row><col>'+
REPLACE(
REPLACE(
REPLACE(
REPLACE(
@input,
CHAR(13),
''
),
CHAR(10),
''
),
'|',
'</col></row><row><col>'
),
';',
'</col><col>'
)
+'</col></row>'
);
-- and select the values out as rows/columns
select
nd.value('(col/text())[1]','varchar(200)') as field1,
nd.value('(col/text())[2]','varchar(200)') as field2
from @xml.nodes('row') as x(nd)
where not nd.value('(col/text())[1]','varchar(200)') is null

Edited: My input is two columns of excel rows

murthyvs SSCommitted Points: 1799 More actions · Answer 11

Hi dwain.c - I tried changing VARCHAR(8000) to VARCHAR(MAX). Ran the query with new pattern [0-9A-Za-z]. My input string is a very long string.

sample:

3038469

3138470

The results were as expected BUT the query returns fewer than expected rows. I believe somewhere in the function its hitting the 8000 limit. Any ideas? Thanks!

mister.magoo SSC-Forever Points: 47174 More actions · Answer 12

murthyvs (12/16/2012)
Hi - this a nice method. didnt know this before.
I have created this function and using it in a stored proc..
Function call in the stored proc: select * from dbo.FnXmlSplit (@input_string).
For some reason, this method when used inside a stored proc, it runs slow. Any idea why? Thanks!
Edited: My input is two columns of excel rows

Yeah, it's not the fastest way to do it.

If you want speed, split the data at source or persist with dwain's patter split or Jeff's DelimitedSplit8K

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

mtassin SSC-Insane Points: 23099 More actions · Answer 13

murthyvs (12/16/2012)
Hi dwain.c - I tried changing VARCHAR(8000) to VARCHAR(MAX). Ran the query with new pattern [0-9A-Za-z]. My input string is a very long string.
sample:
3038469
3138470
The results were as expected BUT the query returns fewer than expected rows. I believe somewhere in the function its hitting the 8000 limit. Any ideas? Thanks!

The Delimittedsplit8k only counts up to E4's max which is 10,000. If you have more than 10,000 characters in the string you'll need to increase the size of the tally based CTE by creating an E8 which would cross 10,000 by 10,000 and handle

100,000,000 columns of data for searching for delimitters.

--Mark Tassin
MCITP - SQL Server DBA
Proud member of the Anti-RBAR alliance.
For help with Performance click this link[/url]
For tips on how to post your problems[/url]

Dwain Camps SSC Guru Points: 86978 More actions · Answer 14

mtassin (12/17/2012)
murthyvs (12/16/2012)
Hi dwain.c - I tried changing VARCHAR(8000) to VARCHAR(MAX). Ran the query with new pattern [0-9A-Za-z]. My input string is a very long string.
sample:
3038469
3138470
The results were as expected BUT the query returns fewer than expected rows. I believe somewhere in the function its hitting the 8000 limit. Any ideas? Thanks!
The Delimittedsplit8k only counts up to E4's max which is 10,000. If you have more than 10,000 characters in the string you'll need to increase the size of the tally based CTE by creating an E8 which would cross 10,000 by 10,000 and handle
100,000,000 columns of data for searching for delimitters.

The same applies to the Tally tables generated within PatternSplitCM.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Steven Willis SSCrazy Eights Points: 9893 More actions · Answer 15

I know Jeff Moden's splitter function has already been discussed. But this is how I would do it with no aggregates or case statements:

DECLARE

@InputString VARCHAR(8000)

,@Delimiter1 CHAR(1)

,@Delimiter2 CHAR(1)

SET @InputString = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'

SET @Delimiter1 = '|'

SET @Delimiter2 = ';'

--to get rid of the extra '|' character at the end

SET @InputString = LEFT(@InputString,LEN(@InputString)-1)

SELECT

(SELECT Item FROM dbo.DelimitedSplit8K(r1.Item,@Delimiter2) WHERE ItemNumber = 1) AS Col1

,(SELECT Item FROM dbo.DelimitedSplit8K(r1.Item,@Delimiter2) WHERE ItemNumber = 2) AS Col2

FROM

(

SELECT

dsk1.Item

FROM

dbo.DelimitedSplit8K(@InputString,@Delimiter1) AS dsk1

) r1

Output:

Col1Col2

3038469

3138470

3238471

3338472

3438473

3538474

3638475

3738476

3838477