tsql help

Question

tsql help

vsamantha35

SSChampion

Points: 11551
More actions
October 4, 2017 at 5:25 am

#340583

Hi Experts,
Need tsql help.
I have a table row as follows. Basically, it will contain a huge string values delimited by a pipe '|' symbol.
requirement is to spilt single column value into multiple columns. i.e. in below rows , i need to see 9 different columns values are split based on '|' delimeter.
How to do that?
Test data
===========
create table t3
(c1 varchar(max)
)
insert into t3
select 'abc|203|12|jsjsjsj HILLS|test|abc XXXXXX o1123|9389||'
output should be
c1 c2 c3 c4 c5 c6 c7 c8 c9
abc 203 12 jsjsjsj HILLS test abc XXXXXX o1123 9389
Can anyone please help on this.
Thanks in advance.
Sam

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Thom A SSC Guru Points: 99137 More actions · Answer 1

One method, using Jeff's DelimitedSplit8K function:
WITH PT AS ( SELECT * FROM t3 CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS) SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1, MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2, MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3, MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4, MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5, MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6, MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7, MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8, MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9 FROM PT;

This could be shortened to without the CTE.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

Phil Parkin SSC Guru Points: 247125 More actions · Answer 2

Thom A - Wednesday, October 4, 2017 5:39 AM
One method, using Jeff's DelimitedSplit8K function:
WITH PT AS ( SELECT * FROM t3 CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS) SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1, MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2, MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3, MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4, MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5, MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6, MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7, MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8, MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9 FROM PT;
This could be shortened to without the CTE.

Yes, this does the job. Note that the ELSE NULL part above is the default for CASE, thus the query may be shortened:
WITH PT AS ( SELECT * FROM t3 CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS) SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item END) AS c1, MAX(CASE WHEN ItemNumber = 2 THEN Item END) AS c2, MAX(CASE WHEN ItemNumber = 3 THEN Item END) AS c3, MAX(CASE WHEN ItemNumber = 4 THEN Item END) AS c4, MAX(CASE WHEN ItemNumber = 5 THEN Item END) AS c5, MAX(CASE WHEN ItemNumber = 6 THEN Item END) AS c6, MAX(CASE WHEN ItemNumber = 7 THEN Item END) AS c7, MAX(CASE WHEN ItemNumber = 8 THEN Item END) AS c8, MAX(CASE WHEN ItemNumber = 9 THEN Item END) AS c9 FROM PT;

sgmunson SSC Guru Points: 110640 More actions · Answer 3

Thom A - Wednesday, October 4, 2017 5:39 AM
One method, using Jeff's DelimitedSplit8K function:
WITH PT AS ( SELECT * FROM t3 CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS) SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1, MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2, MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3, MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4, MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5, MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6, MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7, MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8, MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9 FROM PT;
This could be shortened to without the CTE.

FYI, Jeff's DelimitedSplit8K routine only handles varchar up to 8000 characters, and does NOT handle strings longer than that. For most situations such as this, however, varchar(max) is overkill to begin with, so I usually advise against it's use. I'd take the time to figure out how large each of the 9 possible values can be, and size the field to be able to exactly fit that many characters, including the pipe symbols, unless you have control over the database table, in which case I'd never let that string concatenation into the database in the first place, and would separate it on the way in using that splitter function.

Steve (aka sgmunson)
Rent Servers for Income (picks and shovels strategy)

Avi1 SSCrazy Points: 2323 More actions · Answer 4

If you want to process in bulk (expecting more rows and lot of columns) then you can explore BCP route as well, spit all the data as txt file (using BCP) and load it in the table using BCP. if your target is in different server it will surely benefit in terms of performance. you can test it in your scenario

Phil Parkin SSC Guru Points: 247125 More actions · Answer 5

Avi1 - Wednesday, October 4, 2017 12:33 PM
If you want to process in bulk (expecting more rows and lot of columns) then you can explore BCP route as well, spit all the data as txt file (using BCP) and load it in the table using BCP. if your target is in different server it will surely benefit in terms of performance. you can test it in your scenario

This is a rather bold statement, which I suggest you need to back up with some evidence of your testing of the different scenarios.

Avi1 SSCrazy Points: 2323 More actions · Answer 6

We implemented it few years back in 2008 R2, We had one stage table which was loaded by external process one record in the table was actually one delimited record 30+ columns, everytop of hour 200k records. these rows have some text columns. Stage table was in a different server, we used BCP to generate the file in stage server and pushed data from there using BCP. we tried few other options but this worked well. I don't have the numbers, but that worked well there.

Phil Parkin SSC Guru Points: 247125 More actions · Answer 7

Avi1 - Wednesday, October 4, 2017 1:09 PM
We implemented it few years back in 2008 R2, We had one stage table which was loaded by external process one record in the table was actually one delimited record 30+ columns, everytop of hour 200k records. these rows have some text columns. Stage table was in a different server, we used BCP to generate the file in stage server and pushed data from there using BCP. we tried few other options but this worked well. I don't have the numbers, but that worked well there.

I'm sure it did. But, as you've probably already realised, if you use absolutes on this forum, you will generally be asked for proof ... hence my post

Thom A SSC Guru Points: 99137 More actions · Answer 8

Avi1 - Wednesday, October 4, 2017 1:09 PM
We implemented it few years back in 2008 R2, We had one stage table which was loaded by external process one record in the table was actually one delimited record 30+ columns, everytop of hour 200k records. these rows have some text columns. Stage table was in a different server, we used BCP to generate the file in stage server and pushed data from there using BCP. we tried few other options but this worked well. I don't have the numbers, but that worked well there.

Perhapsyyou could provide a sample command that might display these benefits? I have several areas where I have to split out delimited strings before moving onto the next task at hand. If you are proposing there is a quicker way, then you have my attention.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

Alan Burstein SSC Guru Points: 61141 More actions · Answer 9

sgmunson - Wednesday, October 4, 2017 6:19 AM
Thom A - Wednesday, October 4, 2017 5:39 AM
One method, using Jeff's DelimitedSplit8K function:
WITH PT AS ( SELECT * FROM t3 CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS) SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1, MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2, MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3, MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4, MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5, MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6, MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7, MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8, MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9 FROM PT;
This could be shortened to without the CTE.
FYI, Jeff's DelimitedSplit8K routine only handles varchar up to 8000 characters, and does NOT handle strings longer than that. For most situations such as this, however, varchar(max) is overkill to begin with, so I usually advise against it's use. I'd take the time to figure out how large each of the 9 possible values can be, and size the field to be able to exactly fit that many characters, including the pipe symbols, unless you have control over the database table, in which case I'd never let that string concatenation into the database in the first place, and would separate it on the way in using that splitter function.

+1000 To all of this.

3NF is the best approach. If that's not an option then the next best thing thing to do is drop down to at varchar(8000) or less.

As to faster ways to split that string here's three...

1. CROSS APPLY
You have much, much better performance doing a Cascading Cross Apply like so:
DECLARE @string varchar(max) = 'abc|203|12|jsjsjsj HILLS|test|abc XXXXXX o1123|9389||';

SELECT col1 = substring(v.s, 1, d1.p-2), col2 = substring(v.s, d1.p, d2.p-(d1.p+1)), col3 = substring(v.s, d2.p, d3.p-(d2.p+1)), col4 = substring(v.s, d3.p, d4.p-(d3.p+1)), col5 = substring(v.s, d4.p, d5.p-(d4.p+1)), col6 = substring(v.s, d5.p, d6.p-(d5.p+1)), col7 = substring(v.s, d6.p, d7.p-(d6.p+1)), col8 = substring(v.s, d7.p, d8.p-(d7.p+1)), col9 = substring(v.s, d8.p, v.l) FROM (VALUES (@string, datalength(@string))) v(s,l) CROSS APPLY (VALUES (charindex('|', v.s)+1)) d1(p) CROSS APPLY (VALUES (charindex('|', v.s, d1.p)+1)) d2(p) CROSS APPLY (VALUES (charindex('|', v.s, d2.p)+1)) d3(p) CROSS APPLY (VALUES (charindex('|', v.s, d3.p)+1)) d4(p) CROSS APPLY (VALUES (charindex('|', v.s, d4.p)+1)) d5(p) CROSS APPLY (VALUES (charindex('|', v.s, d5.p)+1)) d6(p) CROSS APPLY (VALUES (charindex('|', v.s, d6.p)+1)) d7(p) CROSS APPLY (VALUES (charindex('|', v.s, d7.p)+1)) d8(p) CROSS APPLY (VALUES (charindex('|', v.s, d8.p)+1)) d9(p);

2. Indexed View
The fastest way, by far is to split the string ahead of time via indexed view (note this thread.) This is sometimes an option If the data is pretty static as is the case in data warehouse environments.

3. NGrams2b and LEAD
For fun I put together how you would do this using NGrams2b (found here) and what I learned reading Reaping the benefits of the Window functions in T-SQL.
I also wanted to show two ways to further simplify that aforementioned case statement.

SELECT --itemNumber, --, itemStart, itemLen, col1 = MAX(IIF(itemNumber = 0, substring(@string, itemStart, itemLen), NULL)), col2 = MAX(CASE itemnumber WHEN 1 THEN substring(@string, itemStart, itemLen) END), col3 = MAX(CASE itemnumber WHEN 2 THEN substring(@string, itemStart, itemLen) END), col4 = MAX(CASE itemnumber WHEN 3 THEN substring(@string, itemStart, itemLen) END), col5 = MAX(CASE itemnumber WHEN 4 THEN substring(@string, itemStart, itemLen) END), col6 = MAX(CASE itemnumber WHEN 5 THEN substring(@string, itemStart, itemLen) END), col7 = MAX(CASE itemnumber WHEN 6 THEN substring(@string, itemStart, itemLen) END), col8 = MAX(CASE itemnumber WHEN 7 THEN substring(@string, itemStart, itemLen) END), col9 = MAX(CASE itemnumber WHEN 8 THEN substring(@string, itemStart, itemLen) END) FROM ( SELECT itemNumber, itemStart = position+1, itemLen = LEAD(position,1,LEN(@string)+1) OVER (ORDER BY (position)) - (position+1) FROM ( SELECT itemNumber = CAST(0 as int), position = CAST(0 as int) UNION ALL SELECT ROW_NUMBER() OVER (ORDER BY position), position FROM dbo.ngrams2b(@string,1) WHERE token = '|' ) delim ) split;

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

vsamantha35 SSChampion Points: 11551 More actions · Answer 10

Hi All,

Thanks for the suggestions. Thanks a lot.
One thing, came into my mind, how can we make this dynamic?? Meaning, if the number of columns are not determined then how can reuse it for different string with different number of columns to be returned?

Thom A SSC Guru Points: 99137 More actions · Answer 11

vsamantha35 - Thursday, October 5, 2017 1:54 AM
Hi All,
Thanks for the suggestions. Thanks a lot.
One thing, came into my mind, how can we make this dynamic?? Meaning, if the number of columns are not determined then how can reuse it for different string with different number of columns to be returned?

I was "afraid" you were going to ask that (although I think most of us were expecting it). Will you only be dealing with one line at a time or many? If many, and one row has 10 delimiters (thus 11 columns) and the other 7 (thus 8), what should the last 3 columns in the latter show? NULLs, blank strings, something else?

If you are dealing with several rows at a time (especially if they have different number of columns), please could you provide some consumable sample data for us?

Thanks,.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

Jeff Moden SSC Guru Points: 1004564 More actions · Answer 12

Alan.B - Wednesday, October 4, 2017 11:37 PM
sgmunson - Wednesday, October 4, 2017 6:19 AM
Thom A - Wednesday, October 4, 2017 5:39 AM
One method, using Jeff's DelimitedSplit8K function:
WITH PT AS (     SELECT *     FROM t3    CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS) SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1,    MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2,    MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3,    MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4,    MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5,    MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6,    MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7,    MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8,    MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9 FROM PT;
This could be shortened to without the CTE.
FYI, Jeff's DelimitedSplit8K routine only handles varchar up to 8000 characters, and does NOT handle strings longer than that.   For most situations such as this, however, varchar(max) is overkill to begin with, so I usually advise against it's use.   I'd take the time to figure out how large each of the 9 possible values can be, and size the field to be able to exactly fit that many characters, including the pipe symbols, unless you have control over the database table, in which case I'd never let that string concatenation into the database in the first place, and would separate it on the way in using that splitter function.
+1000 To all of this.
3NF is the best approach. If that's not an option then the next best thing thing to do is drop down to at varchar(8000) or less.
As to faster ways to split that string here's three...
1. CROSS APPLY
You have much, much better performance doing a Cascading Cross Apply like so:
DECLARE @string varchar(max) = 'abc|203|12|jsjsjsj HILLS|test|abc XXXXXX o1123|9389||';
SELECT col1 = substring(v.s, 1, d1.p-2), col2 = substring(v.s, d1.p, d2.p-(d1.p+1)), col3 = substring(v.s, d2.p, d3.p-(d2.p+1)), col4 = substring(v.s, d3.p, d4.p-(d3.p+1)), col5 = substring(v.s, d4.p, d5.p-(d4.p+1)), col6 = substring(v.s, d5.p, d6.p-(d5.p+1)), col7 = substring(v.s, d6.p, d7.p-(d6.p+1)), col8 = substring(v.s, d7.p, d8.p-(d7.p+1)), col9 = substring(v.s, d8.p, v.l) FROM (VALUES (@string, datalength(@string))) v(s,l) CROSS APPLY (VALUES (charindex('|', v.s)+1))  d1(p) CROSS APPLY (VALUES (charindex('|', v.s, d1.p)+1)) d2(p) CROSS APPLY (VALUES (charindex('|', v.s, d2.p)+1)) d3(p) CROSS APPLY (VALUES (charindex('|', v.s, d3.p)+1)) d4(p) CROSS APPLY (VALUES (charindex('|', v.s, d4.p)+1)) d5(p) CROSS APPLY (VALUES (charindex('|', v.s, d5.p)+1)) d6(p) CROSS APPLY (VALUES (charindex('|', v.s, d6.p)+1)) d7(p) CROSS APPLY (VALUES (charindex('|', v.s, d7.p)+1)) d8(p) CROSS APPLY (VALUES (charindex('|', v.s, d8.p)+1)) d9(p);
2. Indexed View
The fastest way, by far is to split the string ahead of time via indexed view (note this thread.) This is sometimes an option If the data is pretty static as is the case in data warehouse environments.
3. NGrams2b and LEAD
For fun I put together how you would do this using NGrams2b (found here) and what I learned reading Reaping the benefits of the Window functions in T-SQL.
I also wanted to show two ways to further simplify that aforementioned case statement.
SELECT --itemNumber, --, itemStart, itemLen, col1 = MAX(IIF(itemNumber = 0, substring(@string, itemStart, itemLen), NULL)), col2 = MAX(CASE itemnumber WHEN 1 THEN substring(@string, itemStart, itemLen) END), col3 = MAX(CASE itemnumber WHEN 2 THEN substring(@string, itemStart, itemLen) END), col4 = MAX(CASE itemnumber WHEN 3 THEN substring(@string, itemStart, itemLen) END), col5 = MAX(CASE itemnumber WHEN 4 THEN substring(@string, itemStart, itemLen) END), col6 = MAX(CASE itemnumber WHEN 5 THEN substring(@string, itemStart, itemLen) END), col7 = MAX(CASE itemnumber WHEN 6 THEN substring(@string, itemStart, itemLen) END), col8 = MAX(CASE itemnumber WHEN 7 THEN substring(@string, itemStart, itemLen) END), col9 = MAX(CASE itemnumber WHEN 8 THEN substring(@string, itemStart, itemLen) END) FROM ( SELECT itemNumber, itemStart = position+1, itemLen = LEAD(position,1,LEN(@string)+1) OVER (ORDER BY (position)) - (position+1) FROM ( SELECT itemNumber = CAST(0 as int), position = CAST(0 as int) UNION ALL SELECT ROW_NUMBER() OVER (ORDER BY position), position FROM dbo.ngrams2b(@string,1) WHERE token = '|' ) delim ) split;

The one thing missing from all of this is performance testing. Although Chris posted some results in his fine article, he didn't post the source of data he used nor a test data generator.

As for the indexed view, it's fast because the answer is already materialized. It would be very interesting to compare the performance of inserts to a plain table to the performance of a table with such a materialized view. I've not yet tested any of it to be sure but my hunch is that you've moved any performance problems from SELECTs to INSERTs.

Also, if you read Chris' article carefully, he also states that cCA's end up being a performance problem when you use more than 4 or 5. Again, a test on the tipping point is missing.

There is no question about Eirikur 's method, though. I've tested it myself and not that I actually have severs at or above 2012, intend to modify our copies of DelimitedSplit8K to use his fine modification.

That, not withstanding, very nice post, Alan.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

tsql help

Cookies on SQLServerCentral