October 4, 2017 at 5:25 am
Hi Experts,
Need tsql help.
I have a table row as follows. Basically, it will contain a huge string values delimited by a pipe '|' symbol.
requirement is to spilt single column value into multiple columns. i.e. in below rows , i need to see 9 different columns values are split based on '|' delimeter.
How to do that?
Test data
===========
create table t3
(c1 varchar(max)
)
insert into t3
select 'abc|203|12|jsjsjsj HILLS|test|abc XXXXXX o1123|9389||'
output should be
c1 c2 c3 c4 c5 c6 c7 c8 c9
abc 203 12 jsjsjsj HILLS test abc XXXXXX o1123 9389
Can anyone please help on this.
Thanks in advance.
Sam
October 4, 2017 at 5:39 am
One method, using Jeff's DelimitedSplit8K function:WITH PT AS (
SELECT *
FROM t3
CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1,
MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2,
MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3,
MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4,
MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5,
MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6,
MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7,
MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8,
MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9
FROM PT;
This could be shortened to without the CTE.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
October 4, 2017 at 6:18 am
Thom A - Wednesday, October 4, 2017 5:39 AMOne method, using Jeff's DelimitedSplit8K function:WITH PT AS (
SELECT *
FROM t3
CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1,
MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2,
MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3,
MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4,
MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5,
MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6,
MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7,
MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8,
MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9
FROM PT;This could be shortened to without the CTE.
Yes, this does the job. Note that the ELSE NULL part above is the default for CASE, thus the query may be shortened:WITH PT AS (
SELECT *
FROM t3
CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item END) AS c1,
MAX(CASE WHEN ItemNumber = 2 THEN Item END) AS c2,
MAX(CASE WHEN ItemNumber = 3 THEN Item END) AS c3,
MAX(CASE WHEN ItemNumber = 4 THEN Item END) AS c4,
MAX(CASE WHEN ItemNumber = 5 THEN Item END) AS c5,
MAX(CASE WHEN ItemNumber = 6 THEN Item END) AS c6,
MAX(CASE WHEN ItemNumber = 7 THEN Item END) AS c7,
MAX(CASE WHEN ItemNumber = 8 THEN Item END) AS c8,
MAX(CASE WHEN ItemNumber = 9 THEN Item END) AS c9
FROM PT;
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 4, 2017 at 6:19 am
Thom A - Wednesday, October 4, 2017 5:39 AMOne method, using Jeff's DelimitedSplit8K function:WITH PT AS (
SELECT *
FROM t3
CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1,
MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2,
MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3,
MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4,
MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5,
MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6,
MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7,
MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8,
MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9
FROM PT;This could be shortened to without the CTE.
FYI, Jeff's DelimitedSplit8K routine only handles varchar up to 8000 characters, and does NOT handle strings longer than that. For most situations such as this, however, varchar(max) is overkill to begin with, so I usually advise against it's use. I'd take the time to figure out how large each of the 9 possible values can be, and size the field to be able to exactly fit that many characters, including the pipe symbols, unless you have control over the database table, in which case I'd never let that string concatenation into the database in the first place, and would separate it on the way in using that splitter function.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 4, 2017 at 12:33 pm
If you want to process in bulk (expecting more rows and lot of columns) then you can explore BCP route as well, spit all the data as txt file (using BCP) and load it in the table using BCP. if your target is in different server it will surely benefit in terms of performance. you can test it in your scenario
October 4, 2017 at 12:55 pm
Avi1 - Wednesday, October 4, 2017 12:33 PMIf you want to process in bulk (expecting more rows and lot of columns) then you can explore BCP route as well, spit all the data as txt file (using BCP) and load it in the table using BCP. if your target is in different server it will surely benefit in terms of performance. you can test it in your scenario
This is a rather bold statement, which I suggest you need to back up with some evidence of your testing of the different scenarios.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 4, 2017 at 1:09 pm
We implemented it few years back in 2008 R2, We had one stage table which was loaded by external process one record in the table was actually one delimited record 30+ columns, everytop of hour 200k records. these rows have some text columns. Stage table was in a different server, we used BCP to generate the file in stage server and pushed data from there using BCP. we tried few other options but this worked well. I don't have the numbers, but that worked well there.
October 4, 2017 at 1:17 pm
Avi1 - Wednesday, October 4, 2017 1:09 PMWe implemented it few years back in 2008 R2, We had one stage table which was loaded by external process one record in the table was actually one delimited record 30+ columns, everytop of hour 200k records. these rows have some text columns. Stage table was in a different server, we used BCP to generate the file in stage server and pushed data from there using BCP. we tried few other options but this worked well. I don't have the numbers, but that worked well there.
I'm sure it did. But, as you've probably already realised, if you use absolutes on this forum, you will generally be asked for proof ... hence my post 🙂
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 4, 2017 at 4:19 pm
Avi1 - Wednesday, October 4, 2017 1:09 PMWe implemented it few years back in 2008 R2, We had one stage table which was loaded by external process one record in the table was actually one delimited record 30+ columns, everytop of hour 200k records. these rows have some text columns. Stage table was in a different server, we used BCP to generate the file in stage server and pushed data from there using BCP. we tried few other options but this worked well. I don't have the numbers, but that worked well there.
Perhapsyyou could provide a sample command that might display these benefits? I have several areas where I have to split out delimited strings before moving onto the next task at hand. If you are proposing there is a quicker way, then you have my attention.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
October 4, 2017 at 11:37 pm
sgmunson - Wednesday, October 4, 2017 6:19 AMThom A - Wednesday, October 4, 2017 5:39 AMOne method, using Jeff's DelimitedSplit8K function:WITH PT AS (
SELECT *
FROM t3
CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1,
MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2,
MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3,
MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4,
MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5,
MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6,
MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7,
MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8,
MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9
FROM PT;This could be shortened to without the CTE.
FYI, Jeff's DelimitedSplit8K routine only handles varchar up to 8000 characters, and does NOT handle strings longer than that. For most situations such as this, however, varchar(max) is overkill to begin with, so I usually advise against it's use. I'd take the time to figure out how large each of the 9 possible values can be, and size the field to be able to exactly fit that many characters, including the pipe symbols, unless you have control over the database table, in which case I'd never let that string concatenation into the database in the first place, and would separate it on the way in using that splitter function.
+1000 To all of this.
3NF is the best approach. If that's not an option then the next best thing thing to do is drop down to at varchar(8000) or less.
As to faster ways to split that string here's three...
1. CROSS APPLY
You have much, much better performance doing a Cascading Cross Apply like so:DECLARE @string varchar(max) = 'abc|203|12|jsjsjsj HILLS|test|abc XXXXXX o1123|9389||';
SELECT
col1 = substring(v.s, 1, d1.p-2),
col2 = substring(v.s, d1.p, d2.p-(d1.p+1)),
col3 = substring(v.s, d2.p, d3.p-(d2.p+1)),
col4 = substring(v.s, d3.p, d4.p-(d3.p+1)),
col5 = substring(v.s, d4.p, d5.p-(d4.p+1)),
col6 = substring(v.s, d5.p, d6.p-(d5.p+1)),
col7 = substring(v.s, d6.p, d7.p-(d6.p+1)),
col8 = substring(v.s, d7.p, d8.p-(d7.p+1)),
col9 = substring(v.s, d8.p, v.l)
FROM (VALUES (@string, datalength(@string))) v(s,l)
CROSS APPLY (VALUES (charindex('|', v.s)+1)) d1(p)
CROSS APPLY (VALUES (charindex('|', v.s, d1.p)+1)) d2(p)
CROSS APPLY (VALUES (charindex('|', v.s, d2.p)+1)) d3(p)
CROSS APPLY (VALUES (charindex('|', v.s, d3.p)+1)) d4(p)
CROSS APPLY (VALUES (charindex('|', v.s, d4.p)+1)) d5(p)
CROSS APPLY (VALUES (charindex('|', v.s, d5.p)+1)) d6(p)
CROSS APPLY (VALUES (charindex('|', v.s, d6.p)+1)) d7(p)
CROSS APPLY (VALUES (charindex('|', v.s, d7.p)+1)) d8(p)
CROSS APPLY (VALUES (charindex('|', v.s, d8.p)+1)) d9(p);
2. Indexed View
The fastest way, by far is to split the string ahead of time via indexed view (note this thread.) This is sometimes an option If the data is pretty static as is the case in data warehouse environments.
3. NGrams2b and LEAD
For fun I put together how you would do this using NGrams2b (found here) and what I learned reading Reaping the benefits of the Window functions in T-SQL.
I also wanted to show two ways to further simplify that aforementioned case statement.
SELECT --itemNumber, --, itemStart, itemLen,
col1 = MAX(IIF(itemNumber = 0, substring(@string, itemStart, itemLen), NULL)),
col2 = MAX(CASE itemnumber WHEN 1 THEN substring(@string, itemStart, itemLen) END),
col3 = MAX(CASE itemnumber WHEN 2 THEN substring(@string, itemStart, itemLen) END),
col4 = MAX(CASE itemnumber WHEN 3 THEN substring(@string, itemStart, itemLen) END),
col5 = MAX(CASE itemnumber WHEN 4 THEN substring(@string, itemStart, itemLen) END),
col6 = MAX(CASE itemnumber WHEN 5 THEN substring(@string, itemStart, itemLen) END),
col7 = MAX(CASE itemnumber WHEN 6 THEN substring(@string, itemStart, itemLen) END),
col8 = MAX(CASE itemnumber WHEN 7 THEN substring(@string, itemStart, itemLen) END),
col9 = MAX(CASE itemnumber WHEN 8 THEN substring(@string, itemStart, itemLen) END)
FROM
(
SELECT
itemNumber,
itemStart = position+1,
itemLen = LEAD(position,1,LEN(@string)+1) OVER (ORDER BY (position)) - (position+1)
FROM
(
SELECT itemNumber = CAST(0 as int), position = CAST(0 as int) UNION ALL
SELECT ROW_NUMBER() OVER (ORDER BY position), position
FROM dbo.ngrams2b(@string,1)
WHERE token = '|'
) delim
) split;
-- Itzik Ben-Gan 2001
October 5, 2017 at 1:54 am
Hi All,
Thanks for the suggestions. Thanks a lot.
One thing, came into my mind, how can we make this dynamic?? Meaning, if the number of columns are not determined then how can reuse it for different string with different number of columns to be returned?
October 5, 2017 at 2:36 am
vsamantha35 - Thursday, October 5, 2017 1:54 AMHi All,Thanks for the suggestions. Thanks a lot.
One thing, came into my mind, how can we make this dynamic?? Meaning, if the number of columns are not determined then how can reuse it for different string with different number of columns to be returned?
I was "afraid" you were going to ask that (although I think most of us were expecting it). Will you only be dealing with one line at a time or many? If many, and one row has 10 delimiters (thus 11 columns) and the other 7 (thus 8), what should the last 3 columns in the latter show? NULLs, blank strings, something else?
If you are dealing with several rows at a time (especially if they have different number of columns), please could you provide some consumable sample data for us?
Thanks,.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
October 22, 2017 at 9:30 pm
Alan.B - Wednesday, October 4, 2017 11:37 PMsgmunson - Wednesday, October 4, 2017 6:19 AMThom A - Wednesday, October 4, 2017 5:39 AMOne method, using Jeff's DelimitedSplit8K function:WITH PT AS (
SELECT *
FROM t3
CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1,
MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2,
MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3,
MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4,
MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5,
MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6,
MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7,
MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8,
MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9
FROM PT;This could be shortened to without the CTE.
FYI, Jeff's DelimitedSplit8K routine only handles varchar up to 8000 characters, and does NOT handle strings longer than that. For most situations such as this, however, varchar(max) is overkill to begin with, so I usually advise against it's use. I'd take the time to figure out how large each of the 9 possible values can be, and size the field to be able to exactly fit that many characters, including the pipe symbols, unless you have control over the database table, in which case I'd never let that string concatenation into the database in the first place, and would separate it on the way in using that splitter function.
+1000 To all of this.
3NF is the best approach. If that's not an option then the next best thing thing to do is drop down to at varchar(8000) or less.
As to faster ways to split that string here's three...
1. CROSS APPLY
You have much, much better performance doing a Cascading Cross Apply like so:DECLARE @string varchar(max) = 'abc|203|12|jsjsjsj HILLS|test|abc XXXXXX o1123|9389||';
SELECT
col1 = substring(v.s, 1, d1.p-2),
col2 = substring(v.s, d1.p, d2.p-(d1.p+1)),
col3 = substring(v.s, d2.p, d3.p-(d2.p+1)),
col4 = substring(v.s, d3.p, d4.p-(d3.p+1)),
col5 = substring(v.s, d4.p, d5.p-(d4.p+1)),
col6 = substring(v.s, d5.p, d6.p-(d5.p+1)),
col7 = substring(v.s, d6.p, d7.p-(d6.p+1)),
col8 = substring(v.s, d7.p, d8.p-(d7.p+1)),
col9 = substring(v.s, d8.p, v.l)
FROM (VALUES (@string, datalength(@string))) v(s,l)
CROSS APPLY (VALUES (charindex('|', v.s)+1)) d1(p)
CROSS APPLY (VALUES (charindex('|', v.s, d1.p)+1)) d2(p)
CROSS APPLY (VALUES (charindex('|', v.s, d2.p)+1)) d3(p)
CROSS APPLY (VALUES (charindex('|', v.s, d3.p)+1)) d4(p)
CROSS APPLY (VALUES (charindex('|', v.s, d4.p)+1)) d5(p)
CROSS APPLY (VALUES (charindex('|', v.s, d5.p)+1)) d6(p)
CROSS APPLY (VALUES (charindex('|', v.s, d6.p)+1)) d7(p)
CROSS APPLY (VALUES (charindex('|', v.s, d7.p)+1)) d8(p)
CROSS APPLY (VALUES (charindex('|', v.s, d8.p)+1)) d9(p);2. Indexed View
The fastest way, by far is to split the string ahead of time via indexed view (note this thread.) This is sometimes an option If the data is pretty static as is the case in data warehouse environments.3. NGrams2b and LEAD
For fun I put together how you would do this using NGrams2b (found here) and what I learned reading Reaping the benefits of the Window functions in T-SQL.
I also wanted to show two ways to further simplify that aforementioned case statement.
SELECT --itemNumber, --, itemStart, itemLen,
col1 = MAX(IIF(itemNumber = 0, substring(@string, itemStart, itemLen), NULL)),
col2 = MAX(CASE itemnumber WHEN 1 THEN substring(@string, itemStart, itemLen) END),
col3 = MAX(CASE itemnumber WHEN 2 THEN substring(@string, itemStart, itemLen) END),
col4 = MAX(CASE itemnumber WHEN 3 THEN substring(@string, itemStart, itemLen) END),
col5 = MAX(CASE itemnumber WHEN 4 THEN substring(@string, itemStart, itemLen) END),
col6 = MAX(CASE itemnumber WHEN 5 THEN substring(@string, itemStart, itemLen) END),
col7 = MAX(CASE itemnumber WHEN 6 THEN substring(@string, itemStart, itemLen) END),
col8 = MAX(CASE itemnumber WHEN 7 THEN substring(@string, itemStart, itemLen) END),
col9 = MAX(CASE itemnumber WHEN 8 THEN substring(@string, itemStart, itemLen) END)
FROM
(
SELECT
itemNumber,
itemStart = position+1,
itemLen = LEAD(position,1,LEN(@string)+1) OVER (ORDER BY (position)) - (position+1)
FROM
(
SELECT itemNumber = CAST(0 as int), position = CAST(0 as int) UNION ALL
SELECT ROW_NUMBER() OVER (ORDER BY position), position
FROM dbo.ngrams2b(@string,1)
WHERE token = '|'
) delim
) split;
The one thing missing from all of this is performance testing. Although Chris posted some results in his fine article, he didn't post the source of data he used nor a test data generator.
As for the indexed view, it's fast because the answer is already materialized. It would be very interesting to compare the performance of inserts to a plain table to the performance of a table with such a materialized view. I've not yet tested any of it to be sure but my hunch is that you've moved any performance problems from SELECTs to INSERTs.
Also, if you read Chris' article carefully, he also states that cCA's end up being a performance problem when you use more than 4 or 5. Again, a test on the tipping point is missing.
There is no question about Eirikur 's method, though. I've tested it myself and not that I actually have severs at or above 2012, intend to modify our copies of DelimitedSplit8K to use his fine modification.
That, not withstanding, very nice post, Alan.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply