T-SQL help

  • Hi Experts,

    Need tsql help.
    I have a table row as follows. Basically, it contains a large string value delimited by the pipe '|' symbol.
    The requirement is to split the single column value into multiple columns, i.e. for the row below, I need to see 9 different column values split out on the '|' delimiter.

    How can I do that?

    Test data
    ===========
    create table t3
    (c1 varchar(max)
    )

    insert into t3
    select 'abc|203|12|jsjsjsj HILLS|test|abc XXXXXX o1123|9389||'

    The output should be:
    c1 = abc, c2 = 203, c3 = 12, c4 = jsjsjsj HILLS, c5 = test,
    c6 = abc XXXXXX o1123, c7 = 9389, c8 = (empty), c9 = (empty)

    Can anyone please help with this?

    Thanks in advance.

    Sam

  • One method, using Jeff's DelimitedSplit8K function:
    WITH PT AS (
        SELECT *
        FROM t3
             CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
    SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1,
           MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2,
           MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3,
           MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4,
           MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5,
           MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6,
           MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7,
           MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8,
           MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9
    FROM PT;

    This could also be written without the CTE.
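
    For example, a minimal sketch of the CTE-free form (same DelimitedSplit8K call, same result):
    SELECT MAX(CASE WHEN DS.ItemNumber = 1 THEN DS.Item ELSE NULL END) AS c1,
           MAX(CASE WHEN DS.ItemNumber = 2 THEN DS.Item ELSE NULL END) AS c2,
           MAX(CASE WHEN DS.ItemNumber = 3 THEN DS.Item ELSE NULL END) AS c3,
           MAX(CASE WHEN DS.ItemNumber = 4 THEN DS.Item ELSE NULL END) AS c4,
           MAX(CASE WHEN DS.ItemNumber = 5 THEN DS.Item ELSE NULL END) AS c5,
           MAX(CASE WHEN DS.ItemNumber = 6 THEN DS.Item ELSE NULL END) AS c6,
           MAX(CASE WHEN DS.ItemNumber = 7 THEN DS.Item ELSE NULL END) AS c7,
           MAX(CASE WHEN DS.ItemNumber = 8 THEN DS.Item ELSE NULL END) AS c8,
           MAX(CASE WHEN DS.ItemNumber = 9 THEN DS.Item ELSE NULL END) AS c9
    FROM t3
         CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS;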

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Wednesday, October 4, 2017 5:39 AM

    One method, using Jeff's DelimitedSplit8K function:
    WITH PT AS (
        SELECT *
        FROM t3
             CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
    SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1,
           MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2,
           MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3,
           MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4,
           MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5,
           MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6,
           MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7,
           MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8,
           MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9
    FROM PT;

    This could also be written without the CTE.

    Yes, this does the job. Note that the ELSE NULL part above is the default for CASE, thus the query may be shortened:
    WITH PT AS (
      SELECT *
      FROM t3
       CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
    SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item END) AS c1,
       MAX(CASE WHEN ItemNumber = 2 THEN Item END) AS c2,
       MAX(CASE WHEN ItemNumber = 3 THEN Item END) AS c3,
       MAX(CASE WHEN ItemNumber = 4 THEN Item END) AS c4,
       MAX(CASE WHEN ItemNumber = 5 THEN Item END) AS c5,
       MAX(CASE WHEN ItemNumber = 6 THEN Item END) AS c6,
       MAX(CASE WHEN ItemNumber = 7 THEN Item END) AS c7,
       MAX(CASE WHEN ItemNumber = 8 THEN Item END) AS c8,
       MAX(CASE WHEN ItemNumber = 9 THEN Item END) AS c9
    FROM PT;

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thom A - Wednesday, October 4, 2017 5:39 AM

    One method, using Jeff's DelimitedSplit8K function:
    WITH PT AS (
        SELECT *
        FROM t3
             CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
    SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1,
           MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2,
           MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3,
           MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4,
           MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5,
           MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6,
           MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7,
           MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8,
           MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9
    FROM PT;

    This could also be written without the CTE.

    FYI, Jeff's DelimitedSplit8K routine only handles varchar up to 8000 characters; it does NOT handle strings longer than that. For most situations such as this, however, varchar(max) is overkill to begin with, so I usually advise against its use. I'd take the time to figure out how large each of the 9 possible values can be and size the field to fit exactly that many characters, including the pipe symbols. Better still, if you have control over the database table, I'd never let that string concatenation into the database in the first place; I'd separate it on the way in using that splitter function.
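
    As a rough sketch of splitting it on the way in (the target table name, its column sizes, and the incoming value below are assumptions for illustration only):
    CREATE TABLE dbo.t3_split        -- assumed target; size the columns to what the data really needs
    (
        c1 varchar(50), c2 varchar(50), c3 varchar(50),
        c4 varchar(100), c5 varchar(50), c6 varchar(100),
        c7 varchar(50), c8 varchar(50), c9 varchar(50)
    );

    -- split the incoming string before it ever lands as one concatenated value
    INSERT INTO dbo.t3_split (c1, c2, c3, c4, c5, c6, c7, c8, c9)
    SELECT MAX(CASE WHEN DS.ItemNumber = 1 THEN DS.Item END),
           MAX(CASE WHEN DS.ItemNumber = 2 THEN DS.Item END),
           MAX(CASE WHEN DS.ItemNumber = 3 THEN DS.Item END),
           MAX(CASE WHEN DS.ItemNumber = 4 THEN DS.Item END),
           MAX(CASE WHEN DS.ItemNumber = 5 THEN DS.Item END),
           MAX(CASE WHEN DS.ItemNumber = 6 THEN DS.Item END),
           MAX(CASE WHEN DS.ItemNumber = 7 THEN DS.Item END),
           MAX(CASE WHEN DS.ItemNumber = 8 THEN DS.Item END),
           MAX(CASE WHEN DS.ItemNumber = 9 THEN DS.Item END)
    FROM (VALUES ('abc|203|12|jsjsjsj HILLS|test|abc XXXXXX o1123|9389||')) src(s)
         CROSS APPLY dbo.DelimitedSplit8K(src.s, '|') DS;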

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

    If you want to process in bulk (expecting more rows and a lot of columns), then you can explore the BCP route as well: spit all the data out as a text file (using BCP) and load it into the table using BCP. If your target is on a different server, it will surely benefit in terms of performance. You can test it in your scenario.
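
    As a rough sketch of the round trip being described (the server, database, table, and file path names below are placeholders, and the nine-column target table is assumed):
    REM 1) Export the raw single-column rows; each row is already pipe-delimited text.
    bcp "SELECT c1 FROM MyDb.dbo.t3" queryout "C:\temp\t3_raw.txt" -c -S SourceServer -T

    REM 2) Load the file into a nine-column table, telling bcp that '|' separates the fields.
    bcp MyDb.dbo.t3_split in "C:\temp\t3_raw.txt" -c -t "|" -S TargetServer -T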

  • Avi1 - Wednesday, October 4, 2017 12:33 PM

    If you want to process in bulk (expecting more rows and a lot of columns), then you can explore the BCP route as well: spit all the data out as a text file (using BCP) and load it into the table using BCP. If your target is on a different server, it will surely benefit in terms of performance. You can test it in your scenario.

    This is a rather bold statement, which I suggest you need to back up with some evidence of your testing of the different scenarios.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

    We implemented this a few years back on 2008 R2. We had one stage table which was loaded by an external process; one record in the table was actually one delimited record of 30+ columns, and at the top of every hour we received 200k records. These rows had some text columns. The stage table was on a different server, so we used BCP to generate the file on the stage server and pushed the data from there using BCP. We tried a few other options, but this worked well. I don't have the numbers, but it worked well there.

  • Avi1 - Wednesday, October 4, 2017 1:09 PM

    We implemented this a few years back on 2008 R2. We had one stage table which was loaded by an external process; one record in the table was actually one delimited record of 30+ columns, and at the top of every hour we received 200k records. These rows had some text columns. The stage table was on a different server, so we used BCP to generate the file on the stage server and pushed the data from there using BCP. We tried a few other options, but this worked well. I don't have the numbers, but it worked well there.

    I'm sure it did. But, as you've probably already realised, if you use absolutes on this forum, you will generally be asked for proof ... hence my post 🙂

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Avi1 - Wednesday, October 4, 2017 1:09 PM

    We implemented this a few years back on 2008 R2. We had one stage table which was loaded by an external process; one record in the table was actually one delimited record of 30+ columns, and at the top of every hour we received 200k records. These rows had some text columns. The stage table was on a different server, so we used BCP to generate the file on the stage server and pushed the data from there using BCP. We tried a few other options, but this worked well. I don't have the numbers, but it worked well there.

    Perhaps you could provide a sample command that might demonstrate these benefits? I have several areas where I have to split out delimited strings before moving on to the next task at hand. If you are proposing there is a quicker way, then you have my attention.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • sgmunson - Wednesday, October 4, 2017 6:19 AM

    Thom A - Wednesday, October 4, 2017 5:39 AM

    One method, using Jeff's DelimitedSplit8K function:
    WITH PT AS (
        SELECT *
        FROM t3
             CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
    SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1,
           MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2,
           MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3,
           MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4,
           MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5,
           MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6,
           MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7,
           MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8,
           MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9
    FROM PT;

    This could also be written without the CTE.

    FYI, Jeff's DelimitedSplit8K routine only handles varchar up to 8000 characters; it does NOT handle strings longer than that. For most situations such as this, however, varchar(max) is overkill to begin with, so I usually advise against its use. I'd take the time to figure out how large each of the 9 possible values can be and size the field to fit exactly that many characters, including the pipe symbols. Better still, if you have control over the database table, I'd never let that string concatenation into the database in the first place; I'd separate it on the way in using that splitter function.

    +1000 to all of this.

    3NF is the best approach. If that's not an option, then the next best thing to do is drop down to varchar(8000) or less.

    As for faster ways to split that string, here are three...

    1. CROSS APPLY
    You get much, much better performance doing a cascading CROSS APPLY, like so:
    DECLARE @string varchar(max) = 'abc|203|12|jsjsjsj HILLS|test|abc XXXXXX o1123|9389||';

    SELECT
    col1 = substring(v.s, 1, d1.p-2),
    col2 = substring(v.s, d1.p, d2.p-(d1.p+1)),
    col3 = substring(v.s, d2.p, d3.p-(d2.p+1)),
    col4 = substring(v.s, d3.p, d4.p-(d3.p+1)),
    col5 = substring(v.s, d4.p, d5.p-(d4.p+1)),
    col6 = substring(v.s, d5.p, d6.p-(d5.p+1)),
    col7 = substring(v.s, d6.p, d7.p-(d6.p+1)),
    col8 = substring(v.s, d7.p, d8.p-(d7.p+1)),
    col9 = substring(v.s, d8.p, v.l)
    FROM (VALUES (@string, datalength(@string))) v(s,l)
    CROSS APPLY (VALUES (charindex('|', v.s)+1))  d1(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d1.p)+1)) d2(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d2.p)+1)) d3(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d3.p)+1)) d4(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d4.p)+1)) d5(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d5.p)+1)) d6(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d6.p)+1)) d7(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d7.p)+1)) d8(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d8.p)+1)) d9(p);

    2. Indexed View
    The fastest way, by far, is to split the string ahead of time via an indexed view (note this thread). This is sometimes an option if the data is pretty static, as is the case in data warehouse environments.

    3. NGrams2b and LEAD
    For fun, I put together how you would do this using NGrams2b (found here) and what I learned from reading Reaping the benefits of the Window functions in T-SQL.
    I also wanted to show two ways to further simplify the aforementioned CASE expression.

    SELECT --itemNumber, --, itemStart, itemLen,
    col1 = MAX(IIF(itemNumber = 0, substring(@string, itemStart, itemLen), NULL)),
    col2 = MAX(CASE itemnumber WHEN 1 THEN substring(@string, itemStart, itemLen) END),
    col3 = MAX(CASE itemnumber WHEN 2 THEN substring(@string, itemStart, itemLen) END),
    col4 = MAX(CASE itemnumber WHEN 3 THEN substring(@string, itemStart, itemLen) END),
    col5 = MAX(CASE itemnumber WHEN 4 THEN substring(@string, itemStart, itemLen) END),
    col6 = MAX(CASE itemnumber WHEN 5 THEN substring(@string, itemStart, itemLen) END),
    col7 = MAX(CASE itemnumber WHEN 6 THEN substring(@string, itemStart, itemLen) END),
    col8 = MAX(CASE itemnumber WHEN 7 THEN substring(@string, itemStart, itemLen) END),
    col9 = MAX(CASE itemnumber WHEN 8 THEN substring(@string, itemStart, itemLen) END)
    FROM
    (
    SELECT
      itemNumber,
      itemStart = position+1,
      itemLen = LEAD(position,1,LEN(@string)+1) OVER (ORDER BY (position)) - (position+1)
    FROM
    (
      SELECT itemNumber = CAST(0 as int), position = CAST(0 as int) UNION ALL
      SELECT ROW_NUMBER() OVER (ORDER BY position), position
      FROM dbo.ngrams2b(@string,1)
      WHERE token = '|'
    ) delim
    ) split;

    "I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Hi All,

    Thanks for the suggestions. Thanks a lot.
    One thing came to my mind: how can we make this dynamic? Meaning, if the number of columns is not known in advance, how can we reuse this for different strings with different numbers of columns to be returned?

  • vsamantha35 - Thursday, October 5, 2017 1:54 AM

    Hi All,

    Thanks for the suggestions. Thanks a lot.
    One thing came to my mind: how can we make this dynamic? Meaning, if the number of columns is not known in advance, how can we reuse this for different strings with different numbers of columns to be returned?

    I was "afraid" you were going to ask that (although I think most of us were expecting it). Will you only be dealing with one line at a time or many? If many, and one row has 10 delimiters (thus 11 columns) and the other 7 (thus 8), what should the last 3 columns in the latter show? NULLs, blank strings, something else?

    If you are dealing with several rows at a time (especially if they have different number of columns), please could you provide some consumable sample data for us?
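
    In the meantime, here's a minimal sketch of one dynamic approach (it assumes DelimitedSplit8K, returns NULL for any missing trailing columns, and, like the earlier queries, collapses everything into a single result row):
    DECLARE @cols int, @sql nvarchar(max);

    -- the widest row decides how many columns we need
    SELECT @cols = MAX(LEN(c1) - LEN(REPLACE(c1, '|', '')) + 1)
    FROM t3;

    -- build one MAX(CASE ...) per column, then run the generated statement
    SELECT @sql =
        N'SELECT ' +
        STUFF((SELECT N', MAX(CASE WHEN DS.ItemNumber = ' + CONVERT(nvarchar(10), t.n)
                    + N' THEN DS.Item END) AS c' + CONVERT(nvarchar(10), t.n)
               FROM (SELECT TOP (@cols) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                     FROM sys.all_columns) t(n)
               ORDER BY t.n
               FOR XML PATH('')), 1, 2, N'') +
        N' FROM t3 CROSS APPLY dbo.DelimitedSplit8K(t3.c1, ''|'') DS;';

    EXEC sys.sp_executesql @sql;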

    Thanks.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Alan.B - Wednesday, October 4, 2017 11:37 PM

    sgmunson - Wednesday, October 4, 2017 6:19 AM

    Thom A - Wednesday, October 4, 2017 5:39 AM

    One method, using Jeff's DelimitedSplit8K function:
    WITH PT AS (
        SELECT *
        FROM t3
             CROSS APPLY dbo.DelimitedSplit8K (t3.c1,'|') DS)
    SELECT MAX(CASE WHEN ItemNumber = 1 THEN Item ELSE NULL END) AS c1,
           MAX(CASE WHEN ItemNumber = 2 THEN Item ELSE NULL END) AS c2,
           MAX(CASE WHEN ItemNumber = 3 THEN Item ELSE NULL END) AS c3,
           MAX(CASE WHEN ItemNumber = 4 THEN Item ELSE NULL END) AS c4,
           MAX(CASE WHEN ItemNumber = 5 THEN Item ELSE NULL END) AS c5,
           MAX(CASE WHEN ItemNumber = 6 THEN Item ELSE NULL END) AS c6,
           MAX(CASE WHEN ItemNumber = 7 THEN Item ELSE NULL END) AS c7,
           MAX(CASE WHEN ItemNumber = 8 THEN Item ELSE NULL END) AS c8,
           MAX(CASE WHEN ItemNumber = 9 THEN Item ELSE NULL END) AS c9
    FROM PT;

    This could also be written without the CTE.

    FYI, Jeff's DelimitedSplit8K routine only handles varchar up to 8000 characters; it does NOT handle strings longer than that. For most situations such as this, however, varchar(max) is overkill to begin with, so I usually advise against its use. I'd take the time to figure out how large each of the 9 possible values can be and size the field to fit exactly that many characters, including the pipe symbols. Better still, if you have control over the database table, I'd never let that string concatenation into the database in the first place; I'd separate it on the way in using that splitter function.

    +1000 to all of this.

    3NF is the best approach. If that's not an option, then the next best thing to do is drop down to varchar(8000) or less.

    As for faster ways to split that string, here are three...

    1. CROSS APPLY
    You get much, much better performance doing a cascading CROSS APPLY, like so:
    DECLARE @string varchar(max) = 'abc|203|12|jsjsjsj HILLS|test|abc XXXXXX o1123|9389||';

    SELECT
    col1 = substring(v.s, 1, d1.p-2),
    col2 = substring(v.s, d1.p, d2.p-(d1.p+1)),
    col3 = substring(v.s, d2.p, d3.p-(d2.p+1)),
    col4 = substring(v.s, d3.p, d4.p-(d3.p+1)),
    col5 = substring(v.s, d4.p, d5.p-(d4.p+1)),
    col6 = substring(v.s, d5.p, d6.p-(d5.p+1)),
    col7 = substring(v.s, d6.p, d7.p-(d6.p+1)),
    col8 = substring(v.s, d7.p, d8.p-(d7.p+1)),
    col9 = substring(v.s, d8.p, v.l)
    FROM (VALUES (@string, datalength(@string))) v(s,l)
    CROSS APPLY (VALUES (charindex('|', v.s)+1))  d1(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d1.p)+1)) d2(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d2.p)+1)) d3(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d3.p)+1)) d4(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d4.p)+1)) d5(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d5.p)+1)) d6(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d6.p)+1)) d7(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d7.p)+1)) d8(p)
    CROSS APPLY (VALUES (charindex('|', v.s, d8.p)+1)) d9(p);

    2. Indexed View
    The fastest way, by far, is to split the string ahead of time via an indexed view (note this thread). This is sometimes an option if the data is pretty static, as is the case in data warehouse environments.

    3. NGrams2b and LEAD
    For fun, I put together how you would do this using NGrams2b (found here) and what I learned from reading Reaping the benefits of the Window functions in T-SQL.
    I also wanted to show two ways to further simplify the aforementioned CASE expression.

    SELECT --itemNumber, --, itemStart, itemLen,
    col1 = MAX(IIF(itemNumber = 0, substring(@string, itemStart, itemLen), NULL)),
    col2 = MAX(CASE itemnumber WHEN 1 THEN substring(@string, itemStart, itemLen) END),
    col3 = MAX(CASE itemnumber WHEN 2 THEN substring(@string, itemStart, itemLen) END),
    col4 = MAX(CASE itemnumber WHEN 3 THEN substring(@string, itemStart, itemLen) END),
    col5 = MAX(CASE itemnumber WHEN 4 THEN substring(@string, itemStart, itemLen) END),
    col6 = MAX(CASE itemnumber WHEN 5 THEN substring(@string, itemStart, itemLen) END),
    col7 = MAX(CASE itemnumber WHEN 6 THEN substring(@string, itemStart, itemLen) END),
    col8 = MAX(CASE itemnumber WHEN 7 THEN substring(@string, itemStart, itemLen) END),
    col9 = MAX(CASE itemnumber WHEN 8 THEN substring(@string, itemStart, itemLen) END)
    FROM
    (
    SELECT
      itemNumber,
      itemStart = position+1,
      itemLen = LEAD(position,1,LEN(@string)+1) OVER (ORDER BY (position)) - (position+1)
    FROM
    (
      SELECT itemNumber = CAST(0 as int), position = CAST(0 as int) UNION ALL
      SELECT ROW_NUMBER() OVER (ORDER BY position), position
      FROM dbo.ngrams2b(@string,1)
      WHERE token = '|'
    ) delim
    ) split;

    The one thing missing from all of this is performance testing.  Although Chris posted some results in his fine article, he didn't post the source of data he used nor a test data generator.
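
    For what it's worth, a throwaway test data generator for strings shaped like the one in this thread might look something like this (the row count and value shapes are arbitrary assumptions):
    IF OBJECT_ID('tempdb..#t3_test') IS NOT NULL DROP TABLE #t3_test;

    WITH n(n) AS (
        SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns a CROSS JOIN sys.all_columns b
    )
    SELECT c1 = CONVERT(varchar(8000),
                'abc|' + CONVERT(varchar(10), n % 1000) + '|'
                + CONVERT(varchar(10), n % 100) + '|jsjsjsj HILLS|test|abc XXXXXX o'
                + CONVERT(varchar(10), n) + '|9389||')
    INTO #t3_test
    FROM n;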

    As for the indexed view, it's fast because the answer is already materialized.  It would be very interesting to compare the performance of inserts into a plain table with inserts into a table that has such a materialized view.  I've not yet tested any of it to be sure, but my hunch is that you've moved any performance problems from SELECTs to INSERTs.

    Also, if you read Chris's article carefully, he states that cascading CROSS APPLYs (cCAs) end up being a performance problem when you use more than 4 or 5.  Again, a test of the tipping point is missing.

    There is no question about Eirikur's method, though.  I've tested it myself and, now that I actually have servers at or above 2012, intend to modify our copies of DelimitedSplit8K to use his fine modification.

    That notwithstanding, very nice post, Alan.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
