SQL to split a non delimited string into delimited string

  • Jeff Moden (2/7/2011)


    Wolfmeister (2/7/2011)


    That's pretty slick! Thanks guys.

    Just making sure... do you understand how ColdCoffee's method works?

    Heck I didn't understand what he was doing with the CTEs till I realized he was just making a virtual tally table with them. I saw Tally in the query and figured he had a tally table he was assuming. 🙂



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • mtassin (2/8/2011)


    Jeff Moden (2/7/2011)


    Wolfmeister (2/7/2011)


    That's pretty slick! Thanks guys.

    Just making sure... do you understand how ColdCoffee's method works?

    Heck I didn't understand what he was doing with the CTEs till I realized he was just making a virtual tally table with them. I saw Tally in the query and figured he had a tally table he was assuming. 🙂

    I was not sure if the OP knows about Tally tables, so i thought i will just make a On-The-Fly Tally table.. The Exec Plan will become much better if i were using a traditional persistent Tally Table.

  • Wolfmeister (2/8/2011)


    SwePeso (2/8/2011)


    DECLARE @Sample VARCHAR(100) = '12345'

    ;WITH cte(CurrPos, LastPos, String)

    AS (

    SELECT1 AS CurrPos,

    DATALENGTH(@Sample) AS LastPos,

    CAST(@Sample AS VARCHAR(MAX)) AS String

    UNION ALL

    SELECTCurrPos + 1 AS CurrPos,

    LastPos,

    STUFF(String, 2 * CurrPos, 0, ',') AS String

    FROMcte

    WHERECurrPos < LastPos

    )

    SELECTString

    FROMcte

    WHERECurrPos = LastPos

    Thanks, SwePeso, that's pretty neat as well:-)

    Oh... be careful, now. Recursion in the form of a Recursive CTE can be as bad as a loop. Let's see what happens.

    First, I've converted ColdCoffee's method and Peso's method to an iTVF (inline Table Value Function) each.

    CREATE FUNCTION dbo.ColdCoffee

    (@string VARCHAR(100))

    RETURNS TABLE AS

    RETURN

    WITH

    TENS (N) AS (SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL

    SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL

    SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0),

    THOUSANDS (N) AS (SELECT 1 FROM TENS t1 CROSS JOIN TENS t2 CROSS JOIN TENS t3),

    TALLY (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM THOUSANDS)

    select STUFF(

    (SELECT ','+SUBSTRING(@string,N,1)

    FROM TALLY T_Outer

    WHERE N <= LEN(@string)

    FOR XML PATH('') ),1 ,1 ,'') Concat_Values

    ;

    GO

    CREATE FUNCTION dbo.Peso

    (@Sample VARCHAR(100))

    RETURNS TABLE AS

    RETURN

    WITH cte(CurrPos, LastPos, String)

    AS (

    SELECT 1 AS CurrPos,

    DATALENGTH(@Sample) AS LastPos,

    CAST(@Sample AS VARCHAR(MAX)) AS String

    UNION ALL

    SELECT CurrPos + 1 AS CurrPos,

    LastPos,

    STUFF(String, 2 * CurrPos, 0, ',') AS String

    FROM cte

    WHERE CurrPos < LastPos

    )

    SELECT String

    FROM cte

    WHERE CurrPos = LastPos

    ;

    GO

    Of course, we'll need some test data. Since both functions can handle just about any string, I used all GUID's for the test with only 10,000 rows.

    SELECT TOP 10000

    SomeString = CAST(NEWID() AS VARCHAR(36))

    INTO dbo.JBMTest

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    To take the display processing time out of the picture, I simply assign the results to a "throw-away" variable in the following test code.

    PRINT '========== ColdCoffee =========='

    SET STATISTICS TIME ON;

    DECLARE @Bitbucket VARCHAR(100);

    SELECT @Bitbucket = ca.Concat_Values

    FROM dbo.JBMTest test

    CROSS APPLY dbo.ColdCoffee(test.SomeString) ca;

    SET STATISTICS TIME OFF;

    GO

    PRINT '========== Peso =========='

    SET STATISTICS TIME ON;

    DECLARE @Bitbucket VARCHAR(100)

    SELECT @Bitbucket = ca.String

    FROM dbo.JBMTest test

    CROSS APPLY dbo.Peso(test.SomeString) ca;

    SET STATISTICS TIME OFF;

    GO

    Here are the results from my 8 year old desktop box...

    ========== ColdCoffee ==========

    SQL Server Execution Times:

    CPU time = 844 ms, elapsed time = 899 ms.

    ========== Peso ==========

    SQL Server Execution Times:

    CPU time = 47469 ms, elapsed time = 57220 ms.

    Like I said, you may want to avoid the "Hidden RBAR" of Recursive CTE's.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Results from my machine (i am not sure of my desktop's age :w00t: , but i can tell its faster than urs :-P)

    ========== ColdCoffee ==========

    SQL Server Execution Times:

    CPU time = 390 ms, elapsed time = 398 ms.

    ========== Peso ==========

    SQL Server Execution Times:

    CPU time = 16255 ms, elapsed time = 16889 ms.

  • ColdCoffee (2/8/2011)


    mtassin (2/8/2011)


    Jeff Moden (2/7/2011)


    Wolfmeister (2/7/2011)


    That's pretty slick! Thanks guys.

    Just making sure... do you understand how ColdCoffee's method works?

    Heck I didn't understand what he was doing with the CTEs till I realized he was just making a virtual tally table with them. I saw Tally in the query and figured he had a tally table he was assuming. 🙂

    I was not sure if the OP knows about Tally tables, so i thought i will just make a On-The-Fly Tally table.. The Exec Plan will become much better if i were using a traditional persistent Tally Table.

    Careful here, there is a tipping point when accessing a tally table from disk. It all depends on how big the table needs to be to support its use. If you need an extremely large tally table (> 1,000,000 for example, could be larger), it is possible that a dynamic tally table may actually be better. The tipping point is system dependent, but it does exist. At a previous employer I did observe this during some testing. I wish I could find the output from those tests, but not sure if I sent the info home.

  • Lynn Pettis (2/8/2011)


    ColdCoffee (2/8/2011)


    mtassin (2/8/2011)


    Jeff Moden (2/7/2011)


    Wolfmeister (2/7/2011)


    That's pretty slick! Thanks guys.

    Just making sure... do you understand how ColdCoffee's method works?

    Heck I didn't understand what he was doing with the CTEs till I realized he was just making a virtual tally table with them. I saw Tally in the query and figured he had a tally table he was assuming. 🙂

    I was not sure if the OP knows about Tally tables, so i thought i will just make a On-The-Fly Tally table.. The Exec Plan will become much better if i were using a traditional persistent Tally Table.

    Careful here, there is a tipping point when accessing a tally table from disk. It all depends on how big the table needs to be to support its use. If you need an extremely large tally table (> 1,000,000 for example, could be larger), it is possible that a dynamic tally table may actually be better. The tipping point is system dependent, but it does exist. At a previous employer I did observe this during some testing. I wish I could find the output from those tests, but not sure if I sent the info home.

    Absolutely agree here, Lynn.. Even i have had a query which performed better with dynamic tally than the traditional tally. So again, usage of type of Tally table "depends" on the context 🙂

  • Thanks all for your inputs. Much appreciated 🙂

  • Lynn Pettis (2/8/2011)


    Careful here, there is a tipping point when accessing a tally table from disk. It all depends on how big the table needs to be to support its use. If you need an extremely large tally table (> 1,000,000 for example, could be larger), it is possible that a dynamic tally table may actually be better. The tipping point is system dependent, but it does exist. At a previous employer I did observe this during some testing. I wish I could find the output from those tests, but not sure if I sent the info home.

    Have you found that happens anywhere other than in a "splitter"?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @jeff - I haven't been able to find anything as of yet, so some of this has to be taken as personal experience. What this means is that people need to test in their own environments and keep this in mind should performance problems begin to occur in the future.

  • Lynn Pettis (2/9/2011)


    @Jeff - I haven't been able to find anything as of yet, so some of this has to be taken as personal experience. What this means is that people need to test in their own environments and keep this in mind should performance problems begin to occur in the future.

    Heh... that's how myths are sometimes started. I'd rather see code that proves it. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well, it always better to have more than one person testing.

    The testing I did used actual data at work, so I may not be able reproduce it right away. Plus, have some other things on my plate right now that are a bit more critical.

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply