Splitting a String in SQL - Help Please!

  • Chirag (9/16/2008)


    What happened to op?

    I think we lost him a long time ago. He never did answer if it was for a single parameter or for a whole column. James A. Lawrence posted a solution (above) that could work either way. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sergiy (9/16/2008)


    But I know how it should be...

    If any time you come up with an idea how to split quoted string usint Tally table - you know who's gonna appreciate it the most. 😉

    It was like a good book... I couldn't put it down. 😛 I tested it on both of your strings but nothing else yet. Your test strings are pretty comprehensive. I also haven't tested it for performance... but we know how it should be. 😉 Lemme know if it hurls and I'll take another crack at it.

    CREATE FUNCTION dbo.TrueCsvSplit(@String VARCHAR(7998))

    /**********************************************************************************************************************

    Purpose:

    This function prepares a "TRUE CSV" string for splitting by correctly finding and replacing all possible valid

    delimiters with the "Thorn" or "þ" character. It also adds the Thorn to the very biginning and end of the converted

    string. Further, any spaces wrapped up between delimiters are eliminated.

    Revision History:

    Rev 00 - 17 Sep 2008 - Jeff Moden - Initial creation for SQLServerCentral.com

    **********************************************************************************************************************/

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    --===== Define and preset the local variables. Most are self explanatory. The "QuoteFlop" variables keep track if a

    -- given quote is "labeled" as a 1 or a 0. (1st quote is a 1, 2nd is a 0, 3rd is a 1, etc, etc)

    DECLARE @ReturnString VARCHAR(8000),

    @PrevChar VARCHAR(1), --MUST be VarChar or unwanted spaces appear

    @CurChar VARCHAR(1), --MUST be VarChar or unwanted spaces appear

    @PrevQuoteFlop TINYINT,

    @CurQuoteFlop TINYINT,

    @Keep TINYINT

    SELECT @PrevChar = ' ',

    @PrevQuoteFlop = 0

    --===== Find and conditionally replace all valid "true" CSV delimiters with the Thorn character using a set based

    -- loop formed by the Tally table. This is to make the actual splitting of a "true" CSV easier and will beat

    -- the tar out of the equivelent Cursor or While Loop.

    SELECT @CurChar = SUBSTRING(@String,t.N,1),

    @CurQuoteFlop = CASE WHEN @CurChar = '"' THEN ABS(@PrevQuoteFlop-1) ELSE @PrevQuoteFlop END,

    @Keep = CASE WHEN (@PrevQuoteFlop = 0 AND (@CurChar = ' ' OR @PrevChar = ' ')) THEN 0

    WHEN (@CurQuoteFlop = 0 AND @PrevQuoteFlop = 1 ) THEN 0

    WHEN (@PrevChar <> '"' AND @PrevQuoteFlop = 0 AND @CurChar = '"') THEN 0

    ELSE 1

    END,

    @PrevChar = @CurChar,

    @CurChar = CASE WHEN @PrevQuoteFlop = 0 AND @CurChar = ',' THEN 'þ' ELSE @CurChar END,

    @PrevQuoteFlop = @CurQuoteFlop,

    @ReturnString = COALESCE(@ReturnString,'') + CASE WHEN @Keep = 1 THEN @CurChar ELSE '' END

    FROM dbo.Tally t

    WHERE t.N <= LEN(@String)

    --===== Encapsulate the return string in Thorns and exit

    RETURN 'þ'+@ReturnString+'þ'

    END

    Here's the "test code"...

    CREATE TABLE #MyHead (RowNum INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, AString VARCHAR(7998))

    INSERT INTO #MyHead (Astring)

    SELECT '"Pick-up",,"",1234,"2008-08-24", "2008-08-24","17"", unknown brand"' UNION ALL

    SELECT '"""Next Day"" delivery",5.00,"9,5"" box",10,"2008-09-01", "2008-09-11","9"" nails"'

    SELECT RowNum,

    AString AS [Before],

    dbo.TrueCsvSplit(AString) AS [After]

    FROM #MyHead

    ... and, just for those who only watch, here's the results...

    RowNum Before After

    ----------- ------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------

    1 "Pick-up",,"",1234,"2008-08-24", "2008-08-24","17"", unknown brand" þPick-upþþþ1234þ2008-08-24þ2008-08-24þ17", unknown brandþ

    2 """Next Day"" delivery",5.00,"9,5"" box",10,"2008-09-01", "2008-09-11","9"" nails" þ"Next Day" deliveryþ5.00þ9,5" boxþ10þ2008-09-01þ2008-09-11þ9" nailsþ

    (2 row(s) affected)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Well, I see what you've done, and I'm gonna try it.

    I cannot say I really love the idea with "thorn" character because there is no guarantee some other developer on that side of the pipeline will use the same character as a replacement for CR/LF in Description fields.

    And I'm probably not gonna be there at the time. And process having my signature on it will fail.

    Or worse, I'm gonna be there and gonna get a call at 3am.

    Secondly, the point is to insert everything into such table:

    CREATE TABLE #Parse_Data (

    RowID int ,

    ColID int NOT NULL,

    ColValue nvarchar(400) COLLATE SQL_Latin1_General_CP1_CI_AS,

    PRIMARY KEY (RowID, ColID)

    )

    Unfortunately, the solution you posted does not resolve RBAR problem.

    It makes separate call to Tally for every row.

    But thank you anyway.

    When I have more time I'm gonna try to get anything from it.

    _____________
    Code for TallyGenerator

  • Yeaup... still RBAR... but the Tally table get's cached... While loop doesn't. 'Course, you knew that.

    If you don't like the Thorn character (used it mostly for visual), Tab character works great so folks can manage their entire enterprise on spreadsheets. :hehe:

    Best method for me has been to import like you did, find what I want, export it and reimport it letting BCP do the split or then use OpenRowSet to get the variable columns. I know, you can't do that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 46 through 48 (of 48 total)

You must be logged in to reply to this topic. Login to reply