Splitting a String in SQL - Help Please!

  • It sure does...

    ... but the OpenRowSet thing I showed does it right.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What happened to op?

    "Keep Trying"

  • Jeff Moden (9/16/2008)


    It sure does...

    It "eats" the comma from the last value.

    ... but the OpenRowSet thing I showed does it right.

    I explained why it's not an option.

    _____________
    Code for TallyGenerator

  • Sorry for not having posted sooner. I am still reading the thread. Aside from RBAR UDFs as I have suggested earlier, perhaps you can preprocess your data outside of SQL Server using C, C++, or C#. Afterward you can employe Jeff's or Sergiy's approaches without issue.

    Another option while using a preprocess approach (Source to Stage) would be to create file structures for each of your sources, eg.

    File Structure Alpha

    ===============

    Field1 10

    Field2 20

    Field3 15

    Field4 30

    File Structure Beta

    ===============

    Field1 5

    Field2 25

    Field3 10

    Field4 10

    Using the structures as a template against your source data affords you greater flexibility to clean/massage prior to dumping into a temp or stage table. Plus, you can apply each template to a particular data source over and over again. Preprocessing the data outside of SQL Server will get the job done faster and allow you to employ downstream routines as Jeff and Sergiy have suggested.

    But there is a margin of diminishing returns with every approach. I am familiar with what you are trying to accomplish and clearly everyone else who contributed posts to this thread is too. Being bias towards routines that I know work, I am still leaning on the side of a brute force RBAR approach using 100% SQL Server (UDFs). Like the preprocess template approach using tools outside of SQL Server, the routines that I spoke of in an earlier thread can be tailored toward each data source and reused over and over.

    -Mike

  • Getting back to my initial suggestion... aside from it being potentially slow / reliant on a '-' as a delimiter, is there any reason that wouldn't have worked? That's my standard practice when I need to split a field for display purposes, such as a name field that looks like 'John Smith (123-456-789) ext. 1234'. I know it'd be more efficient to do it in the app doing the dispaly, but that isn't always an option here.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • You'd better split it for data storage purposes.

    And (if needed) concatenate for display purposes.

    It would save you from a lot of problems.

    Google "Normalization Rules" - sometimes it's useful to know essential things about stuff you're working with.

    😛

    _____________
    Code for TallyGenerator

  • We do... for any fields that we are able to change. However, some fields we can't change without getting our front end or middle layer modified. That one in particular we have both ways(with the concatenated format, + fields for first name, last name, phone and ext), but there are a lot of things that still reference the old version. You can argue that it shouldn't have been that way in the first place, and I'd completely agree, but I wasn't here at that point.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • If I was you I'd create small and effective trigger on that table updating another, properly formatted table(s) when the value is updated.

    Then I'd gradually move selects from old to new structure.

    And I'd bomb my boss with articles about SQL injections to make him forbid ad-hoc SQL queries to database. Then I'd have a chance populate new structure without triggers.

    It's all about making professional decisions.

    _____________
    Code for TallyGenerator

  • Due to the nature of our setup, involving a complete custom built front end application, which only communicates through a custom built middle layer(which we have no access to modify in any way) and several hundred tables that I have very limited capabilities to alter(without affecting the front/middle layers), this isn't about simply deciding to be more uniform and implementing a process to get there. The entire database/coding structure is probably one the worst examples of RBAR logic (to borrow the term) imaginable. Literally thousands of triggers, many of which are several thousand lines long, and levels of sp/udf nesting that would take a week to completely map out.

    That said, I wasn't wondering how to make what I was suggesting unnecessary for the future, so much as I was suggesting it for the case of the OP, who also has data that should be normalized into 3 separate fields, but for reasons he did not specify, is not.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • More than one way to 'skin' a string! 😉

    CREATE TABLE #proditem (item1 VARCHAR(255))

    INSERT INTO #proditem values ('CSCAR009/0001/001-PASPBD-Pedestrian automatic sliding door')

    INSERT INTO #proditem values ('CSCAR009/0001/002-PASPBD2-Pedestrian automatic sliding double door')

    INSERT INTO #proditem values ('CSCAR009/0001/003-PASPBDADA-Pedestrian automatic sliding door for disabilities')

    Select LEFT(item1,PATINDEX('%-%',item1)-1) as firstID,

    LEFT(SUBSTRING(item1,PATINDEX('%-%',item1)+1,LEN(item1)),PATINDEX('%-%',SUBSTRING(item1,PATINDEX('%-%',item1)+1, LEN(item1)))-1) as secondID

    FROM #proditem

    firstID secondID

    CSCAR009/0001/001PASPBD

    CSCAR009/0001/002PASPBD2

    CSCAR009/0001/003PASPBDADA

    DROP TABLE #proditem

  • Sergiy (9/16/2008)


    Jeff Moden (9/16/2008)


    It sure does...

    It "eats" the comma from the last value.

    ... but the OpenRowSet thing I showed does it right.

    I explained why it's not an option.

    Could re-export what you think is good and reimport letting the split occur. Only other way I can think of to do it all in T-SQL would be to split each row into single characters in a table variable or temp table and do a quirky update with a counter that counts quotes and commas.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/16/2008)


    Could re-export what you think is good and reimport letting the split occur. Only other way I can think of to do it all in T-SQL would be to split each row into single characters in a table variable or temp table and do a quirky update with a counter that counts quotes and commas.

    Did not quite get it.

    Can you please show what kind of update you're talking about?

    No need for a working code, just some schema.

    _____________
    Code for TallyGenerator

  • By the way, when I said "It sure does", I meant it sure does fail... and in exactly the manner you suggested. Sorry for the confusion. 🙂

    I'll be back... I'm not sure I can pull this one off... guys that wrote OpenRowSet where a whole lot smarter than me. 😀

    I do see why you gave up... writing an "intelligent" true CSV splitter in T-SQL is gonna be tougher than I thought. If it were me, I'd be tempted to import like you do now, figure out which rows are worth saving, export them, and read them with OpenRowSet.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/16/2008)


    I do see why you gave up... writing an "intelligent" true CSV splitter in T-SQL is gonna be tougher than I thought.

    As I said, I eventually did it, but unfortunately using loop.

    That's why I need to use cursor when it comes to populating that long 3-column table.

    Everybody around is happy because I manage to upload 100k+ rows files in less than 10 minutes (including all data checks and validations) when previous version took about 3 hours.

    But I know how it should be...

    If any time you come up with an idea how to split quoted string usint Tally table - you know who's gonna appreciate it the most. 😉

    _____________
    Code for TallyGenerator

  • Heh... I might have it, ol' friend... I can't explain it, I just gotta do it. I'll be back. And you're right about the table... 3 columns. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 31 through 45 (of 48 total)

You must be logged in to reply to this topic. Login to reply