Help with Delimited String Parsing

  • Hi,

    I am looking for some ideas on processing an un-naturally delimited string. For each row, the first part of it is delimited by commas. This section represents the header. Following this on the same line, there are additional "items" delimited by '~'. This can repeat 0-n times and the column count identifies the split.

    A simplified example:

    Input:

    1,Order1~1~Order1Item1

    2,Order2~1~Order2Item1~2~Order2Item2

    Output should become:

    Order Table(OrderID, OrderDescription)

    1, Order1

    2, Order2

    Item Table(OrderID, ItemID, ItemDescription)

    1, 1, Order1Item1

    2, 1, Order2Item1

    2, 2, Order2Item2

    Currently, this is written in DTS and there are multiple passes moving the data between the database and text files. I'm re-writing this in SSIS. I can follow the same approach but would like something cleaner if it's available. If anyone has any ideas, I'd appreciate feedback.

    Some extra notes:

    1. Parsing manually would be a real pain becase there are actually 60 columns in the header table and 29 in the detail table.

    2. To make things worse, any T-SQL will need to be SQL 2000 compliant. I'm trying to upgrade but need to get all the DTS packages moved first.

    3. In the detail section, the first detail "row" is delimited by a ~ and the second by a CRLF.

    Thanks,

    Brian.

  • So if I'm reading this right you have have two sets of data in your input file in each row, the order information which is delimited by commas then the item information which is delimmited by ~?

    You might be able to first delimit the data by commas in the data source, that would give you all the header information except the last column would also have the detail information in it. Then you could use a derived column to split that last column into two columns, one with the header record column and the other with all the detail records.

    From there the header records would be ready to insert into the destination table but you would still need to work with the detail records. So you could pass the column containing the detail records into a script task in the data flow and use that to generate as output one record for each detail record that you parsed out of the input column. Then those records should also be ready to insert into a destination table.

    3. In the detail section, the first detail "row" is delimited by a ~ and the second by a CRLF.

    I'm not sure about this since in your example it looks like the detail records are just delimited by ~?

  • 3. In the detail section, the first detail "row" is delimited by a ~ and the second by a CRLF.

    I'm not sure about this since in your example it looks like the detail records are just delimited by ~?

    (I should have said last detail row instead of second) What I meant by this was that the "row delimiter" is ~ until the end of the line, when it changes to CRLF.

    1~Order2Item1~2~Order2Item2[CRLF]

    I've never used the Data Flow script component. Can you recommend any reading on how to do this?

    Thanks,

    Brian.

  • I believe that you might find your solution using code developed by Jeff Moden .... Read this article:

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Brian Carlson (9/7/2011)


    (I should have said last detail row instead of second) What I meant by this was that the "row delimiter" is ~ until the end of the line, when it changes to CRLF.

    1~Order2Item1~2~Order2Item2[CRLF]

    I've never used the Data Flow script component. Can you recommend any reading on how to do this?

    Thanks,

    Brian.

    Here's a decent series of articles on using script components , one of the examples in there should show you how to parse the input rows and create custom output rows.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply