importing tab delimited file with various line length

  • the goal is to import file so that i can work with it.

    here is a sample:

    A 919934 [tab] 1 [tab] jason

    B 2222 [tab] 999999 [tab] 111111 [tab] 1113

    C 2 [tab] 1

    ....

    first letter stands for particular code. any suggestions on how to parse this?

    i only need to pull certain lines of code. Help?

  • You can't easily parse that. I'd bring all the rows in to a single Varchar field and then use T-SQL to parse it.

    If you want to exclude rows depending on character 1 of the row, you could use a Conditional Split - that reduces your T-SQL work (and the number of db writes).

    If you want to do the whole job in SSIS, you may be able to get somewhere using derived columns after your Conditional Split. If that's not powerful enough, you'll have to add a Script Component.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thank you for your suggestion,

    the plan is to:

    #1 import file into single array

    #2 parse the right line into temp table, so that i can work with it

    can you give me example how to do #1?

  • rnikandrov (1/11/2010)


    Thank you for your suggestion,

    the plan is to:

    #1 import file into single array

    #2 parse the right line into temp table, so that i can work with it

    can you give me example how to do #1?

    How's this?

    CREATE TABLE #temp (Col1 varchar(500))

    BULK INSERT #temp FROM '<your file>'

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • that was embarrassing, i knew that 🙂

    thanks, i was just making things more difficult.

    How would i go about reading tab delimited row? (step 2)

  • Use a combination of LEFT, CHARINDEX and SUBSTRING to split the field up.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • ... and PATINDEX.

    You might want to check out the article by Phil Factor on http://www.simple-talk.com dealing with the "quirky update". It shows an excellent example of splitting up a line into multiple parts.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • I suggest you look at

    http://www.sqlservercentral.com/Forums/Topic845680-338-1.aspx

    For grins and giggles I modified the input data for the above forum to replace the commas in the string with tabs.

    INSERT INTO @test-2

    SELECT REPLACE('xyz, 54050, FY11, Apr, Budget, Version_1, 0160117',',',CHAR(9)) UNION ALL

    SELECT REPLACE('abc, 54050, FY11, May, Budget, Version_1, 0160117',',',CHAR(9))

    Then ran the CTE with suitable changes and it worked.

    Modified a portion of the CTE code:

    -- update the columns to hold the tab positions

    UPDATE #Test

    SET @Col1EndPos = Col1EndPos = CharIndex(CHAR(9), RowData),

    @Col2EndPos = Col2EndPos = CharIndex(CHAR(9), RowData, @Col1EndPos + 1),

    @Col3EndPos = Col3EndPos = CharIndex(CHAR(9), RowData, @Col2EndPos + 1),

    @Col4EndPos = Col4EndPos = CharIndex(CHAR(9), RowData, @Col3EndPos + 1),

    @Col5EndPos = Col5EndPos = CharIndex(CHAR(9), RowData, @Col4EndPos + 1)

    -- now, get the data for each column

    SELECT [ID] = ROW_NUMBER() OVER (ORDER BY RowData),

    [Account] = LEFT(RowData, @Col1EndPos-1),

    [Year] = SUBSTRING(RowData, @Col1EndPos+1, @Col2EndPos-@Col1EndPos-1),

    [Month] = SUBSTRING(RowData, @Col2EndPos+1, @Col3EndPos-@Col2EndPos-1),

    [Scenario] = SUBSTRING(RowData, @Col3EndPos+1, @Col4EndPos-@Col3EndPos-1),

    [Version] = SUBSTRING(RowData, @Col4EndPos+1, @Col5EndPos-@Col4EndPos-1),

    [Entity] = SUBSTRING(RowData, @Col5EndPos+1, len(RowData)-@Col5EndPos-1)

    FROM #Test

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • rnikandrov (1/8/2010)


    the goal is to import file so that i can work with it.

    here is a sample:

    A 919934 [tab] 1 [tab] jason

    B 2222 [tab] 999999 [tab] 111111 [tab] 1113

    C 2 [tab] 1

    ....

    first letter stands for particular code. any suggestions on how to parse this?

    i only need to pull certain lines of code. Help?

    What do you want to parse it on? Spaces, Tabs, or both?

    You also say that you only "need to pull certain lines of code". Which lines might those be?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/11/2010)


    rnikandrov (1/8/2010)


    the goal is to import file so that i can work with it.

    here is a sample:

    A 919934 [tab] 1 [tab] jason

    B 2222 [tab] 999999 [tab] 111111 [tab] 1113

    C 2 [tab] 1

    ....

    first letter stands for particular code. any suggestions on how to parse this?

    i only need to pull certain lines of code. Help?

    What do you want to parse it on? Spaces, Tabs, or both?

    You also say that you only "need to pull certain lines of code". Which lines might those be?

    To parse, is to break down into parts. It is shown that each line, of different length, is tab delimited. I thought it was clear.

  • bitbucket-25253 (1/11/2010)


    I suggest you look at

    http://www.sqlservercentral.com/Forums/Topic845680-338-1.aspx

    For grins and giggles I modified the input data for the above forum to replace the commas in the string with tabs.

    INSERT INTO @test-2

    SELECT REPLACE('xyz, 54050, FY11, Apr, Budget, Version_1, 0160117',',',CHAR(9)) UNION ALL

    SELECT REPLACE('abc, 54050, FY11, May, Budget, Version_1, 0160117',',',CHAR(9))

    Then ran the CTE with suitable changes and it worked.

    Modified a portion of the CTE code:

    -- update the columns to hold the tab positions

    UPDATE #Test

    SET @Col1EndPos = Col1EndPos = CharIndex(CHAR(9), RowData),

    @Col2EndPos = Col2EndPos = CharIndex(CHAR(9), RowData, @Col1EndPos + 1),

    @Col3EndPos = Col3EndPos = CharIndex(CHAR(9), RowData, @Col2EndPos + 1),

    @Col4EndPos = Col4EndPos = CharIndex(CHAR(9), RowData, @Col3EndPos + 1),

    @Col5EndPos = Col5EndPos = CharIndex(CHAR(9), RowData, @Col4EndPos + 1)

    -- now, get the data for each column

    SELECT [ID] = ROW_NUMBER() OVER (ORDER BY RowData),

    [Account] = LEFT(RowData, @Col1EndPos-1),

    [Year] = SUBSTRING(RowData, @Col1EndPos+1, @Col2EndPos-@Col1EndPos-1),

    [Month] = SUBSTRING(RowData, @Col2EndPos+1, @Col3EndPos-@Col2EndPos-1),

    [Scenario] = SUBSTRING(RowData, @Col3EndPos+1, @Col4EndPos-@Col3EndPos-1),

    [Version] = SUBSTRING(RowData, @Col4EndPos+1, @Col5EndPos-@Col4EndPos-1),

    [Entity] = SUBSTRING(RowData, @Col5EndPos+1, len(RowData)-@Col5EndPos-1)

    FROM #Test

    Thanks. This was good.

    Thank you everyone for your help!

  • In your example, there is no tab after char1, the 'code', yet the rest of the data has tabs delimiting the fields. So it's really not clear - it looks like a mix. Hence the question.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (1/12/2010)


    In your example, there is no tab after char1, the 'code', yet the rest of the data has tabs delimiting the fields. So it's really not clear - it looks like a mix. Hence the question.

    I see your point, however you were able to help me 🙂

    thank you again!

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply