importing tab delimited file with various line length

Question

importing tab delimited file with various line length

rnikandrov

SSChasing Mays

Points: 635
More actions
January 8, 2010 at 2:53 pm

#136358

the goal is to import file so that i can work with it.
here is a sample:
A 919934 [tab] 1 [tab] jason
B 2222 [tab] 999999 [tab] 111111 [tab] 1113
C 2 [tab] 1
....
first letter stands for particular code. any suggestions on how to parse this?
i only need to pull certain lines of code. Help?

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246962 More actions · Answer 1

You can't easily parse that. I'd bring all the rows in to a single Varchar field and then use T-SQL to parse it.

If you want to exclude rows depending on character 1 of the row, you could use a Conditional Split - that reduces your T-SQL work (and the number of db writes).

If you want to do the whole job in SSIS, you may be able to get somewhere using derived columns after your Conditional Split. If that's not powerful enough, you'll have to add a Script Component.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

rnikandrov SSChasing Mays Points: 635 More actions · Answer 2

Thank you for your suggestion,

the plan is to:

#1 import file into single array

#2 parse the right line into temp table, so that i can work with it

can you give me example how to do #1?

WayneS SSC Guru Points: 95460 More actions · Answer 3

rnikandrov (1/11/2010)
Thank you for your suggestion,
the plan is to:
#1 import file into single array
#2 parse the right line into temp table, so that i can work with it
can you give me example how to do #1?

How's this?

CREATE TABLE #temp (Col1 varchar(500))

BULK INSERT #temp FROM '<your file>'

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

rnikandrov SSChasing Mays Points: 635 More actions · Answer 4

that was embarrassing, i knew that 🙂

thanks, i was just making things more difficult.

How would i go about reading tab delimited row? (step 2)

Phil Parkin SSC Guru Points: 246962 More actions · Answer 5

Use a combination of LEFT, CHARINDEX and SUBSTRING to split the field up.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

WayneS SSC Guru Points: 95460 More actions · Answer 6

... and PATINDEX.

You might want to check out the article by Phil Factor on http://www.simple-talk.com dealing with the "quirky update". It shows an excellent example of splitting up a line into multiple parts.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Ron McCullough SSC Guru Points: 63877 More actions · Answer 7

I suggest you look at

http://www.sqlservercentral.com/Forums/Topic845680-338-1.aspx

For grins and giggles I modified the input data for the above forum to replace the commas in the string with tabs.

INSERT INTO @test-2

SELECT REPLACE('xyz, 54050, FY11, Apr, Budget, Version_1, 0160117',',',CHAR(9)) UNION ALL

SELECT REPLACE('abc, 54050, FY11, May, Budget, Version_1, 0160117',',',CHAR(9))

Then ran the CTE with suitable changes and it worked.

Modified a portion of the CTE code:

-- update the columns to hold the tab positions

UPDATE #Test

SET @Col1EndPos = Col1EndPos = CharIndex(CHAR(9), RowData),

@Col2EndPos = Col2EndPos = CharIndex(CHAR(9), RowData, @Col1EndPos + 1),

@Col3EndPos = Col3EndPos = CharIndex(CHAR(9), RowData, @Col2EndPos + 1),

@Col4EndPos = Col4EndPos = CharIndex(CHAR(9), RowData, @Col3EndPos + 1),

@Col5EndPos = Col5EndPos = CharIndex(CHAR(9), RowData, @Col4EndPos + 1)

-- now, get the data for each column

SELECT [ID] = ROW_NUMBER() OVER (ORDER BY RowData),

[Account] = LEFT(RowData, @Col1EndPos-1),

[Year] = SUBSTRING(RowData, @Col1EndPos+1, @Col2EndPos-@Col1EndPos-1),

[Month] = SUBSTRING(RowData, @Col2EndPos+1, @Col3EndPos-@Col2EndPos-1),

[Scenario] = SUBSTRING(RowData, @Col3EndPos+1, @Col4EndPos-@Col3EndPos-1),

[Version] = SUBSTRING(RowData, @Col4EndPos+1, @Col5EndPos-@Col4EndPos-1),

[Entity] = SUBSTRING(RowData, @Col5EndPos+1, len(RowData)-@Col5EndPos-1)

FROM #Test

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 8

rnikandrov (1/8/2010)
the goal is to import file so that i can work with it.
here is a sample:
A 919934 [tab] 1 [tab] jason
B 2222 [tab] 999999 [tab] 111111 [tab] 1113
C 2 [tab] 1
....
first letter stands for particular code. any suggestions on how to parse this?
i only need to pull certain lines of code. Help?

What do you want to parse it on? Spaces, Tabs, or both?

You also say that you only "need to pull certain lines of code". Which lines might those be?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

rnikandrov SSChasing Mays Points: 635 More actions · Answer 9

Jeff Moden (1/11/2010)
rnikandrov (1/8/2010)
the goal is to import file so that i can work with it.
here is a sample:
A 919934 [tab] 1 [tab] jason
B 2222 [tab] 999999 [tab] 111111 [tab] 1113
C 2 [tab] 1
....
first letter stands for particular code. any suggestions on how to parse this?
i only need to pull certain lines of code. Help?
What do you want to parse it on? Spaces, Tabs, or both?
You also say that you only "need to pull certain lines of code". Which lines might those be?

To parse, is to break down into parts. It is shown that each line, of different length, is tab delimited. I thought it was clear.

rnikandrov SSChasing Mays Points: 635 More actions · Answer 10

bitbucket-25253 (1/11/2010)
I suggest you look at
http://www.sqlservercentral.com/Forums/Topic845680-338-1.aspx
For grins and giggles I modified the input data for the above forum to replace the commas in the string with tabs.
INSERT INTO @test-2
SELECT REPLACE('xyz, 54050, FY11, Apr, Budget, Version_1, 0160117',',',CHAR(9)) UNION ALL
SELECT REPLACE('abc, 54050, FY11, May, Budget, Version_1, 0160117',',',CHAR(9))
Then ran the CTE with suitable changes and it worked.
Modified a portion of the CTE code:
-- update the columns to hold the tab positions
UPDATE #Test
SET @Col1EndPos = Col1EndPos = CharIndex(CHAR(9), RowData),
@Col2EndPos = Col2EndPos = CharIndex(CHAR(9), RowData, @Col1EndPos + 1),
@Col3EndPos = Col3EndPos = CharIndex(CHAR(9), RowData, @Col2EndPos + 1),
@Col4EndPos = Col4EndPos = CharIndex(CHAR(9), RowData, @Col3EndPos + 1),
@Col5EndPos = Col5EndPos = CharIndex(CHAR(9), RowData, @Col4EndPos + 1)
-- now, get the data for each column
SELECT [ID] = ROW_NUMBER() OVER (ORDER BY RowData),
[Account] = LEFT(RowData, @Col1EndPos-1),
[Year] = SUBSTRING(RowData, @Col1EndPos+1, @Col2EndPos-@Col1EndPos-1),
[Month] = SUBSTRING(RowData, @Col2EndPos+1, @Col3EndPos-@Col2EndPos-1),
[Scenario] = SUBSTRING(RowData, @Col3EndPos+1, @Col4EndPos-@Col3EndPos-1),
[Version] = SUBSTRING(RowData, @Col4EndPos+1, @Col5EndPos-@Col4EndPos-1),
[Entity] = SUBSTRING(RowData, @Col5EndPos+1, len(RowData)-@Col5EndPos-1)
FROM #Test

Thanks. This was good.

Thank you everyone for your help!

Phil Parkin SSC Guru Points: 246962 More actions · Answer 11

In your example, there is no tab after char1, the 'code', yet the rest of the data has tabs delimiting the fields. So it's really not clear - it looks like a mix. Hence the question.

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

rnikandrov SSChasing Mays Points: 635 More actions · Answer 12

Phil Parkin (1/12/2010)
In your example, there is no tab after char1, the 'code', yet the rest of the data has tabs delimiting the fields. So it's really not clear - it looks like a mix. Hence the question.

I see your point, however you were able to help me 🙂

thank you again!