txt file importing

  • This topic might have been discussed many times.

    I googled a lot for this, but found no good solution.

    The problem is that I have a text file, delimited by semicolons ( ; ), that I want to import, but the number of columns is neither fixed nor always in the same order, and the main problem is that there is no limit on the length of the data in each column.

    How can I import it?

    I tried using OPENROWSET, but its default length is 50 characters with datatype varchar, so my data gets truncated.

    I tried a DTS package too, but there as well the field width is 50 characters.

    I tried to create a table with all fields typed as text, but while inserting data it gave me the error "exceeded maximum row size of 8060".

    Any ideas, any workarounds?

  • I had something similar. I had to create almost every single column manually by specifying the length. It took me a few days.

  • When you say the columns are not in the same order, what do you mean? Is the data in name:value pairs, like 'fruit:apple;vegetable:peas and carrots;'?

    Could you post a couple of sample lines, or a link to a sample, since it's probably pretty big?

    Typically, this is what I do:

    import the file into a table with a single varchar(max) column;

    then import from that table into the destination table. I could throw down some ugly examples, but if I saw sample data I could give a better one.

    Alternatively, if you know how many columns there are, you can import them into a temp table where every column is defined as varchar(max),

    then import into the final destination from the temp table, as long as the temp table's data fits within the final destination's defined sizes.
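    The two-step staging idea above can be sketched outside T-SQL as well. Here is a minimal Python illustration (the sample data and all names here are hypothetical, just to show the shape of the approach): stage each raw line as a single wide "column", then split the staged lines into destination columns in a second pass.

```python
# Stage 1: pretend each raw line landed in one varchar(max)-style column.
staging = [
    "Column1;Column3;Column5;Column4;Column2;",  # header row
    "dsf;sdf;sdf;sdfds;sdfsd;",                  # data row
]

# Stage 2: split the staged lines into destination columns,
# pairing each value with its header positionally.
headers = staging[0].rstrip(";").split(";")
rows = [dict(zip(headers, line.rstrip(";").split(";"))) for line in staging[1:]]
```

    In T-SQL the second stage would be a split function or clever string code against the staging table, but the mechanics are the same.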

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • OK, some sample data:

    randomdata;10/10/2007;randomdata;randomdata;

    Column1;Column3;Column5;Column4;Column2;

    jk adkshfjhjkfhsajkfh sda;sdfs dfasdfsda;fsdaf sadfadsfsda;

    dsf;sdf;sdf;sdfds;sdfsd;

    So, here is the description.

    I don't want the first line to go into the table, but it contains a date that I want to store.

    Then the columns appear in a different order every time the file is received, and new additional columns CAN ALSO appear while existing columns are dropped; i.e., in the next file the columns may be

    Column3;Column87;Column2;Column10;

    And then comes the data itself, which can simply be of any length.

    Now the problems are that in DTS the default column length is 50, so data gets truncated, and I could not find any means in DTS to skip the first line either.

    Any ideas? Keep in mind that I do want to store the date too.
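    Putting that description together, a rough Python sketch of how such a file could be parsed (the day-first date format and the positional header mapping are assumptions, not confirmed by the sample):

```python
from datetime import datetime

SAMPLE = [
    "randomdata;10/10/2007;randomdata;randomdata;",
    "Column1;Column3;Column5;Column4;Column2;",
    "jk adkshfjhjkfhsajkfh sda;sdfs dfasdfsda;fsdaf sadfadsfsda;",
    "dsf;sdf;sdf;sdfds;sdfsd;",
]

def parse_feed(lines):
    """Line 1 carries the date, line 2 the column names, the rest the data."""
    first = lines[0].rstrip(";").split(";")
    # The date sits in the second field; day-first format is an assumption.
    feed_date = datetime.strptime(first[1], "%d/%m/%Y").date()
    headers = lines[1].rstrip(";").split(";")
    # Pair each value with its header positionally; short rows just map
    # to the first N headers.
    rows = [dict(zip(headers, ln.rstrip(";").split(";"))) for ln in lines[2:]]
    return feed_date, headers, rows

feed_date, headers, rows = parse_feed(SAMPLE)
```

    Note the third sample line only has three values, so under this sketch it only populates Column1, Column3, and Column5.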

  • Sorry, I can't see how you know which column is which; there's no obvious pattern in the sample you provided.

    If you could determine that column 3 was always an int, or some other constant, we could help envision a solution.

    As for the first row having a date, I would simply bulk insert just the first row with one process, and then bulk insert the rest of the file, skipping the first row.

    But how would you know that column 5 was dropped or added?

    Lowell



  • Muhammad Furqan (22/09/2007)


    OK, some sample data:

    randomdata;10/10/2007;randomdata;randomdata;

    Column1;Column3;Column5;Column4;Column2;

    jk adkshfjhjkfhsajkfh sda;sdfs dfasdfsda;fsdaf sadfadsfsda;

    dsf;sdf;sdf;sdfds;sdfsd;

    So, here is the description.

    I don't want the first line to go into the table, but it contains a date that I want to store.

    Then the columns appear in a different order every time the file is received, and new additional columns CAN ALSO appear while existing columns are dropped; i.e., in the next file the columns may be

    Column3;Column87;Column2;Column10;

    And then comes the data itself, which can simply be of any length.

    Now the problems are that in DTS the default column length is 50, so data gets truncated, and I could not find any means in DTS to skip the first line either.

    Any ideas? Keep in mind that I do want to store the date too.

    So, what you're saying is that...

    1:  Always skip the first line of the file no matter what...

    2:  The second line of the file will always show the correct column order for the data...

    3:  The data will be in the same order as indicated by the second line...

    4:  The data will ALWAYS have the same number of delimiters and columns as the second line...

    Is that correct?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • 1: Always skip the first line of the file no matter what...

    2: The second line of the file will always show the correct column order for the data...

    3: The data will be in the same order as indicated by the second line...

    4: The data will ALWAYS have the same number of delimiters and columns as the second line...

    1: True

    2: False - the column order is never the same from file to file

    3: True

    4: False - there can be fewer as well

  • Lowell (9/22/2007)


    Sorry, I can't see how you know which column is which; there's no obvious pattern in the sample you provided.

    If you could determine that column 3 was always an int, or some other constant, we could help envision a solution.

    As for the first row having a date, I would simply bulk insert just the first row with one process, and then bulk insert the rest of the file, skipping the first row.

    But how would you know that column 5 was dropped or added?

    "Which column is which" is determined by the column name; "Column1", "Column2", etc. will always be the same.

    I can't determine that column 3 will always be an int, as it can contain other data too, so one thing is confirmed: I will have to use text data types.

    Finally, whichever column is dropped will simply vanish from the file.

    That's the whole mess. I think I should not have gotten into this project. 🙁

  • Yeaup... quite the mess.

    This could all be done in T-SQL by importing the whole row into a single wide column and doing a split (using a function or some clever code) into another table.  I'm thinking that would take a fair amount of time, though (performance-wise)...

    The two big problems are that row 1 will not have the same number of delimiters as row 2, so you can't use BCP or BULK INSERT as the file stands.  Same for the data rows... they may not have the same number of delimiters as row 2.

    You could write some VBS to strip off the first row and add the missing delimiters to the data rows... that would run quite fast.  If the number of rows didn't exceed the capacity of a spreadsheet, you could do a similar thing in a spreadsheet... but that would require some manual intervention.

    Let me know if you think the T-SQL solution I mentioned is viable... start by telling us how many rows you expect in a given file.

    By the way... how are you going to know the name(s) of the file(s) you try to import?
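    The preprocessing step described above (strip the date row, then pad short data rows so every row has as many delimiters as the header row) could be sketched in Python instead of VBS; the file-handling details here are assumptions:

```python
def preprocess(src_lines):
    """Drop the date row and pad every data row to the header's column count
    so a fixed-column loader such as BCP or BULK INSERT can take the file."""
    headers = src_lines[1].rstrip("\n").rstrip(";").split(";")
    out = [src_lines[1].rstrip("\n")]  # keep the header row, drop the date row
    for line in src_lines[2:]:
        fields = line.rstrip("\n").rstrip(";").split(";")
        fields += [""] * (len(headers) - len(fields))  # pad short rows
        out.append(";".join(fields) + ";")
    return out
```

    The padded output could then be written back to disk and loaded in one shot with BULK INSERT using FIELDTERMINATOR = ';'.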

    --Jeff Moden



  • p.s.  The best way to fix this is to pummel the vendor providing the data into submitting properly formatted data.  The first row should go into a separate "control" file, and the rest of the rows should all have the same number and type of delimiters.

    Remember, the answer is always "No" unless you ask... 😉

    --Jeff Moden



  • Where I've had to deal with this sort of problem in the past (and it is difficult to avoid if you are dealing with the output of engineering test equipment), I've found that by far the best approach is to read the data feed via BCP into a table with an identity field and one varchar(max) / varchar(8000) column which holds the line. When you've pulled the entire mess in, you can bash it into whatever shape you can with string functions, according to whatever arcane rules apply, cursing all the way.

    Best wishes,
    Phil Factor

  • OK, I got a VB project to work based on your parameters... here's my question:

    In your example, the file had 5 columns. Typically, BULK INSERT would expect 5 columns in every row of the data in order to import it.

    In one of the rows, only 3 columns out of the 5 were represented.

    If the first three columns from row #2 are the fields for that specific row, I can logically import them into the columns from row 2; is that a safe assumption? Otherwise, there's no way to guess which column was not included, unless you assume they were cut off from the end.

    VB6 Code Example:

        Dim CurrentRow As String
        Dim fsoobj As FileSystemObject
        Dim strObj As TextStream
        Dim Headers() As String
        Dim RowData() As String
        Dim MaxSize() As Long
        Dim ImportantDate As Date
        Dim i As Long
        Dim rows As Long
        Dim Conn As ADODB.Connection
        Dim sql As String

        'Pseudocode:
        '1: go thru the file once to get the max() size for each element.
        '2: create a table with those columns.
        '3: process the file for data.

        'Because the column widths are unknown, the file is processed twice:
        'once to get the column sizes, and a second time to insert the data.
        rows = 1   'count of lines processed.

        '##############################################################################
        ' Step 1: Get base info
        '##############################################################################
        Set fsoobj = New FileSystemObject
        Set strObj = fsoobj.OpenTextFile(FullPathToFilename, ForReading, False)
        Do While Not strObj.AtEndOfStream
            CurrentRow = strObj.ReadLine
            Select Case rows
            Case 1
                RowData = Split(CurrentRow, ";", , vbTextCompare)
                ImportantDate = IIf(IsDate(RowData(1)), CDate(RowData(1)), Date) 'second element in the array
            Case 2
                Headers = Split(CurrentRow, ";", , vbTextCompare)
                ReDim MaxSize(UBound(Headers)) 'sized to the headers, not to row 1's data
                For i = 0 To UBound(Headers)   'initialize it to zero
                    MaxSize(i) = 0
                Next i
            Case Else
                RowData = Split(CurrentRow, ";", , vbTextCompare)
                'the trailing semicolon yields an empty last element, so stop at UBound - 1
                For i = 0 To UBound(RowData) - 1
                    If i <= UBound(MaxSize) Then
                        MaxSize(i) = IIf(Len(RowData(i)) > MaxSize(i), Len(RowData(i)), MaxSize(i))
                    End If
                Next i
            End Select
            rows = rows + 1
        Loop
        strObj.Close

        '##############################################################################
        ' Step 2: Create the table
        '##############################################################################
        Set Conn = New ADODB.Connection
        Conn.ConnectionString = "Provider=SQLOLEDB;Server=STORMSQL;database=GEO;network=DBMSSOCN;uid=sa;pwd=not my realpassword"
        Conn.Open
        sql = "CREATE TABLE STAGINGTABLE(" & vbCrLf
        For i = 0 To UBound(Headers) - 1
            'VARCHAR(0) is invalid, so use at least 1 for columns that were always empty
            sql = sql & "[" & Trim(Headers(i)) & "] VARCHAR(" & IIf(MaxSize(i) > 0, MaxSize(i), 1) & ") ," & vbCrLf
        Next i
        sql = Left(sql, Len(sql) - 3) & ")" 'remove the trailing comma and CrLf

        Conn.Execute (sql)

        '''The table is now created; ideally the file would be processed via BULK INSERT for speed.
        '''Can't do that in this case, because columns are not defined for all rows,
        '''so we switch to Row-By-Agonizing-Row processing here with the filestream; it's not as efficient.
        ''sql = " BULK INSERT dbo.STAGINGTABLE " & vbCrLf
        ''sql = sql & "    FROM '" & FullPathToFilename & "' " & vbCrLf
        ''sql = sql & "    WITH  " & vbCrLf
        ''sql = sql & "       ( " & vbCrLf
        ''sql = sql & "          FIELDTERMINATOR = ';', " & vbCrLf
        ''sql = sql & "          ROWTERMINATOR = '\n', " & vbCrLf
        ''sql = sql & "          FIRSTROW = 3 " & vbCrLf
        ''sql = sql & "       ) " & vbCrLf
        ''Conn.Execute (sql)

        '##############################################################################
        ' Step 3: Insert the data row by row
        '##############################################################################
        Set fsoobj = New FileSystemObject
        Set strObj = fsoobj.OpenTextFile(FullPathToFilename, ForReading, False)
        rows = 1
        Do While Not strObj.AtEndOfStream
            CurrentRow = strObj.ReadLine
            RowData = Split(CurrentRow, ";", , vbTextCompare)
            sql = "INSERT INTO STAGINGTABLE ("
            'short rows map to the first N headers, per the assumption above
            For i = 0 To UBound(RowData) - 1
                sql = sql & "[" & Trim(Headers(i)) & "],"
            Next i
            sql = Left(sql, Len(sql) - 1) & ") VALUES (" 'remove the trailing comma
            For i = 0 To UBound(RowData) - 1
                'double up single quotes so embedded apostrophes don't break the statement
                sql = sql & "'" & Replace(RowData(i), "'", "''") & "',"
            Next i
            sql = Left(sql, Len(sql) - 1) & ") " 'remove the trailing comma
            If rows >= 3 Then 'skip the date row and the header row
                Conn.Execute (sql)
            End If
            rows = rows + 1
        Loop
        strObj.Close
        Conn.Close
        Set Conn = Nothing
        Set strObj = Nothing
        Set fsoobj = Nothing

    Lowell


