September 21, 2007 at 7:52 am
This topic has probably been discussed many times.
I googled a lot for it, but found no good solution.
The problem is that I have a text file, delimited by semicolons (;), that I want to import, but the number of columns is neither fixed nor always in the same order, and the main problem is that there is no limit on the length of the data in each column.
How can I import it?
I tried using OPENROWSET, but its default length is 50 characters with datatype varchar, so my data gets truncated.
I tried a DTS package too, but there the field width is 50 characters as well.
I tried to create a table with all fields typed as text, but while inserting data I got the error "exceeded maximum row size of 8060".
Any idea, any workaround?
September 21, 2007 at 8:25 am
I had something similar. I had to create almost every single column manually, specifying each length. It took me a few days.
September 21, 2007 at 10:08 am
When you say the columns are not in the same order, what do you mean? Is the data in name:value pairs, like 'fruit:apple;vegetable:peas and carrots;'?
Could you post a couple of sample lines, or a link to a sample, since it's probably pretty big?
Typically, this is what I do:
import the file into a table with a single varchar(max) column;
then I import from that table into the destination table. I could throw down some ugly examples, but if I saw sample data I could give a better one.
Alternatively, if you know how many columns there are, you can import them into a temp table where every column is defined as varchar(max),
and then import into the final destination from the temp table, as long as the temp table's data fits within the final destination's defined sizes. A rough sketch of that second approach is below.
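Something along these lines (a minimal sketch only: #Staging, dbo.Destination, the column names, and the file path are made up, and it assumes each line has exactly one ';'-separated value per column with no trailing delimiter):

-- all-varchar(max) staging sketch; #Staging, dbo.Destination, the column
-- names, and C:\feeds\sample.txt are placeholders
CREATE TABLE #Staging
(
    Column1 VARCHAR(MAX) NULL,
    Column2 VARCHAR(MAX) NULL,
    Column3 VARCHAR(MAX) NULL
)

-- only works as-is if every line has the same number of delimiters
BULK INSERT #Staging
FROM 'C:\feeds\sample.txt'
WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\n')

-- then move the rows across, converting/validating as needed
INSERT INTO dbo.Destination (Column1, Column2, Column3)
SELECT Column1, Column2, Column3
FROM #Staging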
Lowell
September 22, 2007 at 12:20 am
OK, some sample data:
randomdata;10/10/2007;randomdata;randomdata;
Column1;Column3;Column5;Column4;Column2;
jk adkshfjhjkfhsajkfh sda;sdfs dfasdfsda;fsdaf sadfadsfsda;
dsf;sdf;sdf;sdfds;sdfsd;
So, here is the description.
The first line: I don't want it in the table, but it contains a date that I want to store.
Then the columns appear in any order every time the file is received, and CAN ALSO contain some new additional columns, or even drop some existing columns; i.e. in the next file the columns may be
Column3;Column87;Column2;Column10;
And then comes the data itself, which can simply be of any length.
Now the problems are that in DTS the default column length is 50, so data gets truncated, and I could not find any means in DTS to skip the first line either.
Any ideas? Keep in mind that I do want to store the date too.
September 22, 2007 at 6:22 pm
Sorry, I can't seem to see how you know which column is which; there's no obvious pattern in the sample you provided.
If you could determine that column 3 was always an int, or some other constant, we could help envision a solution.
As for the first row having a date, I would simply bulk insert just the first row with one process, and then bulk insert the file skipping the first row for the rest (see the sketch below).
But how would you know whether column 5 was dropped or added?
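Just to illustrate the two-pass idea (a sketch only: dbo.HeaderLine, dbo.DataLines, and the file path are made up; each staging table has a single wide varchar column, and it assumes the default terminators suit this file, i.e. no tab characters in the data, so each whole line lands in that one column):

-- first pass: just the first line (the date line)
BULK INSERT dbo.HeaderLine
FROM 'C:\feeds\sample.txt'
WITH (LASTROW = 1)

-- second pass: everything after the first line
BULK INSERT dbo.DataLines
FROM 'C:\feeds\sample.txt'
WITH (FIRSTROW = 2)

Because each staging table has only one column, the varying number of semicolons per line doesn't matter at load time.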
Lowell
September 23, 2007 at 10:28 am
Muhammad Furqan (22/09/2007)
OK, some sample data:
randomdata;10/10/2007;randomdata;randomdata;
Column1;Column3;Column5;Column4;Column2;
jk adkshfjhjkfhsajkfh sda;sdfs dfasdfsda;fsdaf sadfadsfsda;
dsf;sdf;sdf;sdfds;sdfsd;
So, here is the description.
The first line: I don't want it in the table, but it contains a date that I want to store.
Then the columns appear in any order every time the file is received, and CAN ALSO contain some new additional columns, or even drop some existing columns; i.e. in the next file the columns may be
Column3;Column87;Column2;Column10;
And then comes the data itself, which can simply be of any length.
Now the problems are that in DTS the default column length is 50, so data gets truncated, and I could not find any means in DTS to skip the first line either.
Any ideas? Keep in mind that I do want to store the date too.
So, what you're saying is that...
1: Always skip the first line of the file no matter what...
2: The second line of the file will always indicate what the correct column order for the data is...
3: The data will be in the same order as indicated by the second line...
4: The data will ALWAYS have the same number of delimiters and columns as the second line...
Is that correct?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 25, 2007 at 1:02 am
1: Always skip the first line of the file no matter what...
2: The second line of the file will always indicate what the correct column order for the data is...
3: The data will be in the same order as indicated by the second line...
4: The data will ALWAYS have the same number of delimiters and columns as the second line...
1: True
2: False - Column order is never correct
3: True
4: False - there can be fewer as well
September 25, 2007 at 1:19 am
Lowell (9/22/2007)
Sorry, I can't seem to see how you know which column is which; there's no obvious pattern in the sample you provided. If you could determine that column 3 was always an int, or some other constant, we could help envision a solution.
As for the first row having a date, I would simply bulk insert just the first row with one process, and then bulk insert the file skipping the first row for the rest.
But how would you know whether column 5 was dropped or added?
"which column is which" is determined by the column name, "column1" , "column2" etc will always be same.
I cant determine that column 3 will always be int, that contain some other data too, so one thing confirmed that i will have to use text data types,
finally, which column dropped, that will simply vanish from file.
Thats whole mess. i think i should not have get into this project. 🙁
September 25, 2007 at 1:36 am
Yeaup... quite the mess.
This could be done all in T-SQL by importing each whole row into one wide column and then doing a split (using a function or some clever code) into another table. I'm thinking that would take a fair amount of time, though (performance-wise)...
The two big problems are that row 1 will not have the same number of delimiters as row 2, so you can't use BCP or BULK INSERT on the file as it stands, and that the data rows themselves may not have the same number of delimiters as row 2 either.
You could write some VBS to strip off the first row and add the missing delimiters to the data rows... that would run quite fast. If the number of rows didn't exceed the capacity of a spreadsheet, you could do a similar thing in a spreadsheet, but that would require some manual intervention.
Lemme know if you think the T-SQL solution I mentioned is viable (a crude sketch of the split is below)... start by telling us how many rows you expect in a given file.
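To picture the split, something like this (a sketch only: RawLines(LineId, LineText) is a made-up staging table holding one file line per row, and real code would route each value to the column named in line 2 instead of printing it):

DECLARE @line VARCHAR(MAX), @value VARCHAR(MAX), @pos INT

-- grab one staged data line (line 3 of the file in this example)
SELECT @line = LineText FROM RawLines WHERE LineId = 3

-- peel values off the front of the line, one ';' at a time
WHILE LEN(@line) > 0
BEGIN
    SET @pos = CHARINDEX(';', @line)
    IF @pos = 0
        SELECT @value = @line, @line = ''
    ELSE
        SELECT @value = LEFT(@line, @pos - 1),
               @line  = SUBSTRING(@line, @pos + 1, LEN(@line))
    PRINT @value   -- real code would insert this into the proper column instead
END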
By the way... how are you going to know the name(s) of the file(s) you try to import?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 25, 2007 at 2:25 am
P.S. The best way to fix this is to pummel the vendor providing the data into submitting properly formatted data. The first row should be in a separate "control" file, and the rest of the rows should all have the same number and type of delimiters.
Remember, the answer is always "No" unless you ask... 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
September 25, 2007 at 2:56 am
Where I've had to deal with this sort of problem in the past (and it is difficult to avoid if you are dealing with the output of engineering test equipment), I've found that by far the best approach is to read the data feed via BCP into a table with an identity field and one varchar(max) / varchar(8000) column that holds the whole line. When you've pulled the entire mess in, you can bash it into whatever shape you can with string functions, according to whatever arcane rules apply, cursing all the way. The skeleton of that staging setup is sketched below.
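A minimal sketch of that staging setup (using BULK INSERT rather than command-line BCP for brevity; dbo.RawLines, dbo.RawLinesLoad, and the file path are invented, and it assumes the data contains no tab characters so each whole line lands in the single text column):

CREATE TABLE dbo.RawLines
(
    LineId   INT IDENTITY(1,1) NOT NULL,
    LineText VARCHAR(MAX) NULL   -- varchar(8000) on SQL Server 2000
)
GO
-- load through a view so the identity column stays out of the positional mapping
CREATE VIEW dbo.RawLinesLoad AS SELECT LineText FROM dbo.RawLines
GO
BULK INSERT dbo.RawLinesLoad
FROM 'C:\feeds\sample.txt'
-- the default field terminator is a tab, so each whole line becomes one LineText value

After that, every line is addressable by LineId, so the date line, the column-name line, and the data lines can each be picked apart with string functions.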
Best wishes,
Phil Factor
September 25, 2007 at 6:36 am
OK, I got a VB project to work based on your parameters... here's my question:
In your example, the file had 5 columns. Typically, BULK INSERT would expect 5 columns on every row of the data in order to import it.
In one of the rows, only 3 of the 5 columns were represented.
If the first three values in such a row belong to the first three columns named in row #2, I can logically import them into those columns; is that a safe assumption? Otherwise, there's no way to guess which column was not included, unless you assume they were cut off from the end.
VB6 Code Example:
Dim FirstRow As String
Dim CurrentRow As String
Dim fsoobj As FileSystemObject
Dim strObj As TextStream
Dim Headers() As String
Dim RowData() As String
Dim MaxSize() As Long
Dim ImportantDate As Date
Dim i As Long
Dim rows As Long
Dim Conn As ADODB.Connection
Dim sql As String
'Pseudocode:
'1: go through the file once to get the max() size of each element
'2: create a table with those columns
'3: process the file again to insert the data
'Because the column widths are unknown, I'm processing the file twice:
'once to get the column sizes, and a second time to insert the data.
rows = 1 'count of lines processed.
'##############################################################################
' Step 1 Get base info
'##############################################################################
Set fsoobj = New FileSystemObject
Set strObj = fsoobj.OpenTextFile(FullPathToFilename, ForReading, False)
Do While Not strObj.AtEndOfStream
CurrentRow = strObj.ReadLine
Select Case rows
Case 1
RowData = Split(CurrentRow, ";", , vbTextCompare)
If IsDate(RowData(1)) Then 'second element in array; VB6's IIf evaluates both branches, so use If/Else to avoid a type mismatch
ImportantDate = CDate(RowData(1))
Else
ImportantDate = Date
End If
Case 2
Headers = Split(CurrentRow, ";", , vbTextCompare)
ReDim MaxSize(UBound(Headers))
For i = 0 To UBound(Headers) - 1 'initialize it to zero; size by the header count, not the first row's field count
MaxSize(i) = 0
Next i
Case Else
RowData = Split(CurrentRow, ";", , vbTextCompare)
For i = 0 To UBound(RowData) - 1
MaxSize(i) = IIf(Len(RowData(i)) > MaxSize(i), Len(RowData(i)), MaxSize(i))
Next i
End Select
rows = rows + 1
Loop
strObj.Close
'##############################################################################
' Create the table
'##############################################################################
Set Conn = New ADODB.Connection
Conn.ConnectionString = "Provider=SQLOLEDB;Server=STORMSQL;database=GEO;network=DBMSSOCN;uid=sa;pwd=not my realpassword"
Conn.Open
sql = "CREATE TABLE STAGINGTABLE(" & vbCrLf
For i = 0 To UBound(Headers) - 1
sql = sql & "[" & Trim(Headers(i)) & "] VARCHAR(" & MaxSize(i) & ") ," & vbCrLf
Next i
sql = Left(sql, Len(sql) - 3) & ")" 'remove the trailing comma and CRLF
Conn.Execute (sql)
'''The table is now created; ideally we'd load the file via BULK INSERT for speed,
'''but we can't in this case, because not all columns are present on every row,
'''so we switch to Row-By-Agonizing-Row processing with the TextStream instead, which is less efficient.
''sql = " BULK INSERT dbo.STAGINGTABLE " & vbCrLf
''sql = sql & " FROM '" & FullPathToFilename & "' " & vbCrLf
''sql = sql & " WITH " & vbCrLf
''sql = sql & " ( " & vbCrLf
''sql = sql & " FIELDTERMINATOR = ';', " & vbCrLf
''sql = sql & " ROWTERMINATOR = '', " & vbCrLf
''sql = sql & " FIRSTROW =3 " & vbCrLf
''sql = sql & " ) " & vbCrLf
''sql = sql & " " & vbCrLf
''Conn.Execute (sql)
Set fsoobj = New FileSystemObject
Set strObj = fsoobj.OpenTextFile(FullPathToFilename, ForReading, False)
rows = 1
Do While Not strObj.AtEndOfStream
CurrentRow = strObj.ReadLine
RowData = Split(CurrentRow, ";", , vbTextCompare)
sql = "INSERT INTO STAGINGTABLE ("
For i = 0 To UBound(RowData) - 1
sql = sql & "[" & Trim(Headers(i)) & "]," 'use the same bracketed, trimmed names as in CREATE TABLE
Next i
sql = Left(sql, Len(sql) - 1) & ") VALUES (" 'remove the trailing comma
For i = 0 To UBound(RowData) - 1
sql = sql & "'" & RowData(i) & "',"
Next i
sql = Left(sql, Len(sql) - 1) & ") " 'remove the trailing comma
If rows >= 3 Then
Conn.Execute (sql)
End If
rows = rows + 1
Loop
Conn.Close
Set Conn = Nothing
Set strObj = Nothing
Set fsoobj = Nothing
Lowell