txt file importing

  • This topic might have been discussed many times.

    I googled a lot for this, but found no good solution.

    The problem is that I have a text file, delimited by semicolons ( ; ), that I want to import, but the number of columns is neither fixed nor always in the same order, and the main problem is that there is no limit on the length of the data in each column.

    How can I import it?

    I tried using OPENROWSET, but its default length is 50 characters with datatype varchar, so my data gets truncated.

    I tried a DTS package too, but there as well the field width is 50 characters.

    I tried to create a table with all fields typed as text, but while inserting data it gave me the error "exceeded maximum row size of 8060".

    Any ideas, any workarounds?

  • I had something similar. I had to create almost every single column manually by specifying the length. It took me a few days.

  • When you say the columns are not in the same order, what do you mean? Is the data in name:value pairs, like 'fruit:apple;vegetable:peas and carrots;'?

    Could you post a couple of sample lines, or a link to a sample, since it's probably pretty big?

    Typically, this is what I do:

    import the file into a table with a single varchar(max) column;

    then import from that table into the destination table. I could throw down some ugly examples, but if I saw sample data I could give a better one.

    Alternatively, if you know how many columns there are, you can import them into a temp table where every column is defined as varchar(max),

    then import into the final destination from the temp table, as long as the temp table's data fits within the final destination's defined sizes.
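    The two-step staging idea above can be sketched outside T-SQL as well. Here is a minimal Python illustration (the sample data and all names here are hypothetical, just to show the shape of the approach): stage each raw line as a single wide "column", then split the staged lines into destination columns in a second pass.

```python
# Stage 1: pretend each raw line landed in one varchar(max)-style column.
staging = [
    "Column1;Column3;Column5;Column4;Column2;",  # header row
    "dsf;sdf;sdf;sdfds;sdfsd;",                  # data row
]

# Stage 2: split the staged lines into destination columns,
# pairing each value with its header positionally.
headers = staging[0].rstrip(";").split(";")
rows = [dict(zip(headers, line.rstrip(";").split(";"))) for line in staging[1:]]
```

    In T-SQL the second stage would be a split function or clever string code against the staging table, but the mechanics are the same.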

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • OK, some sample data:

    randomdata;10/10/2007;randomdata;randomdata;

    Column1;Column3;Column5;Column4;Column2;

    jk adkshfjhjkfhsajkfh sda;sdfs dfasdfsda;fsdaf sadfadsfsda;

    dsf;sdf;sdf;sdfds;sdfsd;

    So, here is the description.

    I don't want the first line to go into the table, but it contains a date that I want to store.

    Then the columns appear in a different order every time the file is received, and new additional columns CAN ALSO appear while existing columns are dropped; i.e., in the next file the columns may be

    Column3;Column87;Column2;Column10;

    And then comes the data itself, which can simply be of any length.

    Now the problems are that in DTS the default column length is 50, so data gets truncated, and I could not find any means in DTS to skip the first line either.

    Any ideas? Keep in mind that I do want to store the date too.
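    Putting that description together, a rough Python sketch of how such a file could be parsed (the day-first date format and the positional header mapping are assumptions, not confirmed by the sample):

```python
from datetime import datetime

SAMPLE = [
    "randomdata;10/10/2007;randomdata;randomdata;",
    "Column1;Column3;Column5;Column4;Column2;",
    "jk adkshfjhjkfhsajkfh sda;sdfs dfasdfsda;fsdaf sadfadsfsda;",
    "dsf;sdf;sdf;sdfds;sdfsd;",
]

def parse_feed(lines):
    """Line 1 carries the date, line 2 the column names, the rest the data."""
    first = lines[0].rstrip(";").split(";")
    # The date sits in the second field; day-first format is an assumption.
    feed_date = datetime.strptime(first[1], "%d/%m/%Y").date()
    headers = lines[1].rstrip(";").split(";")
    # Pair each value with its header positionally; short rows just map
    # to the first N headers.
    rows = [dict(zip(headers, ln.rstrip(";").split(";"))) for ln in lines[2:]]
    return feed_date, headers, rows

feed_date, headers, rows = parse_feed(SAMPLE)
```

    Note the third sample line only has three values, so under this sketch it only populates Column1, Column3, and Column5.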

  • Sorry, I can't see how you know which column is which; there's no obvious pattern in the sample you provided.

    If you could determine that column 3 was always an int, or some other constant, we could help envision a solution.

    As for the first row having a date, I would simply bulk insert just the first row with one process, and then bulk insert the rest of the file, skipping the first row.

    But how would you know that column 5 was dropped or added?

    Lowell



  • Muhammad Furqan (22/09/2007)


    OK, some sample data:

    randomdata;10/10/2007;randomdata;randomdata;

    Column1;Column3;Column5;Column4;Column2;

    jk adkshfjhjkfhsajkfh sda;sdfs dfasdfsda;fsdaf sadfadsfsda;

    dsf;sdf;sdf;sdfds;sdfsd;

    So, here is the description.

    I don't want the first line to go into the table, but it contains a date that I want to store.

    Then the columns appear in a different order every time the file is received, and new additional columns CAN ALSO appear while existing columns are dropped; i.e., in the next file the columns may be

    Column3;Column87;Column2;Column10;

    And then comes the data itself, which can simply be of any length.

    Now the problems are that in DTS the default column length is 50, so data gets truncated, and I could not find any means in DTS to skip the first line either.

    Any ideas? Keep in mind that I do want to store the date too.

    So, what you're saying is that...

    1:  Always skip the first line of the file no matter what...

    2:  The second line of the file will always show the correct column order for the data...

    3:  The data will be in the same order as indicated by the second line...

    4:  The data will ALWAYS have the same number of delimiters and columns as the second line...

    Is that correct?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • 1: Always skip the first line of the file no matter what...

    2: The second line of the file will always show the correct column order for the data...

    3: The data will be in the same order as indicated by the second line...

    4: The data will ALWAYS have the same number of delimiters and columns as the second line...

    1: True

    2: False - the column order is never the same from file to file

    3: True

    4: False - there can be fewer as well

  • Lowell (9/22/2007)


    Sorry, I can't see how you know which column is which; there's no obvious pattern in the sample you provided.

    If you could determine that column 3 was always an int, or some other constant, we could help envision a solution.

    As for the first row having a date, I would simply bulk insert just the first row with one process, and then bulk insert the rest of the file, skipping the first row.

    But how would you know that column 5 was dropped or added?

    "Which column is which" is determined by the column name; "Column1", "Column2", etc. will always be the same.

    I can't determine that column 3 will always be an int, as it can contain other data too, so one thing is confirmed: I will have to use text data types.

    Finally, whichever column is dropped will simply vanish from the file.

    That's the whole mess. I think I should not have gotten into this project. 🙁

  • Yeaup... quite the mess.

    This could all be done in T-SQL by importing the whole row into a single wide column and doing a split (using a function or some clever code) into another table.  I'm thinking that would take a fair amount of time, though (performance-wise)...

    The two big problems are that row 1 will not have the same number of delimiters as row 2, so you can't use BCP or BULK INSERT as the file stands.  Same for the data rows... they may not have the same number of delimiters as row 2.

    You could write some VBS to strip off the first row and add the missing delimiters to the data rows... that would run quite fast.  If the number of rows didn't exceed the capacity of a spreadsheet, you could do a similar thing in a spreadsheet... but that would require some manual intervention.

    Let me know if you think the T-SQL solution I mentioned is viable... start by telling us how many rows you expect in a given file.

    By the way... how are you going to know the name(s) of the file(s) you try to import?
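    The preprocessing step described above (strip the date row, then pad short data rows so every row has as many delimiters as the header row) could be sketched in Python instead of VBS; the file-handling details here are assumptions:

```python
def preprocess(src_lines):
    """Drop the date row and pad every data row to the header's column count
    so a fixed-column loader such as BCP or BULK INSERT can take the file."""
    headers = src_lines[1].rstrip("\n").rstrip(";").split(";")
    out = [src_lines[1].rstrip("\n")]  # keep the header row, drop the date row
    for line in src_lines[2:]:
        fields = line.rstrip("\n").rstrip(";").split(";")
        fields += [""] * (len(headers) - len(fields))  # pad short rows
        out.append(";".join(fields) + ";")
    return out
```

    The padded output could then be written back to disk and loaded in one shot with BULK INSERT using FIELDTERMINATOR = ';'.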

    --Jeff Moden



  • p.s.  The best way to fix this is to pummel the vendor providing the data into submitting properly formatted data.  The first row should go into a separate "control" file, and the rest of the rows should all have the same number and type of delimiters.

    Remember, the answer is always "No" unless you ask... 😉

    --Jeff Moden



  • Where I've had to deal with this sort of problem in the past (and it is difficult to avoid if you are dealing with the output of engineering test equipment), I've found that by far the best approach is to read the data feed via BCP into a table with an identity field and one varchar(max) / varchar(8000) column which holds the line. When you've pulled the entire mess in, you can bash it into whatever shape you can with string functions, according to whatever arcane rules apply, cursing all the way.

    Best wishes,
    Phil Factor

  • OK, I got a VB project to work based on your parameters... here's my question:

    In your example, the file had 5 columns. Typically, BULK INSERT would expect 5 columns in every row of the data in order to import it.

    In one of the rows, only 3 columns out of the 5 were represented.

    If the first three columns from row #2 are the fields for that specific row, I can logically import them into the columns from row 2; is that a safe assumption? Otherwise, there's no way to guess which column was not included, unless you assume they were cut off from the end.

    VB6 Code Example:

        Dim CurrentRow As String
        Dim fsoobj As FileSystemObject
        Dim strObj As TextStream
        Dim Headers() As String
        Dim RowData() As String
        Dim MaxSize() As Long
        Dim ImportantDate As Date
        Dim i As Long
        Dim rows As Long
        Dim Conn As ADODB.Connection
        Dim sql As String

        'Pseudocode:
        '1: go thru the file once to get the max() size for each element.
        '2: create a table with those columns.
        '3: process the file for data.

        'Because the column widths are unknown, the file is processed twice:
        'once to get the column sizes, and a second time to insert the data.
        rows = 1   'count of lines processed.

        '##############################################################################
        ' Step 1: Get base info
        '##############################################################################
        Set fsoobj = New FileSystemObject
        Set strObj = fsoobj.OpenTextFile(FullPathToFilename, ForReading, False)
        Do While Not strObj.AtEndOfStream
            CurrentRow = strObj.ReadLine
            Select Case rows
            Case 1
                RowData = Split(CurrentRow, ";", , vbTextCompare)
                ImportantDate = IIf(IsDate(RowData(1)), CDate(RowData(1)), Date) 'second element in the array
            Case 2
                Headers = Split(CurrentRow, ";", , vbTextCompare)
                ReDim MaxSize(UBound(Headers)) 'sized to the headers, not to row 1's data
                For i = 0 To UBound(Headers)   'initialize it to zero
                    MaxSize(i) = 0
                Next i
            Case Else
                RowData = Split(CurrentRow, ";", , vbTextCompare)
                'the trailing semicolon yields an empty last element, so stop at UBound - 1
                For i = 0 To UBound(RowData) - 1
                    If i <= UBound(MaxSize) Then
                        MaxSize(i) = IIf(Len(RowData(i)) > MaxSize(i), Len(RowData(i)), MaxSize(i))
                    End If
                Next i
            End Select
            rows = rows + 1
        Loop
        strObj.Close

        '##############################################################################
        ' Step 2: Create the table
        '##############################################################################
        Set Conn = New ADODB.Connection
        Conn.ConnectionString = "Provider=SQLOLEDB;Server=STORMSQL;database=GEO;network=DBMSSOCN;uid=sa;pwd=not my realpassword"
        Conn.Open
        sql = "CREATE TABLE STAGINGTABLE(" & vbCrLf
        For i = 0 To UBound(Headers) - 1
            'VARCHAR(0) is invalid, so use at least 1 for columns that were always empty
            sql = sql & "[" & Trim(Headers(i)) & "] VARCHAR(" & IIf(MaxSize(i) > 0, MaxSize(i), 1) & ") ," & vbCrLf
        Next i
        sql = Left(sql, Len(sql) - 3) & ")" 'remove the trailing comma and CrLf

        Conn.Execute (sql)

        '''The table is now created; ideally the file would be processed via BULK INSERT for speed.
        '''Can't do that in this case, because columns are not defined for all rows,
        '''so we switch to Row-By-Agonizing-Row processing here with the filestream; it's not as efficient.
        ''sql = " BULK INSERT dbo.STAGINGTABLE " & vbCrLf
        ''sql = sql & "    FROM '" & FullPathToFilename & "' " & vbCrLf
        ''sql = sql & "    WITH  " & vbCrLf
        ''sql = sql & "       ( " & vbCrLf
        ''sql = sql & "          FIELDTERMINATOR = ';', " & vbCrLf
        ''sql = sql & "          ROWTERMINATOR = '\n', " & vbCrLf
        ''sql = sql & "          FIRSTROW = 3 " & vbCrLf
        ''sql = sql & "       ) " & vbCrLf
        ''Conn.Execute (sql)

        '##############################################################################
        ' Step 3: Insert the data row by row
        '##############################################################################
        Set fsoobj = New FileSystemObject
        Set strObj = fsoobj.OpenTextFile(FullPathToFilename, ForReading, False)
        rows = 1
        Do While Not strObj.AtEndOfStream
            CurrentRow = strObj.ReadLine
            RowData = Split(CurrentRow, ";", , vbTextCompare)
            sql = "INSERT INTO STAGINGTABLE ("
            'short rows map to the first N headers, per the assumption above
            For i = 0 To UBound(RowData) - 1
                sql = sql & "[" & Trim(Headers(i)) & "],"
            Next i
            sql = Left(sql, Len(sql) - 1) & ") VALUES (" 'remove the trailing comma
            For i = 0 To UBound(RowData) - 1
                'double up single quotes so embedded apostrophes don't break the statement
                sql = sql & "'" & Replace(RowData(i), "'", "''") & "',"
            Next i
            sql = Left(sql, Len(sql) - 1) & ") " 'remove the trailing comma
            If rows >= 3 Then 'skip the date row and the header row
                Conn.Execute (sql)
            End If
            rows = rows + 1
        Loop
        strObj.Close
        Conn.Close
        Set Conn = Nothing
        Set strObj = Nothing
        Set fsoobj = Nothing

    Lowell


