Import a variable text file

  • I need to import a text file using a stored procedure. The problem is that it is a variable-field, variable-row-length file. Fields are denoted with the syntax (####), where #### is a 1- to 4-digit number identifying the field, e.g. (1), (72), etc. The field value follows the field designator.

    ex:  row 1:  00001(1)123456(72)34442233(73)3(4)fffff

         row 2:  00002(1)3456(72)3233(73)2(5)kjkjl(1501)kjk

         row 3:  00003(72)0151 (1)             (73)2(3)001289 (4)B

          ...

    So the rows will contain variable fields and variable length values in those fields.

    I have tried the following

    BULK INSERT compares..ispread FROM 'd:\compares\isp.txt' WITH (DATAFILETYPE = 'char',FIELDTERMINATOR = '(',ROWTERMINATOR = '\r\n')

    and get the following error

    Bulk Insert fails. Column is too long in the data file for row 1, column 70. Make sure the field terminator and row terminator are specified correctly.

    I know that there are less than 70 open parentheses in the file.

    My table is defined with col001 thru col070. It's as if it is not parsing correctly on the (.

    I have also tried SELECT INTO, with the text file as a linked server, into a table with just one column. Unfortunately my data contains commas and no field names, so it takes the first row as the column names and splits the fields on the commas, and the results are garbage.

    Any suggestions????


  • Here's one way to get your data into the database.

    Create a format file that defines a row that is one field wide and is wider than the widest possible row in your input file.  Like so:


8.0
1
1 SQLCHAR 0 255 "\r\n" 1 StringColumn ""


    (Don't forget the extra carriage return at the end of the format file.)  Notice that the field terminator is \r\n.  Create a base table whose column is the same size as the field in your format file, like so:

    CREATE TABLE dbo.TestColumns (StringColumn varchar(255))

    bcp, when it reads a format file, seems to treat the last field terminator in a row as the row terminator as well.  Since you only have the one field, it takes \r\n as the row terminator, and each line loads as one string:

    TRUNCATE TABLE dbo.TestColumns

    BULK INSERT dbo.TestColumns FROM '\\yourserver\DATA\testbcp.txt'
        WITH (FORMATFILE = '\\yourserver\DATA\testbcp.fmt')

    select * from dbo.TestColumns

    Now you have to SUBSTRING your way through the table you just loaded, which is more tedious than difficult.  Good luck with that.
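    To give the idea, here's a sketch of that SUBSTRING work for a single field. The (72) marker, the sample data, and dbo.TestColumns are from this thread; everything else is illustrative:

    ```sql
    -- Sketch: pull one field value out of a raw row by locating its "(72)" marker.
    -- Assumes dbo.TestColumns from above, loaded with the sample rows.
    SELECT
        StringColumn,
        SUBSTRING(
            StringColumn + '(',                                   -- trailing "(" sentinel so the next CHARINDEX always hits
            CHARINDEX('(72)', StringColumn) + 4,                  -- start just past "(72)"
            CHARINDEX('(', StringColumn + '(', CHARINDEX('(72)', StringColumn) + 4)
                - (CHARINDEX('(72)', StringColumn) + 4)           -- length up to the next "("
        ) AS Field72
    FROM dbo.TestColumns
    WHERE StringColumn LIKE '%(72)%'
    ```

    Run against the first sample row, that pulls out 34442233; you'd repeat the pattern (or generate it) for each field number.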

     

    There is no "i" in team, but idiot has two.
  • Thanks, that worked, but I was really hoping to get the columns parsed out on insert. I already have the code to parse the ##) from the front of each column. Too bad T-SQL does not have a split function like BASIC!

  • Search around this site.  There are several different variations on a split function.  Here is mine:

    CREATE FUNCTION FN_Split (@InDelimitedString varchar(8000),
                              @InDelimiter       varchar(10) )
    RETURNS @tblArray TABLE (ElementID  smallint        IDENTITY(1,1),
                             Element    varchar(1000)                )
    AS
    BEGIN
       DECLARE @StrPos            smallint,
               @StrStart          smallint,
               @DelimiterLength   smallint
       SET @DelimiterLength = LEN(@InDelimiter)
       --loop through source string and add elements to destination table array
       WHILE LEN(@InDelimitedString) > 0
       BEGIN
          SET @StrPos = CHARINDEX (@InDelimiter, @InDelimitedString)
          IF @StrPos = 0
          BEGIN
             INSERT INTO @tblArray VALUES(@InDelimitedString)
             BREAK
          END
          ELSE
          BEGIN
             INSERT INTO @tblArray VALUES(SUBSTRING(@InDelimitedString, 1, @StrPos - 1))
             SET @StrStart = @StrPos + @DelimiterLength
             -- Shift source string left past the element just inserted
             SET @InDelimitedString = SUBSTRING(@InDelimitedString, @StrStart, LEN(@InDelimitedString) - @StrStart + 1)
          END
       END
       RETURN
    END
    GO
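    A quick usage sketch, in case it helps (the input is just a fragment of the sample row from the first post):

    ```sql
    -- Usage sketch for the FN_Split table function above (assumes it was
    -- created in the dbo schema; the input string is sample data).
    SELECT ElementID, Element
    FROM dbo.FN_Split('00001(1)123456(72)34442233', '(')
    -- Returns three rows: '00001', '1)123456', '72)34442233'
    ```

    Note it splits on the delimiter only; you'd still strip the "nn)" prefix from each element yourself.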

    cheers

    Wayne

     

  • I hope someone can give me a clue here. I'm very new to T-SQL, so this will probably be a stupid question, but I'm obviously missing something fundamental here - I see how this split function works, but I don't understand the method of getting the data into and out of it. After modifying the Bulk Insert function above, I now have the raw, unsplit text sitting in a column of one table and I want to split it into many columns in another table. How do I get each field from the source table into the split function, and then put the output into the fields of the destination table? I've tried passing the source field name to the function, but that gives me a syntax error. I'm guessing that's because that only works for scalar functions, but that's only a guess.

    I have no clue. Any help would be appreciated.


    Best Regards,

    Carl E. Campbell
    nyprehabmed.org

  • You probably need to read some topics under "user-defined functions" in BOL, especially the ones about table functions.

    Hope it will help.

    _____________
    Code for TallyGenerator

  • I'm afraid it doesn't. I'd already looked through the BOL extensively, but in all of the examples I could find, data was sent to multistatement table valued user-defined functions via an explicit string. That doesn't help me. I need to send all of the fields in a column.

    If I could figure out how to extract one field at a time and send it to the function, that would do as well, but I haven't been able to figure out how to loop through every row in order. If it was an array it would be easy, but after the bulk insert I don't have an array, just a table. If I have to go to great lengths to put the data into an array before sending it to the function, then I shouldn't have bothered to use the bulk insert at all.


    Best Regards,

    Carl E. Campbell
    nyprehabmed.org

  • I think I may have found the missing link. Cursors. As I mentioned, I'm very new to TSQL and there's nothing quite like Cursors in any of the languages I've used before (COBOL, FORTRAN, PASCAL, VBA, etc.) I have a feeling that when I finish reading up on this concept, I'll be able to figure out how to make this work.
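    For what it's worth, a minimal cursor sketch along those lines, assuming the one-column staging table and split function from the earlier posts (the table and column names are from those posts; this just wires them together):

    ```sql
    -- Minimal cursor sketch: feed each staged row through the split function.
    -- Assumes dbo.TestColumns (StringColumn) and dbo.FN_Split from above.
    DECLARE @Row varchar(255)
    DECLARE RowCur CURSOR LOCAL FAST_FORWARD FOR
        SELECT StringColumn FROM dbo.TestColumns
    OPEN RowCur
    FETCH NEXT FROM RowCur INTO @Row
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- do something with the pieces of this row; here we just select them
        SELECT ElementID, Element FROM dbo.FN_Split(@Row, '(')
        FETCH NEXT FROM RowCur INTO @Row
    END
    CLOSE RowCur
    DEALLOCATE RowCur
    ```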

    Thanks for your help...


    Best Regards,

    Carl E. Campbell
    nyprehabmed.org

  • Once you've got your 1-column table into SQL, you can try a variation of this kind of thing (just try running it and you should get the idea)...

    Much simpler than using a split function, and much better than using cursors (you'll soon pick up that they're frowned upon by many because they can be inefficient and there are usually better methods).

     

    --This whole SQL script is safe to run

    --Create sample data
    DECLARE @SampleData TABLE (Id INT IDENTITY(1, 1), s VARCHAR(1000))
    INSERT INTO @SampleData
          SELECT '00001(1)123456(72)34442233(73)3(4)fffff'
    UNION SELECT '00002(1)3456(72)3233(73)2(5)kjkjl(1501)kjk'
    UNION SELECT '00003(72)0151 (1)             (73)2(3)001289 (4)B'

    --Create numbers table
    DECLARE @Numbers_0_to_100 TABLE (i TINYINT IDENTITY(0,1), j BIT)
    INSERT INTO @Numbers_0_to_100 SELECT TOP 101 NULL FROM master.dbo.syscolumns

    --Pick out data values
    DECLARE @ParsedData TABLE (Id INT, Field INT, Value VARCHAR(100))
    INSERT INTO @ParsedData
    SELECT
        Id,
        i AS Field,
        left(
                substring(
                             s + '(',
                             patindex('%(' + cast(i as varchar(3)) + ')%', s) + 2 + len(i),
                             1000
                         ),
                patindex(
                             '%(%',
                             substring(
                                         s + '(',
                                         patindex('%(' + cast(i as varchar(3)) + ')%', s) + 1,
                                         1000
                                      )
                         ) - 2 - len(i)
            ) AS Value
    FROM @SampleData
        INNER JOIN @Numbers_0_to_100 ON s LIKE '%(' + cast(i as varchar(3)) + ')%'

    SELECT * FROM @ParsedData

    --SELECT 'MIN(CASE WHEN Field = ' + cast(i as varchar(3)) + ' THEN Value ELSE NULL END) AS col' + REPLICATE('0', 3-LEN(i)) + cast(i as varchar(3)) + ',' FROM @Numbers_0_to_100

    --Transpose to desired format
    SELECT
        Id,
        MIN(CASE WHEN Field = 1 THEN Value ELSE NULL END) AS col001,
        MIN(CASE WHEN Field = 2 THEN Value ELSE NULL END) AS col002,
        MIN(CASE WHEN Field = 3 THEN Value ELSE NULL END) AS col003,
        MIN(CASE WHEN Field = 4 THEN Value ELSE NULL END) AS col004
        --etc
    FROM
        @ParsedData
    GROUP BY
        Id

     

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • It took me some time to figure out what you were doing there, but it looks like a very clever way to solve the original problem in this discussion. Unfortunately, I'm still missing some things.

    My data is very different from the original. Perhaps I should have started another discussion, but this one talked about the bulk insert, which looked like it might help. The only real similarity between my data and his is that we were both trying to import variable-length delimited data from a text file. My data is HL7, which looks like this:

    FHS|^~\&|PT^BILLING|WASH|OK||20050101||||||||||||||||||||||||
    BHS|^~\&|PT^BILLING|WASH|||20050101||BHS|||
    MSH|^~&\|||||200412310841||DFT^P03||P|2.1|
    EVN|P03|200412310841|
    PID|||4311440||RACER^SPEED||06/03/84|F||||||||||10504236|
    PV1|||MH02^M20X^01|
    FT1|||BHS|20041229||CH|4313|||0001|||0782|||||I||mag9090|||
    FT1|||BHS|20041229||CH|4314|||0001|||0782|||||I||mag9090|||
    MSH|^~&\|||||200412310842||DFT^P03||P|2.1|
    EVN|P03|200412310842|
    PID|||1501292||BANNER^RACE||04/05/58|M||||||||||10537294|
    PV1|||M7HS^7221^01|
    FT1|||BHS|20041230||CH|4312|||0001|||0782|||||I||mag9090|||
    FT1|||BHS|20041230||CH|4313|||0001|||0782|||||I||mag9090|||
     

    So bulk inserting into a single field is the only way I saw that would get the data in there to begin with; then I can parse out the individual fields. The above code looks like a very nice way to use substring functions to do that, but I still have a quandary. It doesn't seem to allow me to bulk insert into a table variable. I can do it into a regular table, but that doesn't allow me to have an identity column (that I can figure out), which seems to be integral to your code. I may be missing something, as there's a lot of code that is specifically designed to deal with the file format in the original problem.


    Best Regards,

    Carl E. Campbell
    nyprehabmed.org

  • Hi Carl,

    I'm not sure I realised before that you weren't the original poster.

    I'm not that familiar with BULK INSERT, but I just looked it up, had a play, and managed to insert into a regular table with an identity field. I think that's all you need, isn't it?

    The SQL I used (I didn't use 'mypath', of course):

    CREATE TABLE tblTest (Id INT IDENTITY(1, 1), VALUE VARCHAR(8000))

    BULK INSERT tblTest FROM '\\mypath\reportname.txt'
    WITH
        (FORMATFILE = '\\mypath\formatfile.fmt')

    The format file I used:

    8.0
    1
    1 SQLCHAR 0 8000 "\r\n" 2 VALUE SQL_Latin1_General_Cp437_BIN

    Once you've got this, you can follow it up with the other code I posted, and Robert's your mother's brother.

     

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Yes, that worked fine. And it looks (almost) exactly like what I'd tried earlier. I can't imagine what I did wrong, but... oh well.

    Anyway, it looks like I just have to fool with the substring clauses to make it look for pipes instead of all of that other stuff. Since I'm learning all of this "on the fly" so to speak, it's a bit of a shlog, but I'm certainly learning a lot.
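    In case it saves some of that fooling around, here is a sketch of one way the numbers-table approach might be adapted for pipes. HL7 fields are positional rather than numbered, so this splits each row at every pipe; tblTest is from the earlier post, and everything else is illustrative:

    ```sql
    -- Sketch: split pipe-delimited rows with a numbers table (SQL 2000 style).
    -- Assumes tblTest (Id, VALUE) loaded by the BULK INSERT above; fields are
    -- taken positionally, unlike the (nn) markers in the original problem.
    DECLARE @Numbers TABLE (i INT IDENTITY(1, 1), j BIT)
    INSERT INTO @Numbers
    SELECT TOP 8000 NULL
    FROM master.dbo.syscolumns a CROSS JOIN master.dbo.syscolumns b

    SELECT
        t.Id,
        n.i AS StartPos,
        SUBSTRING(t.VALUE, n.i, CHARINDEX('|', t.VALUE + '|', n.i) - n.i) AS Element
    FROM tblTest t
        INNER JOIN @Numbers n
            ON  n.i <= LEN(t.VALUE) + 1
            AND SUBSTRING('|' + t.VALUE, n.i, 1) = '|'   -- element starts just after each pipe
    ORDER BY t.Id, n.i
    ```

    From there the same MIN(CASE...) GROUP BY trick transposes the elements into columns.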


    Best Regards,

    Carl E. Campbell
    nyprehabmed.org

  • Thanks for the update, and thanks for using the word "shlog". Made me smile

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.
