bulk insert text file into two tables

  • I have a text file and I want to use BULK INSERT. I have header records and detail records; how can I tell them apart so I can insert into both tables? The text file can have a lot of headers and details, and "^" is the delimiter character. Which is the better way to do it?

    example text file

    H^MP^319267228^17-FEB-06^^^^GXB -- header row (invoices)

    D^GXB10269^US^23^EA^^^^.45632^^MX^^^^ -- first of two detail rows (inv_detail)

    D^GXB10269^US^23^EA^^^^.45632^^MX^^^^ -- second detail row

    H^MP^319310332^27-FEB-06^^^^GBX -- another header row

    D^GBX10269^US^4^EA^^^^.07936^^MX^^^^ -- detail row

  • Brenda,

    Sorry for the delay... it's a tough question and deserves a good answer.  Undoubtedly, you've had a thousand people tell you that this can't or shouldn't be done in SQL.  This is why folks like XML so much... it handles header/detail so very easily.  That would be the subject of another post, but I've found that you can do just about everything you need, so far as data manipulation in SQL goes, and this is no exception.

    Neither BCP nor BULK INSERT deals very well with mixed records where each record type has a different number of delimiters.  SQL Server also doesn't have a record "splitter", so we'll need to make one.  To make a record splitter, we also need a "Tally" table, which is nothing more than a relatively short table with a single column of sequential numbers.

    So here's what we'll do...

    1. Create a "Tally" table. (ONE TIME CODE. Don't include in stored proc)
    2. Create a record splitter based on the Tally table. (ONE TIME CODE. Don't include in stored proc)
    3. Import all of the records into a "flat" staging table. (Include in stored proc)
    4. Parse the Header records into a Header table. (Include in stored proc)
    5. Parse the Detail records into a Detail table. (Include in stored proc)

    Item 1, create the Tally table.  Here's the code and it runs very quickly.  If it doesn't run nearly instantaneously, something is wrong and you need to stop the query.  I make it go from 1 to 9999, even though VARCHAR (the thing to be split) has a limit of 8000 characters, because the extra numbers come in handy for other uses like generating 4-digit random numbers and the like.

    --===== Create a Tally table of numbers

     SELECT TOP 9999

            IDENTITY(INT,1,1) AS N

       INTO dbo.Tally

       FROM Master.dbo.SysColumns sc1,

            Master.dbo.SysColumns sc2

    --===== Add a clustered primary key for speed

      ALTER TABLE dbo.Tally

            ADD CONSTRAINT PK_Tally_N

                PRIMARY KEY CLUSTERED (N)

    --===== Grant access to the table

      GRANT SELECT ON dbo.Tally TO PUBLIC

    That wasn't so painful.  Notice that the cross-join in the code was intentional... it saves a lot of time and code doing it that way.  The code makes a "basis" for building larger temporary tables to solve all sorts of other problems, including finding missing dates, etc.  It'll make a million-row table in about 4 seconds.  9999 rows is about 27 years' worth of days, though, and most people don't need more than that.  Again, possible subjects of other posts.
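    As a quick illustration of that "basis" idea (an example only, not part of Brenda's problem), the same Tally table will crank out a run of dates with no loop at all:

    ```sql
    --===== Example only: list the 30 days starting 17-Feb-2006, no loop required
     SELECT DATEADD(dd, N-1, '17-FEB-2006') AS TheDate
       FROM dbo.Tally
      WHERE N <= 30
      ORDER BY N
    ```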

    Item 2, Create a record splitter.  This is in the form of a convenient reusable function.  There are faster methods but they take way more code... sometimes convenience, readability of code, and consistency of use make functions the way to go even though they are sometimes slower than dedicated code.  Again, subject of another post.  It would take too long to explain what's going on in this puppy so I'll leave the analysis of the code up to you.  Here's the fully documented production code I use (and it's the ONLY place where I have to use a (yech!) table variable)...

    CREATE FUNCTION dbo.fnSplit

    /****************************************************************************************

     Purpose:

     This function splits names and strings of words based on a  delimiter of up to 255

     characters.

     Notes:

     1. Max length of string that can be split is 8000-(Length of delimiter X 2)

     2. Designed for use on MS-SQL Server 2000 and MSDE 2000.

     3. Will error under the following conditions:

        a. Length of delimiter <1 or >255

        b. Length of string + (Length of delimiter X 2) > 8000

     4. Unexpected results will occur if a leading or trailing delimiter is included in the

        string to be parsed

     5. Returns a blank (space) when multiple delimiters are encountered (i.e. 'Jeff,,Moden'

        would return a blank as the 2nd word).

     Usage:

     dbo.fnSplit(StringToBeSplit,SegmentToReturn,DelimiterString)

    -----------------------------------------------------------------------------------------

     Revisions:

     Rev 00 - 06/05/2005 - Jeff Moden - Initial creation and test (single space delimiter)

                                 EMAIL: jbmoden@ameritech.net (has nasty spam filters)

     Rev 01 - 06/06/2005 - Jeff Moden - Add delimiter character as an input

     Rev 02 - 08/17/2005 - Jeff Moden - Increase possible size of delimiter to 10 characters

     Rev 03 - 09/03/2005 - Jeff Moden - Increase possible size of delimiter to 255 characters

    ****************************************************************************************/

    --===== Define the input parameters

           (

            @pString VARCHAR(8000), --String to be split

            @pWordNum SMALLINT,     --Number of the word to return

            @pDelim VARCHAR(255)    --Delimiter to base the split on (max is 255 or BOOM!)

            )

    --===== Define the return type

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    --======================================================================================

    --      Body of function

    --======================================================================================

    --===== Declare the local variables

    DECLARE @LenDelim TINYINT --Holds length of delimiter (max is 255 or BOOM!)

    DECLARE @Words TABLE

            (

            WordNum INT IDENTITY(1,1), --The number of the word that has been split out

            Word    VARCHAR(8000)      --The word that has been split out

            )

    --===== Determine length of delimiter.  Had to do it this way because LEN

         -- ignores trailing spaces (a space delimiter would otherwise measure as 0)

        SET @LenDelim = LEN('|'+@pDelim+'|')-2

    --===== Prepare the string to be split for parsing by adding a delimiter to each end

        SET @pString = @pDelim+@pString+@pDelim

    --===== Do the parsing and store the parsed words in the table variable

     INSERT INTO @Words (Word)

     SELECT SUBSTRING(@pString,    --What to split

                      N+@LenDelim, --Where to start

                      CHARINDEX(@pDelim,@pString,N+@LenDelim)-@LenDelim-N --Length

                     )

       FROM dbo.Tally

      WHERE SUBSTRING(@pString,N,@LenDelim) = @pDelim --Finds leading delimiters

        AND N < LEN(@pString)-@LenDelim               --except the last 1

      ORDER BY N --Force the order of words or segments to be correct

    --===== Return the word indicated by the input parameter

     RETURN (SELECT Word

               FROM @Words

              WHERE WordNum = @pWordNum)

    --======================================================================================

    --      End of function

    --======================================================================================

    END

    GO
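    Before wiring the function into anything, a quick smoke test against one of the sample records doesn't hurt (this assumes the function was created as dbo.fnSplit, the name the rest of the code below uses):

    ```sql
    --===== Smoke test the splitter with one of the sample header records
     SELECT dbo.fnSplit('H^MP^319267228^17-FEB-06^^^^GXB',1,'^') AS Seg1, --Expect 'H'
            dbo.fnSplit('H^MP^319267228^17-FEB-06^^^^GXB',2,'^') AS Seg2, --Expect 'MP'
            dbo.fnSplit('H^MP^319267228^17-FEB-06^^^^GXB',8,'^') AS Seg8  --Expect 'GXB'
    ```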

    Item 3, Import all the records into a "flat" staging table.  This is where most folks raise an eyebrow at me.  Why a "flat" single-column table and then parse the records?  Because it's (1) nasty fast, (2) very easy to write and maintain, and (3) it allows me to do almost everything set-based.  This BULK INSERT will import a million records in just a couple of seconds.  I use a temp table as the staging table just because it's my nature and my DBAs have made TempDB nice and large just to handle such things (another possible subject for another post).  You could certainly use a permanent table.  DO NOT use a table variable for this... table variables don't have statistics, which makes them damned slow for large record sets.  All the other bad rumors you've heard about the use of temp tables are simply not true.  Don't take my word for it, though... see the following URL...

    http://support.microsoft.com/default.aspx?scid=kb;en-us;305977&Product=sql2k

    In order for this to work, we have to have an IDENTITY column on the Staging table to preserve the order that the records are inserted.  In order to do that, BULK INSERT is going to need a "BCP FORMAT" file so we can ignore the identity column.  Here's what the file should look like...

    8.0

    2

    1    SQLCHAR 0 0   ""    0    IgnoreMe ""

    2    SQLCHAR 0 400 "\n"  2    RawData  ""

    ... and it should probably be stored in the same directory as your raw import files...

    ...and here's the code for Item 3...

    --===== If the temporary "staging" table already exists, drop it

         IF OBJECT_ID('TempDB..#Staging') IS NOT NULL

            DROP TABLE #Staging

    --===== Create a temporary "staging" table to hold the flat import

     CREATE TABLE #Staging

            (

            RowNum  INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

            RawData VARCHAR(400)

            )

    --===== Import the raw data from the file into the staging table

       BULK INSERT #Staging

       FROM 'C:\Temp\BulkInsertTest.txt' --<< CHANGE THIS TO MATCH YOUR SITUATION!

       WITH (

            DATAFILETYPE = 'CHAR',

            FORMATFILE   = 'C:\Temp\BulkInsertTest.fmt', --<< CHANGE THIS TO MATCH YOUR SITUATION!

            BATCHSIZE    = 50000

            )
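    After the load, a quick peek confirms the rows arrived and that the IDENTITY column preserved the file order:

    ```sql
    --===== Sanity check: the records should appear in file order
     SELECT TOP 10 RowNum, RawData
       FROM #Staging
      ORDER BY RowNum
    ```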

    Items 4 and 5 are nearly identical.  Since I don't know what your Header or Detail tables look like, I made my own for this little experiment just to make sure the code works correctly... this code is NOT part of anything you'd need to do as your tables already exist...

    --===== Create a temp table to hold header info for this experiment

         -- Added a HeaderID column to ensure uniqueness of records

         -- Does not include the "H" record type

         IF OBJECT_ID('TempDB..#Header') IS NOT NULL

            DROP TABLE #Header

     CREATE TABLE #Header

            (

            HeaderID   INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,

            HdrCol02   VARCHAR(10),

            PhoneNum   CHAR(10), --A guess on my part

            XactDate   DATETIME, --A guess on my part

            HdrCol05   VARCHAR(10),

            HdrCol06   VARCHAR(10),

            HdrCol07   VARCHAR(10),

            HdrCol08   VARCHAR(10)

            )

    --===== Create a temp table to hold detail info for this experiment

         -- Added a DetailID column to ensure uniqueness of records

         -- Does not include the "D" record type

         IF OBJECT_ID('TempDB..#Detail') IS NOT NULL

            DROP TABLE #Detail

     CREATE TABLE #Detail

            (

            DetailID    INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,

            ServiceType VARCHAR(10),   --Guess on my part

            Country     CHAR(2),       --Guess on my part

            Minutes     INT,           --Guess on my part

            TimeZone    VARCHAR(10),   --Guess on my part

            DtlCol06    VARCHAR(10),

            DtlCol07    VARCHAR(10),

            DtlCol08    VARCHAR(10),

            Cost        DECIMAL(10,5), --Guess on my part

            DtlCol10    VARCHAR(10),

            DtlCol11    VARCHAR(10),

            DtlCol12    VARCHAR(10),

            DtlCol13    VARCHAR(10),

            DtlCol14    VARCHAR(10),

            DtlCol15    VARCHAR(10),

            HeaderID    INT            --<<Looky, looky! MAKE SURE YOU HAVE THIS!!

            )

    Of course, yours is probably different... just bear with me for the experiment...

    The following code is for Items 4 and 5, which do the actual import... the big problem to overcome with the data you posted is that we have to have some way to identify which detail records go with which header records once they're parsed and packed away in a table.  That means we have to find each header record, post it to the header table, get the HeaderID we just posted, find all the detail records for that header, and post the detail records with that HeaderID... the following code does ALL of that... of course, you would need to change some column names and table names to match your Header and Detail tables...

    --=====Declare local variables

    DECLARE @CurrentHeader INT

    DECLARE @NextHeader INT

    DECLARE @MaxDetail INT

    DECLARE @HeaderID INT

    --===== Suppress the auto-display of rowcounts for appearance and speed

        SET NOCOUNT ON

    --===== Find the max detail record RowNum

     SELECT @MaxDetail = MAX(RowNum)

       FROM #Staging

      WHERE dbo.fnSplit(RawData,1,'^') = 'D'

    --===== Find the first header

        SET @CurrentHeader = 0

     SELECT @NextHeader = MIN(RowNum)

       FROM #Staging

      WHERE dbo.fnSplit(RawData,1,'^') = 'H'

        AND RowNum > @CurrentHeader

    --===== Parse the data by finding each header and the

         -- detail that goes with it

      WHILE @NextHeader >0

      BEGIN

            --===== Make the previous next header, the current header

                SET @CurrentHeader = @NextHeader

            --===== Parse the header info into the Header table

             INSERT INTO #Header

                    (HdrCol02,PhoneNum,XactDate,HdrCol05,HdrCol06,HdrCol07,HdrCol08)

             SELECT NULLIF(dbo.fnSplit(RawData,2,'^'),'') AS HdrCol02,

                    NULLIF(dbo.fnSplit(RawData,3,'^'),'') AS PhoneNum,

                    NULLIF(dbo.fnSplit(RawData,4,'^'),'') AS XactDate,

                    NULLIF(dbo.fnSplit(RawData,5,'^'),'') AS HdrCol05,

                    NULLIF(dbo.fnSplit(RawData,6,'^'),'') AS HdrCol06,

                    NULLIF(dbo.fnSplit(RawData,7,'^'),'') AS HdrCol07,

                    NULLIF(dbo.fnSplit(RawData,8,'^'),'') AS HdrCol08

               FROM #Staging

              WHERE RowNum = @CurrentHeader

           

            --===== Remember the ID of the header record we just inserted

                SET @HeaderID = SCOPE_IDENTITY()

           

            --===== Find the next header

                SET @NextHeader = NULL --Forces a NULL if we don't find anything left

             SELECT @NextHeader = MIN(RowNum)

               FROM #Staging

              WHERE dbo.fnSplit(RawData,1,'^') = 'H'

                AND RowNum > @CurrentHeader

           

            --===== Parse and insert the detail records for the current header

                 -- including the ID of the related header record

             INSERT INTO #Detail

                    (

                    ServiceType, Country,  Minutes,  TimeZone, DtlCol06,

                    DtlCol07,    DtlCol08, Cost,     DtlCol10, DtlCol11,

                    DtlCol12,    DtlCol13, DtlCol14, DtlCol15, HeaderID

                    )

             SELECT NULLIF(dbo.fnSplit(RawData, 2,'^'),'') AS ServiceType,

                    NULLIF(dbo.fnSplit(RawData, 3,'^'),'') AS Country,

                    NULLIF(dbo.fnSplit(RawData, 4,'^'),'') AS Minutes,

                    NULLIF(dbo.fnSplit(RawData, 5,'^'),'') AS TimeZone,

                    NULLIF(dbo.fnSplit(RawData, 6,'^'),'') AS DtlCol06,

                    NULLIF(dbo.fnSplit(RawData, 7,'^'),'') AS DtlCol07,

                    NULLIF(dbo.fnSplit(RawData, 8,'^'),'') AS DtlCol08,

                    NULLIF(dbo.fnSplit(RawData, 9,'^'),'') AS Cost,

                    NULLIF(dbo.fnSplit(RawData,10,'^'),'') AS DtlCol10,

                    NULLIF(dbo.fnSplit(RawData,11,'^'),'') AS DtlCol11,

                    NULLIF(dbo.fnSplit(RawData,12,'^'),'') AS DtlCol12,

                    NULLIF(dbo.fnSplit(RawData,13,'^'),'') AS DtlCol13,

                    NULLIF(dbo.fnSplit(RawData,14,'^'),'') AS DtlCol14,

                    NULLIF(dbo.fnSplit(RawData,15,'^'),'') AS DtlCol15,

                    @HeaderID AS HeaderID

               FROM #Staging

              WHERE RowNum BETWEEN @CurrentHeader+1

                AND ISNULL(@NextHeader-1,@MaxDetail)

        END --End of While loop
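    For what it's worth, the loop above only iterates once per header, not once per row, so it's not true RBAR... but if you wanted to dodge the loop entirely, here's one possible set-based sketch (my assumption, untested against your real data): match each detail row to the greatest header RowNum that precedes it, which could then be joined back to the Header table to pick up the HeaderID.

    ```sql
    --===== Set-based sketch: pair each detail row with the nearest
         -- preceding header row by staging RowNum
     SELECT dtl.RowNum AS DetailRowNum,
            (SELECT MAX(hdr.RowNum)
               FROM #Staging hdr
              WHERE dbo.fnSplit(hdr.RawData,1,'^') = 'H'
                AND hdr.RowNum < dtl.RowNum) AS HeaderRowNum
       FROM #Staging dtl
      WHERE dbo.fnSplit(dtl.RawData,1,'^') = 'D'
    ```

    You'd pay for that convenience with a correlated subquery per detail row, so test both ways before committing to either.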

    ...And now you know why some folks spend lots of money on 3rd party solutions to do this type of stuff.  Some also use DTS (comes with SQL Server) only to turn around and write some VBScript to do most of what was done in the code above.  Others spend pot wads of money on general-purpose ETL (Extract, Transform, Load) programs and they really don't run any faster.  They are, however, a little easier to use for non-SQL-Developers.

    Any questions?

    Send beer, I already have enough pretzels

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ya know, Brenda... a little feedback on the extent of this type of help would have been appreciated.  At least let me know how it worked out for you.

    --Jeff Moden


  • Jeff, that answer was detailed enough that I'll buy you a beer, if you're ever in the Atlanta area, even though it wasn't my question.

  • Heh... Thanks David...

    I've been doing a lot of BCP work lately and ran into this ol' post of mine... Yeah, I know it's a forum and we shouldn't expect it... but it just struck me as a bit wrong that folks cough up detailed answers like this and never even get an acknowledgment that the original poster even read it.

    Thanks, again.

    --Jeff Moden

