help importing large flat file into relational tables

  • I am trying to find the best (fastest) way to import large text files into SQL Server 2000 relational tables.  The database uses the simple recovery model.  The total size of the data to be loaded is about 150 GB.

    Source:

    Multiple source files with the same layout: year1.txt, year2.txt, ... year15.txt, etc.  The files are kept separate to keep each one small.  Each file has about 6 million rows.

    Each .txt file has 30 columns delimited by "|".

    Destination:

    Two SQL Server tables: a parent table with an ID (IDENTITY) primary key and a 3-column unique index (alternate key), and a detail table with a DETAIL_ID (IDENTITY) primary key and a foreign key ID back to the parent table.  The detail table has both char and decimal columns.  Both destination tables are empty.

    So, for each source file, I need to convert some columns to decimal and split the data into the parent and detail tables.

    Here is one way to do this (a rough T-SQL sketch of these steps follows the list):

    For each source file:

    • use DTS to bulk import into a working SQL table (no indexes), with every column as char and no DTS conversions
    • create an index on the 3 columns that match the parent table's alternate key
    • run an insert query into the parent table, selecting the distinct 3 columns
    • run an insert query that joins the parent table to the working table and inserts into the detail table.  The query would pick up the primary key (identity) from the parent table and convert some char columns to decimal.
    • clean up the SQL working table by truncating it and dropping the index
    • process the next source file
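
    Roughly, the T-SQL I have in mind for those steps looks like this.  Table, column and file names are just placeholders, the staging load would really be the DTS bulk insert task rather than the BULK INSERT statement shown, and I added a NOT EXISTS check on the parent insert so key combinations already loaded from earlier files get skipped:

        -- 1. Bulk load the pipe-delimited file into the unindexed, all-char working table.
        BULK INSERT dbo.stage_load
        FROM 'D:\loads\year1.txt'
        WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK)

        -- 2. Index the 3 columns that match the parent table's alternate key.
        CREATE INDEX IX_stage_altkey ON dbo.stage_load (col1, col2, col3)

        -- 3. Insert the distinct key combinations that are not already in the parent table.
        INSERT INTO dbo.parent (col1, col2, col3)
        SELECT DISTINCT s.col1, s.col2, s.col3
        FROM dbo.stage_load AS s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.parent AS p
                          WHERE p.col1 = s.col1
                            AND p.col2 = s.col2
                            AND p.col3 = s.col3)

        -- 4. Join the working table back to the parent to pick up the IDENTITY value,
        --    converting the char columns to decimal on the way into the detail table.
        INSERT INTO dbo.detail (parent_id, char_col, amount)
        SELECT p.id,
               s.char_col,
               CONVERT(decimal(18, 2), s.amount)
        FROM dbo.stage_load AS s
        JOIN dbo.parent AS p
          ON p.col1 = s.col1
         AND p.col2 = s.col2
         AND p.col3 = s.col3

        -- 5. Clean up the working table ready for the next file.
        DROP INDEX stage_load.IX_stage_altkey
        TRUNCATE TABLE dbo.stage_load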

    To make it fast, I want to minimize database logging and data type conversions.  This is why I create and drop the indexes separately from the data import.

    Would it be faster to use DTS transformations to convert the source char columns to decimal while importing into the working SQL table, instead of using SQL to convert the data types after the import?  I'm guessing that since DTS transforms on a row-by-row basis, it would take longer than a single SQL insert statement with the conversions.

    Is there a better way to do this in DTS, so I can go from the single source file to the 2 related SQL Server tables and do the data conversion all inside DTS?

    Speed is the primary concern.  I tested loading one source file of 6 million rows into the SQL working table and querying it into the 2 related tables, and it took about 40 minutes.  Is that a reasonable amount of time for this task?  The server has 4 processors and 2 GB of RAM on a RAID 5 controller.

    As I process these loads, the 2 related tables will get larger and larger, which will probably slow down the final insert into the detail table.

    Thank you for any suggestions.

  • As a rule of thumb, you appear to be doing the right thing.

    Always bulk insert the text files into an import table and then use a stored procedure to shuffle the data across to the target tables. If you add a row-by-row conversion into the package it will be very slow.

    As for dropping the indexes, I have rarely found this useful. It often takes longer to rebuild them at the end than to accept inserts while they are still alive. I'd try it with the indexes left on.

    To minimise logging you can set the db to simple mode before the insert and back to full after. But, of course, while it is in simple mode you lose the ability to restore from the log.
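
    For example - the database name and backup path are made up, and bulk-logged is another mode worth trying for a bulk load:

        -- Drop to simple (or bulk-logged) recovery before the load...
        ALTER DATABASE MyLoadDB SET RECOVERY SIMPLE

        -- ...run the bulk insert and the shuffle into the target tables...

        -- ...then switch back and take a full backup so log backups can start again.
        ALTER DATABASE MyLoadDB SET RECOVERY FULL
        BACKUP DATABASE MyLoadDB TO DISK = 'D:\backups\MyLoadDB.bak'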

    The other area where you can save time is the target. If you are only adding new records, rather than clearing the target each time, it is better to check whether a record already exists and insert only the new ones.

    If you are dealing with amended rows, see if you can flag them by an insert/update date, if there is one in the text file.
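
    As a rough sketch of both ideas, assuming the detail rows can be matched on a column within each parent (char_col here, made up) and that the file carries a last_updated date, which it may not:

        -- Only insert staging rows that are not already in the detail table.
        INSERT INTO dbo.detail (parent_id, char_col, amount, last_updated)
        SELECT p.id,
               s.char_col,
               CONVERT(decimal(18, 2), s.amount),
               CONVERT(datetime, s.last_updated)
        FROM dbo.stage_load AS s
        JOIN dbo.parent AS p
          ON p.col1 = s.col1 AND p.col2 = s.col2 AND p.col3 = s.col3
        WHERE NOT EXISTS (SELECT 1 FROM dbo.detail AS d
                          WHERE d.parent_id = p.id
                            AND d.char_col  = s.char_col)

        -- Rows that already exist but have a newer date in the file are amended in place.
        UPDATE d
        SET    d.amount       = CONVERT(decimal(18, 2), s.amount),
               d.last_updated = CONVERT(datetime, s.last_updated)
        FROM   dbo.detail AS d
        JOIN   dbo.parent AS p ON p.id = d.parent_id
        JOIN   dbo.stage_load AS s
               ON  s.col1 = p.col1
               AND s.col2 = p.col2
               AND s.col3 = p.col3
               AND s.char_col = d.char_col
        WHERE  CONVERT(datetime, s.last_updated) > d.last_updated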

    I suppose what I am trying to say is to try out each of these methods and benchmark them against each other to find the fastest method.


    ------------------------------
    The Users are always right - when I'm not wrong!
