Large text file uploads

  • I've gone to using text files to load my data mainly because I'm loading a single data point per record (data warehouse style), but:

    I made a text load file, and it's 50megs.  It contains about 950,000 records to upload.  This thing is taking forever.  After about 14 hours, it has loaded 250,000 rows.

    Now I've loaded 40,000 row excel files, and the longest it ever took was maybe two minutes.

    My theory as to why the time difference is that to load the text file, the entire file must be gone through character by character, while loaded into memory, but SQL must use the jet engine to load from excel, or somehow it's just opening the excel file one row at a time?

    So why is the text file load taking so much longer?  About 5 minutes per 1k of records instead of a few seconds per?

  • Hi,

    It shouldn't take that long. I import csv files into our datawarehouse with a size of about 5GB, and it takes a couple of minutes.

    You need to look at what you are doing with the data as it is moved into the database. What type of transformation do you use? Also, do you have a lot of indexes in the destination table. By using a staging table without any indexes, you would speed up the process a lot if you import the data "as is" and then tranform it from your staging table into its final location.

    Martin.

  • I have quite a few jobs that import fixed field, flat file text data from legacy mainframe systems into tables. My top performer loads 9 fields of parcel information that includes owner name, address, legal description, etc. This dts package averages 82,018 rows per second and has peaked at 100,867 rows per second. The steps I use in the job are these:

    1. backup transaction log

    2. change recovery to Bulk_Logged

    3. drop the table

    4. recreate the table

    5. load the data with dts

    6. create 3 indexes

    7. load another flat file into a different table

    8. change recovery to Full

    9. backup transaction log

    The file I load averages about 15MB with 120,000 rows. The entire job, including ftp of 2 files from the mainframe averages 1 minute 22 seconds.

     

     

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply