Insert 8 million records from text file to a table

  • Hi !!

    I have a text file with records like this:

    05-06-07-08-09-10

    04-06-07-08-09-10

    04-05-07-08-09-10

    There would be around 8M lines like this.

    I tried using

    BULK INSERT DB_Name.Table_Name

    FROM 'D:\Sample.txt'

    WITH

    (

    ROWTERMINATOR = '|\n'

    )

where Sample.txt contains the 8M records (and it's about 150 MB), and Table_Name has only one field.

It took a long time and didn't insert anything, whereas the transaction log file grew to 1.5 GB.

Is there any other option to insert the records in bulk into the table via SQL Server, instead of inserting them line by line?

    Thanks and Regards

    Edward Anil Joseph

Use the BCP command.

    For more information see SQL BOL.

    Kishore
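A minimal BCP invocation for this scenario might look like the following sketch, run here through xp_cmdshell for convenience (the server name and batch size are assumptions, not from the thread; xp_cmdshell must be enabled):

```sql
-- Hypothetical sketch: load D:\Sample.txt into the one-column table via bcp
DECLARE @Cmd VARCHAR(8000)
    SET @Cmd = 'bcp DB_Name.dbo.Table_Name in "D:\Sample.txt" '
             + '-S ServerName -T '  -- server name (placeholder) + trusted connection
             + '-c '                -- character mode: one line of text = one row
             + '-b 100000'          -- commit every 100,000 rows to keep the log in check
EXEC Master.dbo.xp_CmdShell @Cmd
```

The same command can of course be run directly from a command prompt instead of through xp_cmdshell.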

Use the Import/Export Wizard.


    Everything you can imagine is real.

  • Hi !!

    Thank you for the inputs..

    I had downsized the files for checking.

    File Details:

    3,838,380 rows

    Size: 71,220 KB

    Using BCP

    *****************

    First Run with Batch size = 1000

    3838380 rows copied.

    Network packet size (bytes): 4096

    Clock Time (ms.): total 93969

    Second run with Batch Size = 100000

    3838380 rows copied.

    Network packet size (bytes): 4096

    Clock Time (ms.): total 69953

The disadvantage was that CPU usage increased to 90% for BCP (as it did for BULK INSERT).

    Using Import Wizard:

    ***************************

To import the same number of rows it took about 2 minutes and 6 seconds.

    Though it took more time, the CPU usage was normal.

Is there any other option via code?

    I was thinking of the option of passing the values as Text Parameter and inserting the records in batches, as mentioned in this link:

    http://www.sqlservercentral.com/columnists/lPeysakhovich/manipulatingdataintexttypecolumns.asp

    I did try it, but it was too slow and the CPU usage was high too.

Could it be configured to run in batches (splitting the 8 million records into 250K chunks) and insert the data?

    Would there be any memory problems because of using it via code?

    Thanks again for your inputs..

    Regards

    Edward Anil Joseph
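If batching is the goal, BULK INSERT itself supports committing in chunks through its BATCHSIZE option. A minimal sketch, assuming the single-column table and file path from the first post (though note that a later post in this thread found BATCHSIZE mainly slowed things down):

```sql
-- Hypothetical sketch: commit every 250,000 rows so the log can be
-- truncated between batches (most effective in SIMPLE recovery mode)
BULK INSERT DB_Name.dbo.Table_Name
FROM 'D:\Sample.txt'
WITH
(
    DATAFILETYPE  = 'CHAR',
    ROWTERMINATOR = '\n',
    BATCHSIZE     = 250000
)
```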

  • BULK INSERT is the fastest option for importing data into SQL Server. It has most of the options that are available for BCP.

    --------------------
    Colt 45 - the original point and click interface

  • Why do you have the pipe [|] character as the terminator?  You don't have that in the sample data.

    Also, are you intending that the dashes [-] serve as delimiters?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Have you considered setting the 'table lock on bulk load' option for the table? 

    exec sp_tableoption <tablename>, 'table lock on bulk load', 'ON'

    Have a look at BOL - it suggests that setting this option will create a table lock when bulk loading data rather than a lock for each row.  I found this option a few years ago when I had to load hundreds of millions of rows into tables.

    J

  • Sorry.. The pipe delimiter was a mistake.

    And thanks for the knowledge on "table lock on bulk load" option.

    Will try it and see.

    Thank you..

BULK INSERT has a table lock option. No need to fiddle with table options, which tend to be forgotten once set.

    http://msdn2.microsoft.com/en-us/library/aa225968(SQL.80).aspx

    From the above page,

    "TABLOCK

    Specifies that a table-level lock is acquired for the duration of the bulk copy operation. A table can be loaded concurrently by multiple clients if the table has no indexes and TABLOCK is specified. By default, locking behavior is determined by the table option table lock on bulk load. Holding a lock only for the duration of the bulk copy operation reduces lock contention on the table, significantly improving performance."
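As an illustration, the TABLOCK hint described above might be applied like this (file and table names are placeholders, not from the thread):

```sql
-- Hypothetical sketch: take one table-level lock for the whole load
-- instead of a lock per row
BULK INSERT DB_Name.dbo.Table_Name
FROM 'D:\Sample.txt'
WITH
(
    ROWTERMINATOR = '\n',
    TABLOCK
)
```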

    --------------------
    Colt 45 - the original point and click interface

  • You live and learn.

    I used the table option because I was bulk loading from an external application (SAS) and I couldn't see any way from the external app to lock the table.

    J

  • You didn't answer my other question... are the dashes supposed to be delimiters?

    --Jeff Moden



No. The dashes aren't supposed to be delimiters.

There is only one column, and it stores each record as

05-06-07-08-09-10

04-06-07-08-09-10

etc.,

one per row.

  • For those that want to play...  

    --================================================================================================
    --      Generate an 8 million row test file. This is NOT part of the solution.  It is only for
    --      test purposes just in case you don't happen to have an 8 million row test table just
    --      hanging around. If you don't have a "C:\Temp" directory on your server, either make one,
    --      or change the "C:\Temp\" to something more of your liking.  You must use a UNC if you 
    --      don't want the file to appear on your server.
    --
--      Also, you must change "JModen2\Hammer4" in the OSQL command below to the name of the
--      server\instance you are running this from.
    --
    --      Generation of this file takes about 3 minutes and 34 seconds (includes 1 line row at end)
    --================================================================================================
    DECLARE @Query VARCHAR(8000)
        SET @Query = 
    '"SET NOCOUNT ON '
    +'SELECT TOP 8000000 '
    +        '''05-06-07-08-09-10'' AS SomeData '
    +   'FROM Master.dbo.SysColumns sc1, '
    +        'Master.dbo.SysColumns sc2"'
    PRINT @Query
    DECLARE @Cmd VARCHAR(8000)
        SET @Cmd = 'OSQL -S "JModen2\Hammer4"  -E -h-1 -s"" -Q ' + @Query + ' > C:\Temp\jbmTest.txt'
    EXEC Master.dbo.xp_CmdShell @Cmd
    --================================================================================================
    --      This "import" imports all 8 million rows in about...
    --        24 seconds if Recovery mode is "SIMPLE" w/ about 3 mb log usage
    --        24 seconds if Recovery mode is "Bulk-logged" w/ about 3 mb log usage
    --        25 seconds if Recovery mode is "Full" w/ about 239 mb log usage
    --      Do read the comments...
    --================================================================================================
    --DROP TABLE jbmTest
     CREATE TABLE jbmTest (SomeData VARCHAR(20))
        SET NOCOUNT ON
       BULK INSERT jbmTest FROM 'C:\Temp\jbmTest.txt'  
       WITH (--BATCHSIZE     = 20000, --all this does is slow things down. Log file still grows as above.
             DATAFILETYPE  = 'CHAR',
    --       FIRSTROW      = put line number of first row after header here, if there is one and uncomment,
             ROWTERMINATOR = '\n',
             TABLOCK )
    

    --Jeff Moden



  • Thanks, Edward.  Take a look at the above, if you want.

    --Jeff Moden



  • Hi Jeff !!!

    Thank you..

    I guess the keyword I had missed was "SET NOCOUNT ON"

I guess that might be the reason why it was slow.

Also, I had set the DB to simple recovery mode, and it took only about 20 seconds, as you mentioned in your example.

    Thanks again !!

    Regards

    Edward Anil Joseph
