insert in batches

  • Can someone help me doing insert as select * into tab1 in a batches. I have almost billion records to insert and i think inserting in batches would be best way.

  • also the source table is a view and there wont be any ID column for source.

  • Create SSIS package.

    OR

    INSERT ... SELECT * FROM OPENROWSET(BULK...) statement

    http://msdn.microsoft.com/en-us/library/ms175915.aspx

  • DEv, could you please tell what components i have to use for this simple process.

  • Tara-1044200 (11/19/2011)


    DEv, could you please tell what components i have to use for this simple process.

    I am not sure for SSIS but I can help you in starting with 2nd option.

    OPENROWSET

    ( { 'provider_name' , { 'datasource' ; 'user_id' ; 'password'

    | 'provider_string' }

    , { [ catalog. ] [ schema. ] object

    | 'query'

    }

    | BULK 'data_file' ,

    { FORMATFILE = 'format_file_path' [ <bulk_options> ]

    | SINGLE_BLOB | SINGLE_CLOB | SINGLE_NCLOB }

    } )

    <bulk_options> ::=

    [ , CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } ]

    [ , ERRORFILE = 'file_name' ]

    [ , FIRSTROW = first_row ]

    [ , LASTROW = last_row ]

    [ , MAXERRORS = maximum_errors ]

    [ , ROWS_PER_BATCH = rows_per_batch ]

    [ , ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) [ UNIQUE ]

    Example:

    SELECT a.*

    FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;',

    'SELECT GroupName, Name, DepartmentID

    FROM AdventureWorks2008R2.HumanResources.Department

    ORDER BY GroupName, Name') AS a;

    You need to add INSERT INTO clause & ROWS_PER_BATCH option in OPENROWSET to make it workable for you.

  • but i am loading data from a view to a table with in same database.

  • Tara-1044200 (11/19/2011)


    but i am loading data from a view to a table with in same database.

    I didn’t get you.

    Is it stopping you from using OPENROWSET?

    Or

    Your question is “Why should I use OPENROWSET in the same DB"?

  • yes because source and destination are in the same database is it still ok?

    also is there a way to use full cpu power to this inserts faster, may be increasing the threads. how do i do?

  • yes because source and destination are in the same database is it still ok?

    It should not be a problem but I am not sure. I don't see any harm in trying it.

    also is there a way to use full cpu power to this inserts faster, may be increasing the threads. how do i do?

    Read it carefully before you choose to implement any.

    The OPENROWSET function can accept the following table hints, which provide bulk-load optimizations with the INSERT statement:

    •The TABLOCK hint can minimize the number of log records for the insert operation. The recovery model of the database must be set to simple or bulk-logged and the target table cannot be used in replication. For more information, see Prerequisites for Minimal Logging in Bulk Import.

    •The IGNORE_CONSTRAINTS hint can temporarily disable FOREIGN KEY and CHECK constraint checking.

    •The IGNORE_TRIGGERS hint can temporarily disable trigger execution.

    •The KEEPDEFAULTS hint allows the insertion of a table column's default value, if any, instead of NULL when the data record lacks a value for the column.

    •The KEEPIDENTITY hint allows the identity values in the imported data file to be used for the identity column in the target table.

  • SSIS would be a very simple setup:

    You would need a data flow task. in the data flow, you will have an OLEDB Source and an OLEDB destination.

    For the source, you specify the view - for the destintation you specify the table (use the fast load option).

    Set the max commit and batch size parameters. I would recommend starting at 200,000 and monitoring the log usage and memory usage to see if you can increase that.

    Make sure you do this from the server and not your desktop. If you run this on your desktop, the data has to be sent across the wire to your desktop and back up to the server which is a lot of network traffic and will slow it down tremendously.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • SSIS would be a very simple setup:

    And this is why I don't like it. It stops the learning curve for DB Developer.

    The ONLY argument in favour of SSIS is "Why to re-invent the wheel?" 😀

  • Dev (11/19/2011)


    SSIS would be a very simple setup:

    And this is why I don't like it. It stops the learning curve for DB Developer.

    The ONLY argument in favour of SSIS is "Why to re-invent the wheel?" 😀

    With that fine thought in mind, the OPENROWSET code you wrote still doesn't meet the OP's requirement of doing it in batches. Let's see your example of using OPENROWSET for loading the data from the view to the table in batches of 200,000. 😉

    And don't forget that OPENROWSET isn't enabled by default. In order to use it, you must enable "Ad Hoc Distributed Queries" in the system configuration. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/19/2011)


    Dev (11/19/2011)


    SSIS would be a very simple setup:

    And this is why I don't like it. It stops the learning curve for DB Developer.

    The ONLY argument in favour of SSIS is "Why to re-invent the wheel?" 😀

    With that fine thought in mind, the OPENROWSET code you wrote still doesn't meet the OP's requirement of doing it in batches. Let's see your example of using OPENROWSET for loading the data from the view to the table in batches of 200,000. 😉

    And don't forget that OPENROWSET isn't enabled by default. In order to use it, you must enable "Ad Hoc Distributed Queries" in the system configuration. 😉

    Thanks Jeff! It slipped from my mind that BULK can be used with data files only. I also verified it in BULK INSERT & BCP. I don't believe it can only be done with SSIS but I don't find an easy answer in T-SQL except cursor / loops.

  • Tara-1044200 (11/18/2011)


    Can someone help me doing insert as select * into tab1 in a batches. I have almost billion records to insert and i think inserting in batches would be best way.

    Now that we've determined that SSIS is a pretty easy way to do this, I have to ask... why is it necessary to create a billion row table from a view (which will double the required storage) which also means that the data already exists somewhere?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • By "inserting in batches" do you mean breaking up the insert into batches? Such as doing 10,000 rows at a time instead of doing all 1 billion rows at once?

    If so, you can use a WHILE loop:

    declare @SourceRowCount int

    select @SourceRowCount = COUNT(*) FROM source_table

    WHILE ((SELECT COUNT(*) FROM destination_table) < @SourceRowCount)

    BEGIN

    INSERT INTO destinationtable

    SELECT TOP 10000 column_list FROM source_table

    END

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply