insert in batches

Question

Post reply

insert in batches

Tara-1044200

SSCoach

Points: 15951
More actions
November 18, 2011 at 6:18 pm

#250780

Can someone help me doing insert as select * into tab1 in a batches. I have almost billion records to insert and i think inserting in batches would be best way.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Tara-1044200 SSCoach Points: 15951 More actions · Answer 1

also the source table is a view and there wont be any ID column for source.

Devendrakumar SSC-Forever Points: 42493 More actions · Answer 2

Create SSIS package.

OR

INSERT ... SELECT * FROM OPENROWSET(BULK...) statement

http://msdn.microsoft.com/en-us/library/ms175915.aspx

Tara-1044200 SSCoach Points: 15951 More actions · Answer 3

DEv, could you please tell what components i have to use for this simple process.

Devendrakumar SSC-Forever Points: 42493 More actions · Answer 4

Tara-1044200 (11/19/2011)
DEv, could you please tell what components i have to use for this simple process.

I am not sure for SSIS but I can help you in starting with 2nd option.

OPENROWSET

( { 'provider_name' , { 'datasource' ; 'user_id' ; 'password'

| 'provider_string' }

, { [ catalog. ] [ schema. ] object

| 'query'

}

| BULK 'data_file' ,

{ FORMATFILE = 'format_file_path' [ <bulk_options> ]

| SINGLE_BLOB | SINGLE_CLOB | SINGLE_NCLOB }

} )

<bulk_options> ::=

[ , CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } ]

[ , ERRORFILE = 'file_name' ]

[ , FIRSTROW = first_row ]

[ , LASTROW = last_row ]

[ , MAXERRORS = maximum_errors ]

[ , ROWS_PER_BATCH = rows_per_batch ]

[ , ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) [ UNIQUE ]

Example:

SELECT a.*

FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;',

'SELECT GroupName, Name, DepartmentID

FROM AdventureWorks2008R2.HumanResources.Department

ORDER BY GroupName, Name') AS a;

You need to add INSERT INTO clause & ROWS_PER_BATCH option in OPENROWSET to make it workable for you.

Tara-1044200 SSCoach Points: 15951 More actions · Answer 5

but i am loading data from a view to a table with in same database.

Devendrakumar SSC-Forever Points: 42493 More actions · Answer 6

Tara-1044200 (11/19/2011)
but i am loading data from a view to a table with in same database.

I didn’t get you.

Is it stopping you from using OPENROWSET?

Or

Your question is “Why should I use OPENROWSET in the same DB"?

Tara-1044200 SSCoach Points: 15951 More actions · Answer 7

yes because source and destination are in the same database is it still ok?

also is there a way to use full cpu power to this inserts faster, may be increasing the threads. how do i do?

Devendrakumar SSC-Forever Points: 42493 More actions · Answer 8

yes because source and destination are in the same database is it still ok?

It should not be a problem but I am not sure. I don't see any harm in trying it.

also is there a way to use full cpu power to this inserts faster, may be increasing the threads. how do i do?

Read it carefully before you choose to implement any.

The OPENROWSET function can accept the following table hints, which provide bulk-load optimizations with the INSERT statement:

•The TABLOCK hint can minimize the number of log records for the insert operation. The recovery model of the database must be set to simple or bulk-logged and the target table cannot be used in replication. For more information, see Prerequisites for Minimal Logging in Bulk Import.

•The IGNORE_CONSTRAINTS hint can temporarily disable FOREIGN KEY and CHECK constraint checking.

•The IGNORE_TRIGGERS hint can temporarily disable trigger execution.

•The KEEPDEFAULTS hint allows the insertion of a table column's default value, if any, instead of NULL when the data record lacks a value for the column.

•The KEEPIDENTITY hint allows the identity values in the imported data file to be used for the identity column in the target table.

Jeffrey Williams SSC Guru Points: 90312 More actions · Answer 9

SSIS would be a very simple setup:

You would need a data flow task. in the data flow, you will have an OLEDB Source and an OLEDB destination.

For the source, you specify the view - for the destintation you specify the table (use the fast load option).

Set the max commit and batch size parameters. I would recommend starting at 200,000 and monitoring the log usage and memory usage to see if you can increase that.

Make sure you do this from the server and not your desktop. If you run this on your desktop, the data has to be sent across the wire to your desktop and back up to the server which is a lot of network traffic and will slow it down tremendously.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Devendrakumar SSC-Forever Points: 42493 More actions · Answer 10

SSIS would be a very simple setup:

And this is why I don't like it. It stops the learning curve for DB Developer.

The ONLY argument in favour of SSIS is "Why to re-invent the wheel?" 😀

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 11

Dev (11/19/2011)
SSIS would be a very simple setup:
And this is why I don't like it. It stops the learning curve for DB Developer.
The ONLY argument in favour of SSIS is "Why to re-invent the wheel?" 😀

With that fine thought in mind, the OPENROWSET code you wrote still doesn't meet the OP's requirement of doing it in batches. Let's see your example of using OPENROWSET for loading the data from the view to the table in batches of 200,000. 😉

And don't forget that OPENROWSET isn't enabled by default. In order to use it, you must enable "Ad Hoc Distributed Queries" in the system configuration. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Devendrakumar SSC-Forever Points: 42493 More actions · Answer 12

Jeff Moden (11/19/2011)
Dev (11/19/2011)
SSIS would be a very simple setup:
And this is why I don't like it. It stops the learning curve for DB Developer.
The ONLY argument in favour of SSIS is "Why to re-invent the wheel?" 😀
With that fine thought in mind, the OPENROWSET code you wrote still doesn't meet the OP's requirement of doing it in batches. Let's see your example of using OPENROWSET for loading the data from the view to the table in batches of 200,000. 😉
And don't forget that OPENROWSET isn't enabled by default. In order to use it, you must enable "Ad Hoc Distributed Queries" in the system configuration. 😉

Thanks Jeff! It slipped from my mind that BULK can be used with data files only. I also verified it in BULK INSERT & BCP. I don't believe it can only be done with SSIS but I don't find an easy answer in T-SQL except cursor / loops.

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 13

Tara-1044200 (11/18/2011)
Can someone help me doing insert as select * into tab1 in a batches. I have almost billion records to insert and i think inserting in batches would be best way.

Now that we've determined that SSIS is a pretty easy way to do this, I have to ask... why is it necessary to create a billion row table from a view (which will double the required storage) which also means that the data already exists somewhere?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sam Bendayan SSC-Addicted Points: 479 More actions · Answer 14

By "inserting in batches" do you mean breaking up the insert into batches? Such as doing 10,000 rows at a time instead of doing all 1 billion rows at once?

If so, you can use a WHILE loop:

declare @SourceRowCount int

select @SourceRowCount = COUNT(*) FROM source_table

WHILE ((SELECT COUNT(*) FROM destination_table) < @SourceRowCount)

BEGIN

INSERT INTO destinationtable

SELECT TOP 10000 column_list FROM source_table

END