November 18, 2011 at 6:18 pm
Can someone help me do an insert as SELECT * INTO tab1 in batches? I have almost a billion records to insert, and I think inserting in batches would be the best way.
November 18, 2011 at 6:40 pm
Also, the source table is a view and there won't be any ID column in the source.
November 19, 2011 at 3:52 am
Create an SSIS package.
OR
INSERT ... SELECT * FROM OPENROWSET(BULK...) statement
November 19, 2011 at 4:56 am
Dev, could you please tell me what components I have to use for this simple process?
November 19, 2011 at 5:21 am
Tara-1044200 (11/19/2011)
Dev, could you please tell me what components I have to use for this simple process?
I am not sure about SSIS, but I can help you get started with the 2nd option.
OPENROWSET
( { 'provider_name' , { 'datasource' ; 'user_id' ; 'password'
| 'provider_string' }
, { [ catalog. ] [ schema. ] object
| 'query'
}
| BULK 'data_file' ,
{ FORMATFILE = 'format_file_path' [ <bulk_options> ]
| SINGLE_BLOB | SINGLE_CLOB | SINGLE_NCLOB }
} )
<bulk_options> ::=
[ , CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } ]
[ , ERRORFILE = 'file_name' ]
[ , FIRSTROW = first_row ]
[ , LASTROW = last_row ]
[ , MAXERRORS = maximum_errors ]
[ , ROWS_PER_BATCH = rows_per_batch ]
[ , ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) [ UNIQUE ] ]
Example:
SELECT a.*
FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;',
'SELECT GroupName, Name, DepartmentID
FROM AdventureWorks2008R2.HumanResources.Department
ORDER BY GroupName, Name') AS a;
You need to add an INSERT INTO clause and the ROWS_PER_BATCH option in OPENROWSET to make it workable for you.
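A minimal sketch of that wrapping (dbo.tab1 is a hypothetical target table with matching columns; note that ROWS_PER_BATCH belongs to the BULK form of OPENROWSET, not the ad hoc query form shown here):
-- Sketch only: dbo.tab1 is a placeholder target table.
INSERT INTO dbo.tab1 (GroupName, Name, DepartmentID)
SELECT a.GroupName, a.Name, a.DepartmentID
FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;',
    'SELECT GroupName, Name, DepartmentID
     FROM AdventureWorks2008R2.HumanResources.Department
     ORDER BY GroupName, Name') AS a;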
November 19, 2011 at 5:42 am
But I am loading data from a view to a table within the same database.
November 19, 2011 at 6:01 am
Tara-1044200 (11/19/2011)
But I am loading data from a view to a table within the same database.
I didn’t get you.
Is it stopping you from using OPENROWSET?
Or
Your question is "Why should I use OPENROWSET in the same DB?"
November 19, 2011 at 8:09 am
Yes, because the source and destination are in the same database. Is it still OK?
Also, is there a way to use full CPU power to make these inserts faster, maybe by increasing the threads? How do I do that?
November 19, 2011 at 8:29 am
Yes, because the source and destination are in the same database. Is it still OK?
It should not be a problem but I am not sure. I don't see any harm in trying it.
Also, is there a way to use full CPU power to make these inserts faster, maybe by increasing the threads? How do I do that?
Read the following carefully before you choose to implement any of them (a small sketch of the TABLOCK hint follows the list).
The OPENROWSET function can accept the following table hints, which provide bulk-load optimizations with the INSERT statement:
• The TABLOCK hint can minimize the number of log records for the insert operation. The recovery model of the database must be set to simple or bulk-logged and the target table cannot be used in replication. For more information, see Prerequisites for Minimal Logging in Bulk Import.
• The IGNORE_CONSTRAINTS hint can temporarily disable FOREIGN KEY and CHECK constraint checking.
• The IGNORE_TRIGGERS hint can temporarily disable trigger execution.
• The KEEPDEFAULTS hint allows the insertion of a table column's default value, if any, instead of NULL when the data record lacks a value for the column.
• The KEEPIDENTITY hint allows the identity values in the imported data file to be used for the identity column in the target table.
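These hints go on the target table of the INSERT. A minimal sketch of just the TABLOCK hint in the view-to-table scenario you describe (dbo.tab1 and dbo.SourceView are placeholder names, and the recovery-model prerequisites above still apply):
-- Sketch only: placeholder table/view names and columns.
-- TABLOCK on the target table lets the insert qualify for minimal logging
-- when the recovery model and replication prerequisites above are met.
INSERT INTO dbo.tab1 WITH (TABLOCK) (Col1, Col2, Col3)
SELECT Col1, Col2, Col3
FROM dbo.SourceView;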
November 19, 2011 at 8:49 am
SSIS would be a very simple setup:
You would need a Data Flow task. In the data flow, you will have an OLE DB Source and an OLE DB Destination.
For the source, you specify the view; for the destination, you specify the table (use the fast load option).
Set the max commit and batch size parameters. I would recommend starting at 200,000 and monitoring the log usage and memory usage to see if you can increase that.
Make sure you do this from the server and not your desktop. If you run this on your desktop, the data has to be sent across the wire to your desktop and back up to the server, which is a lot of network traffic and will slow it down tremendously.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
November 19, 2011 at 8:58 am
SSIS would be a very simple setup:
And this is why I don't like it. It stops the learning curve for the DB Developer.
The ONLY argument in favour of SSIS is "Why reinvent the wheel?" 😀
November 19, 2011 at 10:12 pm
Dev (11/19/2011)
SSIS would be a very simple setup:
And this is why I don't like it. It stops the learning curve for the DB Developer.
The ONLY argument in favour of SSIS is "Why reinvent the wheel?" 😀
With that fine thought in mind, the OPENROWSET code you wrote still doesn't meet the OP's requirement of doing it in batches. Let's see your example of using OPENROWSET for loading the data from the view to the table in batches of 200,000. 😉
And don't forget that OPENROWSET isn't enabled by default. In order to use it, you must enable "Ad Hoc Distributed Queries" in the system configuration. 😉
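For reference, a minimal sketch of enabling that option with sp_configure (it is a server-wide setting and requires the appropriate permissions):
-- Enable ad hoc OPENROWSET / OPENDATASOURCE queries at the server level.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;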
--Jeff Moden
Change is inevitable... Change for the better is not.
November 20, 2011 at 3:35 am
Jeff Moden (11/19/2011)
Dev (11/19/2011)
SSIS would be a very simple setup:
And this is why I don't like it. It stops the learning curve for the DB Developer.
The ONLY argument in favour of SSIS is "Why reinvent the wheel?" 😀
With that fine thought in mind, the OPENROWSET code you wrote still doesn't meet the OP's requirement of doing it in batches. Let's see your example of using OPENROWSET for loading the data from the view to the table in batches of 200,000. 😉
And don't forget that OPENROWSET isn't enabled by default. In order to use it, you must enable "Ad Hoc Distributed Queries" in the system configuration. 😉
Thanks Jeff! It slipped my mind that BULK can be used with data files only. I also verified it with BULK INSERT & BCP. I don't believe it can only be done with SSIS, but I can't find an easy answer in T-SQL except cursors / loops.
November 20, 2011 at 2:16 pm
Tara-1044200 (11/18/2011)
Can someone help me do an insert as SELECT * INTO tab1 in batches? I have almost a billion records to insert, and I think inserting in batches would be the best way.
Now that we've determined that SSIS is a pretty easy way to do this, I have to ask... why is it necessary to create a billion-row table from a view (which will double the required storage), when the view means the data already exists somewhere?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 21, 2011 at 8:26 pm
By "inserting in batches" do you mean breaking up the insert into batches? Such as doing 10,000 rows at a time instead of doing all 1 billion rows at once?
If so, you can use a WHILE loop:
DECLARE @SourceRowCount int, @BatchSize int = 10000, @Inserted int = 0
SELECT @SourceRowCount = COUNT(*) FROM source_table
WHILE (@Inserted < @SourceRowCount)
BEGIN
    -- ROW_NUMBER gives each source row a stable position, so each pass
    -- inserts the next batch instead of the same TOP 10000 rows again;
    -- order by a column (or columns) that gives a repeatable sort
    INSERT INTO destination_table (column_list)
    SELECT column_list
    FROM (SELECT column_list, ROW_NUMBER() OVER (ORDER BY some_column) AS rn
          FROM source_table) s
    WHERE s.rn > @Inserted AND s.rn <= @Inserted + @BatchSize
    SET @Inserted = @Inserted + @BatchSize
END