June 9, 2010 at 8:21 am
We are moving a large amount of data (15,000,000 rows) from one table to another. The first table is loaded using BULK INSERT with BATCHSIZE = 1000000 from a tab-delimited text file. No other parameters are used for this insert.
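For reference, the load statement looks roughly like this (the table and file names here are placeholders, not our actual objects):

BULK INSERT dbo.StagingTable
FROM 'C:\data\load_file.txt'
WITH (
    FIELDTERMINATOR = '\t',   -- tab-delimited file
    ROWTERMINATOR = '\n',
    BATCHSIZE = 1000000       -- commit after every 1,000,000 rows
);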
Next, after we verify the integrity of the data, the Import/Export Wizard is used to copy this data to another empty table for production use.
My concern is that tempdb grew to 59 GB, and I am trying to isolate why it grew so large without running these two steps again. My thought is that using the wizard, rather than INSERT INTO Table2 SELECT * FROM Table1, may be part of the problem here.
Can someone explain the differences in processing between the two?
Thanks,
Jared
CE - Microsoft
June 9, 2010 at 9:08 am
This was removed by the editor as SPAM
June 9, 2010 at 9:29 am
stewartc-708166 (6/9/2010)
The BULK INSERT, with BATCHSIZE set to x rows, commits to the destination table after every x rows, thus consuming less memory, log space, etc. INSERT INTO ... SELECT FROM, however, loads everything in a single transaction and then attempts to commit it all in one go.
This contributes to the growth mentioned.
There are other possible causes, e.g. other data-intensive processes running at the same time.
One possible solution: if the final destination table is partitioned and the staging table is on the same filegroup, the ALTER TABLE ... SWITCH statement could be used.
refer: http://msdn.microsoft.com/en-us/library/ms191174(v=SQL.100).aspx,
http://msdn.microsoft.com/en-us/library/ms191160(v=SQL.100).aspx
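(For illustration, a minimal sketch of the batched alternative stewartc describes: committing the copy in chunks rather than as one large transaction. Table1, Table2, and the integer key Id are hypothetical names, and Id is assumed to start above 0.)

DECLARE @BatchSize int = 100000;
DECLARE @LastId    int = 0;
DECLARE @Rows      int = 1;

WHILE @Rows > 0
BEGIN
    -- Each INSERT commits on its own, so log and tempdb
    -- pressure per batch stays bounded.
    INSERT INTO dbo.Table2
    SELECT TOP (@BatchSize) *
    FROM dbo.Table1
    WHERE Id > @LastId
    ORDER BY Id;

    SET @Rows = @@ROWCOUNT;

    IF @Rows > 0
        SELECT @LastId = MAX(Id) FROM dbo.Table2;
END;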
Unfortunately, I do not think partitioning will work well here, as this data really does not have a good column to partition on, and both tables are empty at the start of this process. Is your thought, then, to first decrease our BATCHSIZE from 1000000 to something smaller and see how that affects it? Also, is using the Import/Export Wizard to copy data from one table to another any different from using INSERT INTO?
Thanks,
Jared
CE - Microsoft
June 10, 2010 at 12:34 am
This was removed by the editor as SPAM
June 10, 2010 at 5:51 am
stewartc-708166 (6/10/2010)
It is a good idea to play around with the batch sizes. However, the INSERT INTO ... SELECT FROM is what is consuming the most resources.
You could consider moving the data to the final table using, e.g., SSIS, where batching can also be performed.
Thank you for your advice. We are going to test different batch sizes to see if that helps; we just need a balance between performance and tempdb usage. We are also going to try TABLOCK, since there are no constraints on the tables being loaded.
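For example (same placeholder names as the earlier statement; the smaller BATCHSIZE is just one value we plan to test):

BULK INSERT dbo.StagingTable
FROM 'C:\data\load_file.txt'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n',
    BATCHSIZE = 250000,   -- smaller batch size under test
    TABLOCK               -- table-level lock; with an empty heap and simple/bulk-logged recovery this can enable minimal logging
);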
On the second step, however, I think you are still misunderstanding me. We are not using INSERT INTO ... SELECT * FROM ...; we are using the Import/Export Wizard launched from SSMS. It is that wizard I want to compare to INSERT INTO; I want to understand how the wizard works in comparison.
Thanks,
Jared
CE - Microsoft
June 10, 2010 at 6:31 am
This was removed by the editor as SPAM
June 10, 2010 at 7:38 am
Ok, great! That makes a lot of sense. We have actually just moved our tempdb to a larger partition so that we can test without affecting our OS partition. This will allow us to try the different batch sizes and perfect our SSIS package. Thanks so much for your help!
Thanks,
Jared
CE - Microsoft