May 26, 2010 at 9:17 am
Please accept my apology if this post is in the wrong forum.
I'm trying to bulk load two tables with minimal logging. I started by setting the recovery model to Bulk-logged. Here's the pseudo-code for each table:
TRUNCATE TABLE table_name;

-- TABLOCK is what makes the insert eligible for minimal logging.
INSERT INTO table_name WITH (TABLOCK)
    (Column_Names)
SELECT
    Column_Names
FROM OPENROWSET(
    BULK 'data_file_path',
    FORMATFILE = 'format_file_path') AS alias
WHERE condition
ORDER BY column_with_nonclustered_index;
I can back up the transaction log and shrink it down to 50 MB. After running the code for the two tables, the transaction log has grown to about 560 MB, which is about the same size as the primary data file (the .mdf file).
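For anyone following along, here's roughly what I run between tests to reset and then measure the log. MyDb, the logical log file name MyDb_Log, and the backup path are placeholders for my actual names:

BACKUP LOG MyDb TO DISK = 'C:\Backups\MyDb_log.trn';  -- free up the inactive portion of the log
DBCC SHRINKFILE (MyDb_Log, 50);                       -- shrink the log file down to 50 MB

-- After the load: reports log file size and percent used for every database.
DBCC SQLPERF (LOGSPACE);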
Each table has exactly one index, which is non-clustered. (There are no primary keys, and no unique constraints.) I tried dropping the index before the insert statement, and then recreating the index after the insert statement as follows:
TRUNCATE TABLE table_name;

DROP INDEX nonclustered_index ON table_name;

INSERT INTO table_name WITH (TABLOCK)
    (Column_Names)
SELECT
    Column_Names
FROM OPENROWSET(
    BULK 'data_file_path',
    FORMATFILE = 'format_file_path') AS alias
WHERE condition
ORDER BY column_with_nonclustered_index;

CREATE NONCLUSTERED INDEX nonclustered_index
    ON table_name (column_with_nonclustered_index);
Once again, I back up the transaction log and shrink it down to 50 MB. After running the code for the two tables, the transaction log stays at 50 MB.
In my mind, the second scenario achieved my goal of bulk loading the two tables with minimal logging. But I don't know why I had to drop and recreate the indexes, and I'd prefer not to. This documentation got me headed in the right direction:
http://msdn.microsoft.com/en-us/library/ms190422.aspx
But the question remains: If the tables are truncated before the bulk load, why do I have to drop/create indexes? (I am using SQL 2005, and the documentation is for SQL 2008.)
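For what it's worth, the recovery model really is set to Bulk-logged. On SQL 2005 this can be confirmed from sys.databases (MyDb is a placeholder):

SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'MyDb';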
What am I missing?
May 26, 2010 at 2:46 pm
Follow-up...
I'm really reluctant to drop and recreate indexes. Suppose another DBA or developer changes the index (adds or removes columns, changes the FILLFACTOR, etc.). My code would silently wipe out those changes, because the steps to create the index are hard-coded. Not good!
A better option is to disable the index and rebuild it later. (Thanks to Andy Warren for his article about Disabling Indexes!)
Here's my revised pseudo-code:
TRUNCATE TABLE table_name;

ALTER INDEX nonclustered_index ON table_name DISABLE;

INSERT INTO table_name WITH (TABLOCK)
    (Column_Names)
SELECT
    Column_Names
FROM OPENROWSET(
    BULK 'data_file_path',
    FORMATFILE = 'format_file_path') AS alias
WHERE condition
ORDER BY column_with_nonclustered_index;

ALTER INDEX nonclustered_index ON table_name REBUILD;
If I get really motivated, I can programmatically determine the non-clustered indexes and disable/rebuild them.
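Something like this sketch should work on SQL 2005. dbo.table_name is a placeholder, and I haven't tested it, so treat it as an outline only; it builds one ALTER INDEX statement per non-clustered index found in sys.indexes:

DECLARE @sql nvarchar(max);

-- Disable every non-clustered index on the table.
SET @sql = N'';
SELECT @sql = @sql + N'ALTER INDEX ' + QUOTENAME(name)
    + N' ON dbo.table_name DISABLE;' + CHAR(13)
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.table_name')
  AND type_desc = N'NONCLUSTERED';
EXEC sp_executesql @sql;

-- ...bulk load goes here...

-- Rebuild them all afterward (REBUILD also re-enables a disabled index).
SET @sql = N'';
SELECT @sql = @sql + N'ALTER INDEX ' + QUOTENAME(name)
    + N' ON dbo.table_name REBUILD;' + CHAR(13)
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.table_name')
  AND type_desc = N'NONCLUSTERED';
EXEC sp_executesql @sql;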
Still, after reading the documentation on MSDN, I'm confused as to why any of this matters when the tables are truncated first...
August 29, 2010 at 3:46 pm
Dave Mason (5/26/2010)
Still, after reading the documentation on MSDN, I'm confused as to why any of this matters when the tables are truncated first...
It's been a while, but in case you haven't figured it out: the reason the log files stay so much smaller when you either DROP or DISABLE the non-clustered indexes is that, if the indexes are active while you are loading data into the table, SQL Server has to log all of the updates to the indexes. As each record is inserted, it logs the index page changes and page splits. When you DROP or DISABLE the index first, and then CREATE or REBUILD it after the data is in, you are essentially creating a brand-new index, and index creation is itself a bulk operation that can be minimally logged under the Bulk-logged recovery model.
The two main beneficial side effects of doing this are:
* The fragmentation in your indexes will be much less, making for better performance. (You can put a number on this with sys.dm_db_index_physical_stats; see the sketch below.)
* It is normally faster to insert the data and then build the index: creating the index all at once is faster than modifying it row by row to keep it up to date.
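To check the fragmentation point, you can compare the indexes before and after with sys.dm_db_index_physical_stats (available in SQL 2005; dbo.table_name is a placeholder, and LIMITED is the cheapest scan mode):

-- Average fragmentation per index on the table.
SELECT i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.table_name'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id;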
I hope that helps explain things.