March 26, 2013 at 4:53 am
Hi folks,
SQL Server 2008 has enhanced INSERT..SELECT statement to allow minimal logging in certain cases, documented here:
http://msdn.microsoft.com/en-us/library/ms174335(v=sql.100).aspx (See section: Best Practices for Bulk Loading Data)
We planned to use this feature in a situation where we wanted to insert millions of rows in an empty heap (or an empty clustered index) and made sure that all pre-requisites are fulfilled.
a) The target table is a heap or an empty B-Tree,
b) TABLOCK is being used on the target table,
c) The database is in SIMPLE recovery model,
d) The target table is not used in replication.
The INSERT is done from a SELECT from a temp table (#temp) having around 300,000 rows (300,000 distinct Ids) joined with another table that has ~500 rows for each of the distinct #temp.Id. Thus, 500 x 300,000 = 150 million rows (around 10 GB)
We are on Microsoft SQL Server 2008 (SP1) - 10.0.2740.0 (X64).
But it seems the minimal logging is not working for us, as I see the transaction log growing to almost 10GB, during the insert if no other processes are running.
We also face transaction log full issue, when other processes are running along with this insert. (Transaction Log size = 20 GB)
There were no backups being run at the time of testing, which may have caused full logging.
Is the minimal logging using INSERT..SELECT available in all versions of SQL Server 2008 or did it get included in one of the Cumulative Updates?
I would really like to make use of this new feature in SQL Server 2008.
Any ideas on how to achieve it, would be appreciated.
March 26, 2013 at 5:02 am
What recovery model is the database in?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 26, 2013 at 6:54 am
Hi Gail,
As mentioned, the database is in SIMPLE recovery model.
Best,
Yusuf
March 26, 2013 at 8:40 am
Heap should have no indexes. Try with TABLOCKX hint.
March 26, 2013 at 8:54 am
are you doing the insert in batches? after the first batch the table will no longer be empty so you will no longer get minimal logging.
---------------------------------------------------------------------
March 27, 2013 at 12:34 pm
Hi Vedran,
I couldn't quite connect your two sentences: "Heaps have no indexes" and "Try using TABLOCKX".
Nevertheless, I did try TABLOCKX to no avail.
March 27, 2013 at 12:36 pm
george sibbald (3/26/2013)
are you doing the insert in batches? after the first batch the table will no longer be empty so you will no longer get minimal logging.
Not doing insert in batches. It is a single big insert..select
March 27, 2013 at 12:38 pm
Vedran Kesegic (3/26/2013)
Heap should have no indexes. Try with TABLOCKX hint.
Hi Vedran,
I couldn't quite connect your two sentences:
"Heaps should have no indexes" (heaps can have non clustered indexes, can't they?),
and
"Try with TABLOCKX hint".
Nevertheless, I did try TABLOCKX to no avail.
March 27, 2013 at 12:53 pm
I figured out the reason why minimal logging was not being done in my case.
All the pre-requisites mentioned in BOL to achieve minimal logging were met.
Though, a few hours later I noticed that my target table has an IDENTITY column, and I thought of removing the identity property or the identity column itself and see if it makes any difference.
And lo, minimal logging was achieved.
Not very sure of the internals behind it, but if I remove the identity column from the target table, the 150 million records insert is minimally logged.
Do you think BOL should mention this exception or add it in the list of pre-requisites for minimal logging during normal INSERT..SELECT operation?
March 27, 2013 at 1:56 pm
Yusuf Ali Bhiwandiwala (3/27/2013)
Vedran Kesegic (3/26/2013)
Heap should have no indexes. Try with TABLOCKX hint.Hi Vedran,
I couldn't quite connect your two sentences:
"Heaps should have no indexes" (heaps can have non clustered indexes, can't they?)
My sentence was: "Heap should have no indexes". Singular, not plural, your particular heap you want to achieve minimal logging.
If you have NC indexes on the heap, they will be fully logged on nonempty tables (after the first batch if you bulk import in batches, your table is no longer considered empty) even if table itself is minimally logged.
Maybe identity problem would disappear if you choose that value is imported instead of auto-generated.
Certainly, that should be in the BOL. You can add a comment there for others to know.
March 27, 2013 at 2:17 pm
Vedran Kesegic (3/27/2013)
Yusuf Ali Bhiwandiwala (3/27/2013)
Vedran Kesegic (3/26/2013)
Heap should have no indexes. Try with TABLOCKX hint.Hi Vedran,
I couldn't quite connect your two sentences:
"Heaps should have no indexes" (heaps can have non clustered indexes, can't they?)
My sentence was: "Heap should have no indexes". Singular, not plural, your particular heap you want to achieve minimal logging.
If you have NC indexes on the heap, they will be fully logged on nonempty tables (after the first batch if you bulk import in batches, your table is no longer considered empty) even if table itself is minimally logged.
Maybe identity problem would disappear if you choose that value is imported instead of auto-generated.
Certainly, that should be in the BOL. You can add a comment there for others to know.
:hehe: Sorry about the heap/heaps goof up.
The target table here had no NC indexes as well, so we were safe even there.
Having the identity column on the table was anyway an overkill, so it does me real good to remove it.
Thanks for your replies.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply