November 5, 2013 at 2:39 am
Hi everybody. I've an unexpected behaviour filling a table with the tablock hint. I need to use minimally logged insert to minimize execution time, but I noticed an unexpected slowness in some cases.
In short: if source data are not an existing table but a dynamic one (f.i a cte) I see that a SELECT ... INTO is considerably faster than an INSERT INTO statement WITH (TABLOCK).
In attachment you have a little sample with two scripts. In both I use a recursive CTE in order to generate some data, and then I use these data to fill a table.
By the first script I use the SELECT ... INTO statement
By the second one I use an INSERT INTO statement WITH (TABLOCK).
Usually both methods return the same execution time, BUT only if INSERT INTO statement gets data from an existing table object as source. In my sample it seems that a dynamic generation on source data (the recursive CTE) slow down execution. In detail, I have a 40% (avg) slower execution by INSERT INTO than by SELECT ... INTO.
Database on which test was done meets requirements for minimally logging, of course, and target table too.
Some details.
1. By IO statistics I see that both statement return following info:
Table 'Worktable'. Scan count 18, logical reads 3161360, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
but INSERT ... INTO statement returns also info about many logical reads on target table:
Table 'resTable'. Scan count 0, logical reads 1003344, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
2. Execution plans are equal for both statements, except for object on which data inserts are made:
- for SELECT ... INTO object is target table itself (resTable)
- for INSERT INTO object isits heap ([dbo].[resTable](heap))
3. By INSERT... INTO it seems that minimally logging is missing... Monitoring Log use I saw that
- By SELECT ... INTO Log size increase about 1681 KB
- By INSERT... INTO Log increase about 116996 KB
4. If you put data in intermediate temp table (by SELECT ... INTO) and then in target table from this temp one (by INSERT INTO) minimally logging works fine; it means that filling two tables starting from the same recursive CTE (first one by SELECT INTO, second one BY INSERT INTO with the first as source) is much faster than filling a single table by an INSERT INTO based on CTE data...
I can not understand the reason for this behavior. Have any of you experienced something like that? Do you have any explanation about?
Thanks a lot.
November 11, 2013 at 2:49 pm
there are some recomendations on the web, check if your table is a heap or not before "insert ... select", consider using begin transaction too
in 2008 r2, for inserting into large tables, I use a cursor for scanning each row, all in one big transaction, 5 million rows in a half minute, and you don't consume disk space
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply