Is SELECT Identity() … INTO Minimally Logged in Simple Recovery?
I was having a discussion with Idera (@idera_software) dev lead Vicky Harp (blog|@vickyharp) about minimizing log impact when performing schema changes on really large tables. One option discussed was using SELECT INTO to create a new table and populate it in a minimally logged operation. We would also need to add an identity column to the new table definition, and we would include the IDENTITY() function to create it.
Later that evening, Vicky forwarded a link to me for a blog post by Kalen Delaney (site|blog|@sqlqueen) about how the IDENTITY function changes the way SQL Server logs the SELECT INTO clause when in the full recovery model or bulk-logged recovery model. Kalen clearly demonstrated that you would not get the same savings in logging if you used the IDENTITY function.
There is great information in this blog post, I highly recommend that you pause reading this blog post long enough to go read Kalen’s blog post. Just be sure to come back, becaue I’m going to add on to what Kalen says.
The blog post: Geek City: What gets logged for SELECT INTO operations?
What About Simple Recovery?
So if you just read Kalen’s blog post, you may be wondering what I was wondering when I read it. What about simple recovery? For the schema changes that we are planning, we were considering switching the database to simple recovery for the duration of the upgrade. I wanted to verify what impact using IDENTITY with SELECT INTO would have so I set up my own little test.
First step of the test is to create a database for the test and make sure it is in simple recovery model.
Use master; Go If DB_ID('TestSimple') Is Not Null Drop Database TestSimple; Go Create Database TestSimple; Go Alter Database TestSimple Set Recovery Simple; Go Use TestSimple; Go
Next thing is to see how much gets logged with just a Select. For this part of the test, I create the table using SELECT INTO with criteria of 1 = 2 so that no rows are inserted. The schema gets created, but the table is left empty. Then I clear out the log (checkpoint) and perform the insert.
-- First off, let's do the same insert without SELECT INTO -- Use SELECT INTO to create an empty table, then insert data Select IDENTITY(int, 1,1) as ID, * into dbo.AllDatabases From sys.databases where 1 = 2; -- Clear the active log Checkpoint; -- Insert the data Insert Into dbo.AllDatabases ( [name] ,[database_id] ,[source_database_id] ,[owner_sid] ,[create_date] ,[compatibility_level] ,[collation_name] ,[user_access] ,[user_access_desc] ,[is_read_only] ,[is_auto_close_on] ,[is_auto_shrink_on] ,[state] ,[state_desc] ,[is_in_standby] ,[is_cleanly_shutdown] ,[is_supplemental_logging_enabled] ,[snapshot_isolation_state] ,[snapshot_isolation_state_desc] ,[is_read_committed_snapshot_on] ,[recovery_model] ,[recovery_model_desc] ,[page_verify_option] ,[page_verify_option_desc] ,[is_auto_create_stats_on] ,[is_auto_update_stats_on] ,[is_auto_update_stats_async_on] ,[is_ansi_null_default_on] ,[is_ansi_nulls_on] ,[is_ansi_padding_on] ,[is_ansi_warnings_on] ,[is_arithabort_on] ,[is_concat_null_yields_null_on] ,[is_numeric_roundabort_on] ,[is_quoted_identifier_on] ,[is_recursive_triggers_on] ,[is_cursor_close_on_commit_on] ,[is_local_cursor_default] ,[is_fulltext_enabled] ,[is_trustworthy_on] ,[is_db_chaining_on] ,[is_parameterization_forced] ,[is_master_key_encrypted_by_server] ,[is_published] ,[is_subscribed] ,[is_merge_published] ,[is_distributor] ,[is_sync_with_backup] ,[service_broker_guid] ,[is_broker_enabled] ,[log_reuse_wait] ,[log_reuse_wait_desc] ,[is_date_correlation_on] ,[is_cdc_enabled] ,[is_encrypted] ,[is_honor_broker_priority_on]) Select * From sys.databases GO -- Let's look at the log records Select [Current LSN], Operation, Context, [Transaction ID], [Log Record Length], [Log Reserve] from fn_dblog(null, null)
Expand the result set below and look at the log records (or just run the code and look at real result sets. I call out a few rows in it. Throughout the transaction, you will notice that for each row inserted, there are a pair of log entries, one for the identity value (LOP_IDENT_NEWVAL) and one for the row insert(LOP_INSERT_ROWS). the actual row inserts are easy to pick out because they have a longer record length and an additional log reserved.
[[Visit blog to check out this spoiler]]
Now let’s see how much log is created for the SELECT INTO with IDENTITY. I’ll clear the log and then run the insert and take another look at the log file.
-- Clear the active log Checkpoint; Select IDENTITY(int,1,1) as ID Into dbo.AllDatabases2 From sys.databases; Go -- Let's look at the log records Select [Current LSN], Operation, Context, [Transaction ID], [Log Record Length], [Log Reserve] from fn_dblog(null, null);
You will see a lot more log records here because we are not able to filter out the creation fo the table. You see a lot of records for creating the table and managing the allocations. What you don’t see though are log records for each individual row. If you scroll down to the section with the identity values being created (LOP_IDENT_NEWVAL operations), you will see a long stream of identity increments.
[[Visit blog to check out this spoiler]]
Summary
Long story short, inserting the log records is minimally logged as the individual rows are not being logged individually. The identity values are being fully logged (individually). Even in simple recovery model, using the IDENTITY function still creates additional overhead for logging. It is, for the most part, still minimally logged, however.