October 19, 2013 at 3:08 pm
I have a large Production Database that I need to occasionally copy to Development, Test and Training Databases.
However, the Production Database is MUCH too big for those purposes and has data in it that should not be visible to Trainees etc.
Therefore, I wrote an 'Anonymization' script to anonymize some of the real data (much more representative and complete than the Test data is) and then to remove the remaining real data.
Initially, I just used DELETE FROM xxx WHERE [...] to remove the remaining data, but this was slow and created huge log files which caused the script to fail if it were not run in stages.
However, although TRUNCATE is much faster and does not log the changes, it does not normally allow any data to be kept.
The following script snippet shows how the data that you want to keep can be copied to another table and then copied back into the original table after it has been truncated. It should handle foreign key constraints okay since the Truncate does not check them but I make no promises! When running this on multiple tables, ensure that the tables are processed in the correct order: usually the Primary table should be handled first, then the Secondary ones, and so on. Thus, in the example shown, the assumption is that the Person table has already been processed before the Sales table.
/*
SQL Script to conditionally Truncate a Table
It is straightforward to wrap a table selection
query around this so that it works for multiple
tables.
Valid @Update_Type values are:
  I - Ignore Table - Do not change the Table
  T - Trim Table - Remove the unnecessary records from the
      Table (Copy, Truncate and Copy back).
      Uses @Column_Name - If this is blank, the table will have
      all records removed but will not be dropped.
      If this has a value, it will use @Linked_Table
      and @Linked_Column to check if the value in
      @Column_Name exists and will keep those records.
  X - Delete Table - First Truncate the Table and then Drop the Table.
      Faster than just Dropping the Table
      since it does not Log the Transaction
Written By: S Holzman
Date: 08/22/2013
Amendments:
*/
DECLARE @Table_Name    VARCHAR(128),
        @Column_Name   VARCHAR(128),
        @Linked_Table  VARCHAR(128),
        @Linked_Column VARCHAR(128),
        @Update_Type   CHAR(1),
        @Table_Order   INTEGER,
        @Field_Name    VARCHAR(50),
        @SQL_Script    VARCHAR(MAX),
        @Fields        VARCHAR(MAX),
        @Row_Count     INTEGER
-- Specify the Database name to avoid the chance of
-- running this against the Production database
DECLARE @Database_Name VARCHAR(20)
SET @Database_Name = 'Test'
-- Setup an example
SET @Update_Type = 'T'
SET @Table_Name = 'Sales'
SET @Column_Name = 'Customer_ID'
SET @Linked_Table = 'Person'
SET @Linked_Column = 'Person_ID'
SET @Table_Order = 1
-- The following could be wrapped into a Cursor
-- to process multiple tables
SET @Fields = ''
DECLARE Field_Cursor CURSOR FOR
SELECT Name
FROM SysColumns WITH (nolock)
WHERE id IN (SELECT id FROM SysObjects WITH (nolock)
WHERE Name = @Table_Name
AND XType = 'U')
AND Name <> 'row_timestamp' -- timestamp/rowversion columns cannot be re-inserted
ORDER BY ColOrder
OPEN Field_Cursor
-- Perform the first fetch
FETCH NEXT FROM Field_Cursor
INTO @Field_Name
-- Check @@FETCH_STATUS to see if there are any more rows to fetch.
WHILE @@FETCH_STATUS = 0
BEGIN
SET @Fields = @Fields + QUOTENAME(@Field_Name) + ', ' -- QUOTENAME brackets the name rather than relying on QUOTED_IDENTIFIER
FETCH NEXT FROM Field_Cursor
INTO @Field_Name
END
CLOSE Field_Cursor
DEALLOCATE Field_Cursor
-- X - Truncate the Table and then Drop the Table
-- T with a blank Column Name - Truncate the Table
-- T with a valid Column Name - Copy Data to _Temp_Anon, Truncate the Table and then copy the Data back
IF @Update_Type = 'T' AND @Column_Name IS NOT NULL AND DB_Name() = @Database_Name
BEGIN
SET @SQL_Script = 'SELECT ' + LEFT(@Fields, (LEN(@Fields) - 1)) -- LEN ignores the trailing space, so this strips the final comma
+ ' INTO _Temp_Anon'
+ ' FROM ' + @Table_Name + ' WITH (nolock)'
+ ' WHERE ' + @Column_Name + ' IN (SELECT ' + @Linked_Column
+ ' FROM ' + @Linked_Table + ' WITH (nolock))'
PRINT @SQL_Script
EXEC (@SQL_Script)
END
IF @Update_Type IN ('T', 'X') AND DB_Name() = @Database_Name -- 'I' (Ignore) must leave the table untouched
BEGIN
SET @SQL_Script = 'TRUNCATE TABLE ' + @Table_Name
PRINT @SQL_Script
EXEC (@SQL_Script)
END
IF @Update_Type = 'X' AND DB_Name() = @Database_Name
BEGIN
SET @SQL_Script = 'DROP TABLE ' + @Table_Name
PRINT @SQL_Script
EXEC (@SQL_Script)
END
IF @Update_Type = 'T' AND @Column_Name IS NOT NULL
BEGIN
SELECT @Row_Count = COUNT(*) FROM _Temp_Anon WITH (nolock)
/*
-- Used for processing multiple Tables;
-- the _Anonymizer_Tables table stores a list of Tables to process
-- and this updates it to ensure that each table is processed
-- in turn and to store the status
IF @Row_Count <> 0
BEGIN
UPDATE _Anonymizer_Tables
SET Records_Left = @Row_Count
WHERE Table_Name = @Table_Name
AND Column_Name = @Column_Name
AND Linked_Table = @Linked_Table
AND Linked_Column = @Linked_Column
AND Update_Type = @Update_Type
AND Table_Order = @Table_Order
AND Records_Left = 0
END
*/
END
IF @Update_Type = 'T' AND @Column_Name IS NOT NULL AND DB_Name() = @Database_Name
BEGIN
IF @Row_Count > 0
BEGIN
SET @SQL_Script = 'INSERT INTO ' + @Table_Name + ' (' + LEFT(@Fields, (LEN(@Fields) - 1)) + ')'
+ ' SELECT * FROM _Temp_Anon'
PRINT @SQL_Script
EXEC (@SQL_Script)
SET @SQL_Script = 'TRUNCATE TABLE _Temp_Anon'
PRINT @SQL_Script
EXEC (@SQL_Script)
END
SET @SQL_Script = 'DROP TABLE _Temp_Anon'
PRINT @SQL_Script
EXEC (@SQL_Script)
END
October 19, 2013 at 5:23 pm
It should handle foreign key constraints okay since the Truncate does not check them but I make no promises!
You can't truncate a table that is referenced by a foreign key constraint. SQL Server won't let you.
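For example, a quick way to check whether a table is referenced before attempting the TRUNCATE (a sketch against the SQL Server 2005+ catalog views; the Person table name is just borrowed from the script above):
SELECT fk.name AS constraint_name,
       OBJECT_NAME(fk.parent_object_id) AS referencing_table
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('Person')
-- Any rows here mean TRUNCATE TABLE Person will fail with
-- "Cannot truncate table ... because it is being referenced by a FOREIGN KEY constraint."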
--Jeff Moden
Change is inevitable... Change for the better is not.
October 20, 2013 at 4:05 am
Hi simonholzman 14059, and welcome to the forum 😀
To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script and switch back to full recovery. This should deal with the log issues.
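In T-SQL that switch looks something like this (using the 'Test' database name from the script above):
ALTER DATABASE Test SET RECOVERY BULK_LOGGED
-- run the purge script here
ALTER DATABASE Test SET RECOVERY FULL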
Anyway, if your tables are referenced by FKs you'll need to drop those constraints before deleting rows and recreate them afterwards; another approach would be to alter those constraints with the ON DELETE CASCADE option.
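A rough way to generate the DROP statements for the referencing constraints (a sketch only; you would still need to script the matching ALTER TABLE ... ADD CONSTRAINT statements to recreate them afterwards):
SELECT 'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id))
     + '.' + QUOTENAME(OBJECT_NAME(fk.parent_object_id))
     + ' DROP CONSTRAINT ' + QUOTENAME(fk.name)
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('Person')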
Hope it helps. 😎
Jonathan Bernardez Bernardez
___________________________________________________________
DBD. MCSA SQL Server 2012
October 20, 2013 at 11:34 am
jonysuise (10/20/2013)
Hi simonholzman 14059, and welcome to the forum 😀 To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script and switch back to full recovery. This should deal with the log issues.
Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to "Bulk Logged" will possibly create other log issues insofar as the ability to do "point in time recovery" should you need to ever do a restore.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 20, 2013 at 12:49 pm
Jeff Moden (10/20/2013)
jonysuise (10/20/2013)
Hi simonholzman 14059, and welcome to the forum 😀 To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script and switch back to full recovery. This should deal with the log issues.
Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to "Bulk Logged" will possibly create other log issues insofar as the ability to do "point in time recovery" should you need to ever do a restore.
Hi Jeff, thank you for your reply. I indeed forgot to mention that it's strongly recommended to backup the transaction log before and after the bulk operation, in order to enable point-in-time recovery.
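For example (the backup paths here are placeholders):
BACKUP LOG Test TO DISK = N'D:\Backups\Test_log_before.trn'
-- switch to BULK_LOGGED, run the bulk operation, switch back to FULL
BACKUP LOG Test TO DISK = N'D:\Backups\Test_log_after.trn'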
Cheers ! 😎
Jonathan Bernardez Bernardez
___________________________________________________________
DBD. MCSA SQL Server 2012
October 21, 2013 at 7:37 am
simonholzman 14059 (10/19/2013)
However, although TRUNCATE is much faster and does not log the changes, it does not normally allow any data to be kept.
This is not true. TRUNCATE is a fully logged operation. However, unlike a DELETE, it logs the page deallocations instead of the individual rows. It must be logged in order to maintain ACID. Since it is logging page deallocations instead of rows, it is much faster.
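A rough way to see the difference for yourself (illustrative only: fn_dblog is undocumented, and the table name is hypothetical):
CHECKPOINT  -- in the SIMPLE recovery model this lets the existing log records be reused
DELETE FROM dbo.Demo_Table
SELECT COUNT(*) FROM fn_dblog(NULL, NULL)  -- roughly one log record per row deleted
-- repopulate dbo.Demo_Table, then:
CHECKPOINT
TRUNCATE TABLE dbo.Demo_Table
SELECT COUNT(*) FROM fn_dblog(NULL, NULL)  -- far fewer records: page/extent deallocations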
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
October 21, 2013 at 7:43 am
Jeff Moden (10/20/2013)
jonysuise (10/20/2013)
Hi simonholzman 14059, and welcome to the forum 😀 To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script and switch back to full recovery. This should deal with the log issues.
Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to "Bulk Logged" will possibly create other log issues insofar as the ability to do "point in time recovery" should you need to ever do a restore.
In this particular case - where the data is a copy from production and only part of the data is needed - I don't believe point in time restore is necessary. Personally I would even switch to the simple recovery model (on the training databases that is, of course).
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 21, 2013 at 8:35 am
Koen Verbeeck (10/21/2013)
Jeff Moden (10/20/2013)
jonysuise (10/20/2013)
Hi simonholzman 14059, and welcome to the forum 😀 To delete a large number of rows, switch the recovery model of your database (in your development environment) to bulk-logged, run your script and switch back to full recovery. This should deal with the log issues.
Careful now... while that may (as in might) deal with what you perceive to be "log issues", switching to "Bulk Logged" will possibly create other log issues insofar as the ability to do "point in time recovery" should you need to ever do a restore.
In this particular case - where the data is a copy from production and only part of the data is needed - I don't believe point in time restore is necessary. Personally I would even switch to the simple recovery model (on the training databases that is, of course).
Since most development and UAT boxes often don't contain mission critical data I've always made it a practice to change the recovery model to Simple.
It is always a struggle to decide which way to go to provide enough good data in DEV & UAT boxes. Personally I like to take a complete copy of the production database and move my new changes into it to make a UAT box. For DEV I have either stripped out all transaction data or left just enough behind for the developers to perform their unit testing.
I have utilized a variety of solutions to move VLDBs (very large databases) from server to server. If you can afford the room on your production box, and it is robust enough to deal with the purging process, I'd consider making a copy of the database on the production box, changing its recovery model to SIMPLE, and then whacking and hacking the data down to what is needed elsewhere. When the purge is finished, backup/restore THAT version to the various locations as needed.
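As a sketch of that flow (the database names, logical file names and paths are all hypothetical):
-- On the production server:
ALTER DATABASE Prod_Copy SET RECOVERY SIMPLE
-- ... purge the data down to what is needed ...
BACKUP DATABASE Prod_Copy TO DISK = N'D:\Backups\Prod_Copy.bak' WITH INIT
-- On each target server:
RESTORE DATABASE Training
FROM DISK = N'D:\Backups\Prod_Copy.bak'
WITH MOVE 'Prod_Copy' TO N'E:\Data\Training.mdf',
     MOVE 'Prod_Copy_log' TO N'F:\Logs\Training.ldf',
     REPLACE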
This process could get quite complex based on the direction you take. You can create a complete custom SSIS package that restores the limited set of data to your various locations.
There are quite a number of solutions available. Choosing the one that meets your requirements is going to drive you to the right solution.
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
October 21, 2013 at 2:20 pm
Thank you all for your replies and comments. While my job title is DBA, I am more of a SQL Programmer and true Database Administration is a rare task for me, as I suspect it is for most people with that job title! There simply aren't that many occasions where it is needed when one is managing a single production database.
Thus, please accept my apologies for any misunderstandings I have over exactly what is logged and which constraints etc. are affected by specific SQL statements.
The purpose of the script is to easily and quickly remove most of the data, but not all of it, from a non-Production database that has been copied from Production.
Originally, I had used something along the lines of the following on about 30 tables;
DELETE FROM Sales
WHERE Customer_Person_ID NOT IN (SELECT Person_ID FROM Person WITH (nolock))
However, since the vast majority of the records are being deleted, this is very slow and, since it is logging every single record deleted (and there are millions of records in most of the tables), it was usually running out of log space unless I ran it in smaller chunks. It took about 24 hours in total to run and needed regular monitoring.
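A chunked version of that DELETE looks something like this (a sketch; the batch size is arbitrary, and under the full recovery model the log still grows unless log backups run between batches):
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM Sales
    WHERE Customer_Person_ID NOT IN (SELECT Person_ID FROM Person WITH (nolock))
    IF @@ROWCOUNT = 0 BREAK
END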
Using the 'Calculated Truncate' approach, only the few records that are being kept are selected, so it can run against thousands of tables in a single chunk and takes about an hour.
Thus, it is faster, more reliable and MUCH more effective in shrinking the amount of data left in the database.
My hope is that this approach will help others with a similar problem.
However, if certain constraints will cause issues for some systems using the script as I have written it, please suggest improvements to the script so that it can dynamically remove those constraints and reimpose them once the data has been replaced in the original tables.