January 5, 2009 at 5:29 pm
I am trying to find out if there is an efficient way to provide developers with a small data set from a large database. It has been something of a tradition where I work to prepare our production databases as small data sets for developers every Monday morning. QA also uses the same database to test their scripts prior to release. Right now one production db is over 500 GB and another is over 400 GB. A couple of years ago, when the databases were small, developers wrote stored procedures to delete data from the tables and leave only the last 3 days of data. So the process kicks in, restores the latest full backup with a new name, and builds several temp lookup tables to be used when deleting from the tables. The job has been taking too long lately.
So I would like to ask the community whether anyone is practicing something similar. Restoring a 500 GB db and then starting to delete takes a long time and is not very efficient. I would appreciate it if anyone has a better idea that they have been practicing.
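For reference, the restore-to-a-new-name step in the process described above might look roughly like this. All paths, the backup file name, and the logical file names are placeholders, not the poster's actual values:

```sql
-- Restore the latest full production backup under a new database name.
-- Paths and logical file names below are hypothetical; check yours with:
--   RESTORE FILELISTONLY FROM DISK = N'D:\Backups\Prod_Full.bak'
RESTORE DATABASE DevCopy
FROM DISK = N'D:\Backups\Prod_Full.bak'
WITH MOVE N'Prod_Data' TO N'D:\Data\DevCopy.mdf',
     MOVE N'Prod_Log'  TO N'D:\Logs\DevCopy.ldf',
     REPLACE, STATS = 10;
```

The delete-down stored procedures would then run against DevCopy, which is the part that has become slow at 500 GB.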
Happy new year to everyone.
Bulent
January 5, 2009 at 6:17 pm
January 5, 2009 at 10:30 pm
You might be able to take the opposite approach. Instead of restoring a full backup and then deleting all but 3 days of data, create an empty database and load it with the most recent 3 days of data directly from the live prod database. Then back up the 3-day database and restore it to your dev environments.
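A rough sketch of that load step, assuming a table with an indexed datetime column. The table and column names here are made up for illustration, and you would repeat this per table, parents before children, so foreign keys are satisfied:

```sql
-- Copy only the last 3 days of rows from prod into the small dev database.
-- DevCopy, ProdDB, Orders, and OrderDate are hypothetical names.
INSERT INTO DevCopy.dbo.Orders
SELECT *
FROM   ProdDB.dbo.Orders
WHERE  OrderDate >= DATEADD(DAY, -3, GETDATE());
```

Backing up DevCopy afterwards gives you a small backup that restores quickly on every dev machine, instead of restoring 500 GB each Monday.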
January 6, 2009 at 3:07 am
I had to do something very similar, and it's certainly possible, but it can get tricky depending on the complexity of your relationships...
For mine I created a temp table of every primary record, then deleted from that temp table the records I wanted to keep in the database, leaving only the records to remove.
I then went through the major tables and joined each one to my temp table in a delete statement. Make sure you remove the records cleanly from related tables as well. You may need to remove constraints, triggers, etc., and you may need to remove replication
(to remove replication use http://www.sqlhowto.co.uk/post/how-can-i-remove-replication-from-my-database-.html )
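The delete-list technique above might be sketched like this; table and column names are illustrative only, and the cutoff would be whatever window you want to keep:

```sql
-- Build a temp table of the primary-record keys to DELETE
-- (everything older than the 3-day window we keep).
SELECT CustomerID
INTO   #ToDelete
FROM   dbo.Customers
WHERE  CreatedDate < DATEADD(DAY, -3, GETDATE());

-- Delete child rows first so foreign key constraints are not violated.
DELETE o
FROM   dbo.Orders AS o
JOIN   #ToDelete AS d ON d.CustomerID = o.CustomerID;

-- Then delete the primary records themselves.
DELETE c
FROM   dbo.Customers AS c
JOIN   #ToDelete AS d ON d.CustomerID = c.CustomerID;
```

Working child-tables-first down the relationship tree is what lets you avoid dropping foreign key constraints in simpler schemas; with more tangled relationships you may still need to disable constraints and triggers as noted above.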
You can use the script on the following page to identify the largest tables
http://www.sqlhowto.co.uk/post/how-can-i-identify-the-largest-tables-in-my-database-.html
You can also clear down any audit tables, user tables, archived info, etc.
Hope this helps... let me know how you get on.
James Howard
January 6, 2009 at 10:28 pm
Eric and James, thanks to you both for your input. Copying only the last 3 days of data sounds like a better idea than deleting more than a year's worth of old data. I just have to find time to implement a solution for that. I appreciate your inputs.
Have a nice day.
January 9, 2009 at 4:07 pm
If you can afford Visual Studio Team System, it has a feature that lets you take a subset of the data.