If you’ve been around databases and database servers long enough, you’ll get the call. What call am I referring to? The one that says that the server has crashed, that a fiber card went south, that a rogue process modified/deleted/created bad data, that a restore was accidentally run against a production system, that something went wrong and you now have to recover your production database(s) to a specific point in time. To quote my favorite cartoon character from my youth (and firmly establish my age), Underdog, “There’s no need to fear...”
First off, you’ve been doing your backups, right? You’ve got a full backup running on a regular schedule. You may have incremental backups running intermittently. Most importantly, your production data, the stuff that the company can’t live without, you know, the information that helps to generate your paycheck each month, has its transaction logs on a regular backup. Right? If you’re running your production system with truncate log on checkpoint set to true (AKA, Simple Recovery), stop reading. Go and fix that production system. Come back here when you’re done so you know what to do in the event of an actual emergency.
For sake of discussion, let’s assume the following: there is a nightly, full backup of the production system, there is a mid-day incremental backup, and there are log backups every fifteen minutes. We’ll concentrate on restoring a single database to a point in time, the moment just before whatever debacle listed above caused you to need to go through this evolution. You should consider that multiple databases and multiple systems could be affected by this recovery. Do you have two databases on the server that refer to data within each other such that a recovery will cause that data to be out of sync, or replication servers that are dependent on this data, or offline data sets on user systems that will have incorrect data after the restore? These, and other, situations could expand the scope of the recovery to a point in time, but won’t make any change to the core of the process.
Failure #1
Sometime around 8:37AM a developer, with too much production access, ran a process against the production data that inserted some bad data, modified good data, and deleted good data. This wasn’t discovered until a short time later when the complaints start to come in to the help desk. Management makes the determination to toss any data created in the system since 8:37AM and needs you to quickly restore the database back to that point in time.
After you get the users out of the system and lock it down so that you’re not contending with them (if there are connections into a database, you can’t restore it), you must restore last night’s full backup. There is a wrinkle to the tried and true restore. The standard restore syntax would look like this:
RESTORE DATABASE BigMoney FROM DISK = ‘\\backup_server\fullbackup\BigMoney.BAK’ WITH REPLACE
At the conclusion of the restore process SQL Server will roll back any uncommitted transactions stored within the backup and set the database to ready for use. The problem is, when restoring log files, you need to keep the transactions open until you’ve arrived at the end time to which you’re restoring, in this case 8:37AM. So, you need to modify the restore as follows:
RESTORE DATABASE BigMoney FROM DISK = ‘\\backup_server\fullbackup\BigMoney.BAK’ WITH REPLACE, NORECOVERY
This keeps uncommitted transactions open and the database in an intermediate, non-recovered state, ready for the logs to be applied. On that note, let’s get started:
RESTORE LOG BigMoney FROM DISK = ‘\\backup_server\logs\BigMoney.BAK’ WITH NORECOVERY, STOPAT = ‘Mar 17, 2005 8:37 AM’, FILE = 1
Let’s break that down a bit. First off, while we are restoring to a database, we’re not doing a database restore, but a log restore, hence the change in syntax to ‘RESTORE LOG.’ Log backups, and database backups for that matter, can be done one of two ways. You can backup a series of backups to a single file or backup device, accumulating backups over a time period, in our case, a day, so that you get a number of backups in a single location that have to be referenced by number within the file. That’s the reason for the ‘FILE=1’ syntax above. Twenty-four hours of log backups every 15 minutes would result in 96 backups within the file. You can also initialize a new file, or device, for each log backup and then have 96 files out on your share to manage & cleanup. Then the restores don’t need the ‘FILE=x’ syntax. Until we get to the final log restore, we don’t want the database to recover, rollback the logs, and because our further log restores to fail, so we use the ‘NORECOVERY’ syntax on the log restores. Lastly, you have to determine the stopping point ‘STOPAT = ‘Mar 17, 2005 8:37 AM’
How many times will you have to run that? If our nightly backup were run at midnight, you’d have four log backups an hour until the 8:45 backup, 35 times. In order to control the situation directly, I’d recommend you determine what you’re dealing with specifically when it comes to your log backups. You do this by getting the header information:
RESTORE HEADERONLY FROM DISK = ‘\\backup_server\logs\BigMoney.BAK’
This will return the full list of backup sets on the file. The information contained in the results of this query includes important information you’ll use for point in time recovery like:
So, instead of hard coding the ‘FILE=x’ statement, based on the information obtained from the header you could do something like this:
DECLARE @FileCount int DECLARE @CurrentFile int SET @FileCount = 18 SET @CurrentFile = 1 WHILE (@CurrentFile < @FileCount) BEGIN RESTORE LOG BigMoney FROM DISK = ‘\\backup_server\logs\BigMoney.BAK’ WITH NORECOVERY, STOPAT = ‘Mar 17, 2005 8:37 AM’, FILE = @CurrentFile SET @CurrentFile = @CurrentFile + 1 END
That will take care of cycling through all the logs that you need. Finally, you need to run the following:
RESTORE LOG BigMoney FROM DISK = ‘\\backup_server\logs\BigMoney.BAK’ WITH RECOVERY, STOPAT = ‘Mar 17, 2005 8:37 AM’, FILE = @CurrentFile
RECOVERY is set, so the database will be recovered, meaning all uncommitted transactions still open at our end time of 8:37AM will be rolled back and the database will be ready for use. You’re back online and a hero. Until…
FAILURE #2
Sometime around 3:27PM, the fiber card failed and some data being written at that point was corrupted. The database, after replacing the fiber card and bringing everything back on line, is in an indeterminate state and you need to restore to 3:26PM (just to be sure you’re clear of the bad transactions, you’re willing to lose a minute of data). Start off the same way, restore last night’s backup and be sure to include WITH NORECOVERY. Now we need to restore the differential backup. Differential backups are like giant log backups. They backup everything that has changed since the last full backup, but only the stuff that has changed. They’re smaller and faster than full backups.
RESTORE DATABASE BigMoney FROM DISK = ‘\\backup_server\diff\BigMoney.BAK’ WITH NORECOVERY
Restoring the differential is just like restoring the database. Assuming we’re recovering to a point in time, you need to prevent the rollback of open transactions until we reach our target time, in this case 3:26PM. If not, and you simply wanted to apply the differential backup, you would not set the ‘WITH NORECOVERY’ clause. Now simply run through the log backups as before and be sure to include the ‘WITH RECOVERY’ clause in the last log restore. You’re back online and a hero again.
While the amount of work increases pretty radically if you have to synchronize the recovery of multiple databases, the actual steps are the same. As a matter of fact, knowing how to restore to a point in time makes synchronization of two databases easier. Let’s assume for a moment that you have a small database, about 10-12 gb, and a more mid-sized one, 100gb, and these two databases share data in such a way that they need to be synchronized in their recovery so that they’re working from identical baselines. Obviously backing up the 100gb database will take longer than backing up the 10gb. So, when you have to restore them, simply restore the 100gb database normally. Get the backup completion time from the header and then do a point in time recovery of the smaller database to that time. Ta-da, synchronized databases.
Having read this article is not enough. You need to practice doing this in your environment. My examples worked from backup files, but if you use backup devices, while the concepts will be identical, the code needs some adjustment. Most importantly, you don’t want the first time that you’re doing a point in time recovery to be at 4AM when you were woken up by the phone or, even worse, at 2PM with ¾ of IT management standing behind you watching every single typo as you try to recover the company’s financial data. So get out there and practice now, just not on the production system, please.