Here is a riddle for you…
You are requested to move a database from an old server to a new server. As a professional and experienced DBA, you prepare a checklist of actions to take, you script everything, you document your plan, and, of course, you test it before doing it for real. Without getting into all the details, the plan includes the following steps:
- Move all the server-level objects to the new server
- Bring the application down
- Backup the database
- Copy the backup file to the new server
- Restore the database in the new server
- Change the connection string of the application
- Bring the application up again
- Test and verify that everything is fine
After you prepare everything in advance, you first perform the test. Unfortunately, you don’t have a test environment, so you perform the test in the production environment, but without all the steps involving the application. You move all the server-level objects (such as logins and linked servers) to the new server. You back up the database, copy the backup file to the new server and restore it there. Then you run some tests on the restored database, and everything seems to be just fine. You’re happy with the results, and you go home confident in your plan, ready for the actual production migration planned for the day after.
After a good sleep, you go to the office early in the morning and start the real migration. You perform all the steps successfully, and you manage to complete all of them in less than two hours without any errors. Now it’s time for coffee, while the product team tests the application and verifies that everything is in place.
And then comes a surprise… The product team tells you that data is missing. It seems that all the changes made since yesterday are gone. People become nervous, and everyone is looking at you, waiting for answers. Your happy face is replaced with something like this:
What happened here? How can it be?
You’ve just taken a new backup after the application has been shut down, and you used that backup to restore the database. How can data be missing?
Every time you perform a backup in SQL Server, you must specify the target media for the backup. This is called a media set. It is called a set, because you can specify multiple backup devices. A media set is an ordered collection of backup devices (tapes, disk files or Azure Blobs) that contains one or more backup sets. A backup set is the content that is added to a media set by a successful backup operation, striped between the backup devices in the media set.The problem lies with the backup and restore operations. Before I explain what happened, let me give you a brief explanation of the way SQL Server handles backup operations…
For example, if you perform a full database backup to 3 disk files, you are actually creating a new media set composed of the 3 backup devices, and then adding a backup set to the media set, striped between the 3 backup devices. The result of the backup operation will look like this:
Here are a few restrictions about media sets:
- A media set cannot contain backup devices of different types (e.g. tapes and disk files).
- Each backup operation has to be striped across the same collection of backup devices in the same order. Continuing the previous example, it is not possible to perform a backup and use only one of the backup devices as the target media.
- All backup sets in a media set must be either compressed or not.
Now, let’s perform another backup operation onto the same media set. It can be any type of backup, and it can be any database. So let’s perform a log backup. The default behavior of the backup operation is to append each backup set to the end of the media set. So after the log backup is completes successfully, the media set will look like this:
If you like, you can change the definition of the media set and reuse the backup devices for other purposes. For example, you can stop using the 3 disk files in our example as a media set, and form a new media set composed of only 2 disk files out of the three (and also maybe start using compressed backups from now on). You do that by specifying the FORMAT keyword in the backup operation.
Another option is to keep the format of the media set, but to initialize it by removing all existing backup sets. This is done by specifying the INIT keyword. When this keyword is used, the new backup set becomes the first and only backup set in the media set instead of being appended to the end of the media set.
So now that we have a solid background and understanding of how backup works in SQL Server in terms of media sets and backup sets, let’s go back to our story…
When we performed the first backup as part of the test run, we used the following statement:
BACKUP DATABASE MyDatabase TO DISK = N'E:\Backup\MyBackup.bak';
Since the “E:\Backup\MyBackup.bak” file didn’t exist at this point, we actually created a new media set (a single disk file in this case) and added a new backup set (the full backup). We then copied the backup file and restored it in the new server. At this point everything was fine.
The day after, we performed another backup using the same script. Since the file now already exists, the new backup set is appended to the media set and becomes the second backup set. This is because the default behavior of the backup operation is not to initialize the media set (“NOINIT”). Then we copied the backup file to the new server again, overwriting the file from yesterday, and restored the database. We used the following statement for the restore operation:
RESTORE DATABASE MyDatabase FROM DISK = N'D:\NewServer\Backup\MyBackup.bak';
We already know that the media set contains two backup sets, the test backup from yesterday followed by the real backup from today. By the way, if you want to review the backup sets contained in a media set, you can use the RESTORE HEADERONLY statement. So which backup set is going to be used in the restore operation?
You can (and should) specify which backup set to use by specifying the “FILE = n” clause of the RESTORE statement, where “n” is the position of the backup set within the media set. If you don’t specify which backup set to use, the first is used by default. In our case, we didn’t specify which backup set to use, so the first backup set (the one from the test run yesterday) was used.
Now you see why a whole day of data was missing. We performed a backup today after the application has been brought down, but we restored the same backup from yesterday. It happened because we didn’t specify whether to initialize the media set or not during the backup operation, and we also didn’t specify which backup set to use as part of the restore operation. The combination of the default behavior of both operations creates a big problem in this scenario.
So what have we learned today?
- Perform your tests on a test environment and not on the production environment.
- Always prefer to specify explicitly what you want to do, and don’t let SQL Server decide based on some default behavior.
- Don’t put on the happy face until the product team says you can.
Image courtesy of stockimages at FreeDigitalPhotos.net
The post Be Careful! Backups Can Bite You! appeared first on .