I believe that I have strong knowledge in SQL Server backup and recovery process, but recently there was an incident which shook my confidence. I was working on a migration for a noncritical production server from SQL server 2008 to 2017. As we need to upgrade the OS also, we decided to go ahead with the side by side migration. The backup plan on the existing server was full backup on a weekly basis. So I decided to restore the previous week's full backup well in advance and during the migration window to take a differential backup and apply that on the new server and I communicated this process will take approximately 15 minutes. As I was pretty confident in this process, I didn't put any effort to do a dry run prior to the migration day and everything set for migration.
During the migration window, as a first step, I fired the command to take differential backup on the existing server. To my surprise, it throws the following error:
Msg 3035, Level 16, State 1, Line 1
Cannot perform a differential backup for database “Mydatabase”, because a current database backup does not exist. Perform a full database backup by reissuing BACKUP DATABASE
Cannot perform a differential backup for database “Mydatabase”, because a current database backup does not exist. Perform a full database backup by reissuing BACKUP DATABASE
I was not able to believe this. Immediately I queried the backup set table and I saw multiple full backup records after the previous week full backup. I ran a set of queries and it revealed that some backup operation is happening without DBA's knowledge to a virtual device at a regular interval. As per the Microsoft documentation, Device_type 7 is a virtual device.
I don't have time to dig further, I continued the migration by issuing full backup and restoring the same. This delayed the entire migration process by 3-4 hours and it was an embarrassing situation.
After the migration, I decided to figure out the issue and went ahead and searched the SQL error log and I found the following messages along with the backup message.
- I/O is frozen on database Mydatabase. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup
- I/O was resumed on database Mydatabase. No user action is required.
- Database backed up. Database: Mydatabase, creation date(time): 2012/02/02(09:42:18), pages dumped: 578, first LSN: 528:626:37, last LSN: 528:643:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{4618D52D-9C83-400A-A698-AB391E30369A}26'}).This is an informational message only. No user action is required.
The I/O frozen error was just before the backup and that triggered me to think about disk snapshot backups running behind the scene and contacted the infra team to understand this. On contacting them, they told me that, they have enabled the Azur VM backup on all VM on a daily basis and the Azure VM backup schedule on the server in question was matching with the above error message. With that, we were sure that the azure VM is the culprit behind this ghost SQL server backup. On reading further on azure VM backup, we learned that Azure VM backup uses the VSS (Volume Shadow copy) service to take the backup and SQL server is a VSS aware application and it takes a full backup of SQL Server databases using SQL writer service. Before taking backup, it will freeze the disk operation to get a consistent backup.
What is VSS backup?
Volume Shadow Copy is a technology included in Microsoft Windows that can create backup copies or snapshots of computer files or volumes, even when they are in use. It is implemented as a Windows service called the Volume Shadow Copy service. It allows backup applications to safely back up locked and open files.
Components of VSS:
- VSS service: Part of windows operating system and installed as by default. This Orichistate the communication between the backup tool and VSS aware applications.
- VSS requester: The application who initiate the backup request to the VSS service. This can be Microsoft or non-microsof backup applications such as Windows server backup utility, any third party VM backup tool, etc.
- VSS writer This is the component that guarantees a consistent data set to back up. For example, Microsoft provides the VSS writers for SQL server, Exchange server, windows registry, etc. Many non-Microsoft applications also provide VSS components that need to guarantee data consistency during back up.
Finally, we understood what was happening behind the scene and decided to stop the backup operation initiated by the Azure VM backup as DBA's are responsible for SQL server backup and recovery. We can not stop the Azure VM backup as the infra team need this for recovering the instance in case of disaster. So we found two solutions.
- Stop and disable the SQL writer service. With this VSS will not able to take the SQL server database backup but still takes the VM backup.
- Add below registry key [ HKEY_LOCAL_MACHINESOFTWAREMICROSOFTBCDRAGENT]
"USEVSSCOPYBACKUP"="TRUE" This will force the VSS to take COPY_ONLY backup which will not break the backup chain
I hope this piece of information will help you one day. If you are more curious about VSS service, refer the below links: