October 22, 2024 at 8:09 am
Hi all
I've written a script to keep track of how long backups are taking, which databases got a full or differential backup, and a few other bits.
I'm using msdb.dbo.backupset as my base table but I think I've found something strange.
According to that table, we managed to backup a 300GB database in 8 seconds (see picture).
This could cause us issues when tracking how long backups are taking.
I've narrowed it down to two possibilities:-
As a bit of background, we use an SSIS package to control the backup routine and we backup 4 databases at a time.
We do one full backup every 7 weeks (due to space constraints) and we keep the last 5 differentials (again, due to space constraints).
All databases are in SIMPLE mode as they are designed to catch up from where they left off if we have to restore a backup.
Let me know if you need any more info.
TIA
Richard
October 22, 2024 at 12:23 pm
My first thought would be to check the is_Snapshot column. It could be the case that the backup is being taken by a snapshot.
October 22, 2024 at 1:02 pm
@Coffee_&_SQL - I had thought of that but we've got as table that contains the databases that need to be backed up (with dates for the next full backup) and none of the snapshots are in there.
Don't get me wrong, if we could do a backup of a 300GB database in 8 seconds (or backup a 400GB database in 9 seconds), I'd be announcing it to the world but I think msdb.dbo.backupset is lying.
October 22, 2024 at 1:41 pm
Are you using a 3rd party backup application that's called from your SSIS package? Potentially, maybe if that's the case, the backup application is taking a VSS snapshot (which is very quick,) and then backing up from that (which on the back end would take longer.)
This wouldn't show up as a "snapshot backup" because as far as SQL is concerned, it's not.
October 22, 2024 at 1:58 pm
October 22, 2024 at 6:07 pm
This table is written to by various backup processes, where they log what they do. As noted, potentially you have something doing some sort of snapshot that is recorded as a backup, which might take seconds because of hardware tricks or some sort of snapshot.
I wouldn't be too worried about this, as it likely isn't a problem. I might ignore these, and not use them in RPO calcs unless I know what this actually is.
October 22, 2024 at 7:51 pm
@jasona.work - no, it uses dynamic SQL to do the backup (full or differential). We do have a server-level backup done by a 3rd-party but that's only the complete server (it's a VM).
Steve's comment got me thinking and I have a hunch I know where you 8 second backup is coming from AND it could potentially be a problem.
It *MAY* be your VM backup software backing up the VM. Easy to check if this is on the right track, compare the backup times with the fast backups to when the VM is being backed up. Where, if this is right, that could be a problem is, SQL will see those backups as a valid full database backup and base any differential backups off that, which makes restoring the database, shall we say, problematic.
I had a similar issue not long after we migrated our databases to VMs in Azure being backed up by Azure backup for the VMs, it took some hunting to sort out what we had to do to STOP the databases from being included and let our Agent job backups be in control. As I recall, it's a registry key (again, this is specific to an Azure environment, your VM backup software likely has something similar)
October 23, 2024 at 5:31 am
In some semblance of order:-
@jasona.work -
@steve-2 Jones - thanks for that. It's not used for RPO/RTO timings yet but may be used in the future so I'd like to get the most accurate information possible.
@jasona.work - We do the database backups (fulls) on a Sunday and the diffs are done daily. The server-level backup done by the 3rd-party application is started on a Friday (I've had this confirmed by our infrastructure team)
ETA - According to the MSDB table, all the backups were started on the same day (for the fulls) which I know isn't correct. It does show that as a Friday so it's possible that it's the server-level backup but that backup takes several days (even snapshotting/cloning it for backup takes more than a few seconds).
Assuming it is the server-level backup that's causing the issue, the question is why would backing up an entire VM affect the backup table (this backup doesn't run any SQL at all).
October 23, 2024 at 11:42 am
So what I've seen with a couple different backup applications is, the backup application takes a VSS snapshot of the drive(s,) and SQL "sees" that as a full backup and records it in the table you're querying.
Here's where it gets ugly. SQL sees it as a Full database backup and any following differentials use that point as their base.
I would strongly, strongly recommend, if at all possible, you do a test restore of your database somewhere, using a Full backup from Sunday and a Differential from Saturday AFTER the VM backup has run. My suspicion is, the restore will fail.
If that happens, then the next step would be to check the VM backup software to see if it's possible to exclude SQL databases from the backup, or if there's a registry key to set to accomplish that, or possibly to modify your database backup schedule so the full backups run very shortly after the VM backup runs (and even then, there's going to be a brief window of time where you could still lose data)
October 23, 2024 at 12:56 pm
@jasona.work - Thanks for that, at least it means I'm not going insane. I'll speak to our infrastructure team to see exactly what the backup application does in terms of VS snapshots. Hopefully it is just that, but, from you've put, that could also cause some issues.
Looks like I'm shelving my current project and doing some restore testing to see if we will be having issues.
If it is the backup solution, it also looks like I'll have to adjust our backup schedule to do a full backup weekly (that will cause it's own issues).
Oh well, possibly back to the drawing board.......
ETA - I'm trying to restore one our smaller databases that has the following backups:-
Full taken on 29/09/2024
Last differential backup taken 22/10/2024
If this works, then all is good except for the times.
If it doesn't, we've got problems.
October 23, 2024 at 1:00 pm
Keep in mind, I'd expect your VM backup application to have some method to "turn off" backing up the SQL databases, or at least causing it to take a normal Full backup of the databases. Possibly an option to make any database backups be a copy only backup.
Definitely work with the infrastructure team to look into it. Possibly the worst-case solution would be to configure the VM backup to exclude the volume(s) your databases are on (if possible.)
October 23, 2024 at 1:33 pm
I used the database I chose for a couple of reasons:-
The good news is I've just restored one of smaller databases successfully using the GUI (I just restored it under a slightly different name to stop it overwriting the current database.
I've checked a few tables and all the data appears to be there and accessible.
From that (although it is only a single test), it looks like we can successfully restore databases.
It now looks like it's just msdb.dbo.backupset that's telling lies.
According to the backupset table, the last full backup was taken 2024-10-11 11:34:20.000 and finished at 2024-10-11 11:34:28.000
According to my logging, the backup was between 2024-10-22 20:53:16.280 (start) and 2024-10-22 20:56:38.690 (finish).
Looks like I can't trust the backupset table for timings and I'll have to use my own logging.
Thanks for all the help folks.
October 23, 2024 at 1:44 pm
I'd still say you can trust the backupset table, because I'd still bet money (only a quarter, granted, but money) that the VM backup is causing that 8s backup entry. The good news, though, is that you CAN restore despite that, which also suggests to me that your infrastructure team / the VM backup software *IS* configured correctly to not screw up SQL backup chains.
Possibly what you could do for your reporting script is, configure it to exclude / ignore backups that would be about the time your VM backups are taken.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply