Today I'd like to go over a new and straight forward feature added to SQL
2005 called a mirrored backup. This gives you the ability to backup to two or
more locations simultaneously (which is not the same thing as a striped backup,
that's a different topic!).
Why is this in interesting feature? I think most of us would agree that
having a good backup is job one for a DBA. But the job doesn't end with the
backup, there is still room for things to go wrong. Backup hard drives fail,
tape drives eat the tape, network admins forget to change the tape, etc. I think
it's fairly common now for DBA's to adopt a disk to disk to tape strategy,
meaning they backup the databases to a hard drive they have access to, then have
that backed up to tape. This provides the ability to easily restore from
yesterday without waiting on someone to find the tape, guards against the
possibility that the tape backup didn't work, and gives you a measure of
redundancy (limited because you'll have only a couple days on disk, tapes should
go back for months). This can still leave open the possibility of losing
everything if the facility is destroyed. Having someone take the tape home each
night or using an off site service is always a good idea, but there is a human
element involved in both of those, and we are prone to fail from time to time.
The answer to the last possibility has been to copy or FTP the backup files
to a different location - down the hall, across the campus, across the country
even. Setting it up isn't terribly difficult, but I bet that the number of
servers where something like this is being done is a small percentage in large
part because we tend to rely on maintenance plans until we need a more complex
strategy. So now we have an extension to the backup syntax that lets us do
something similar to our old copy/FTP process, but it's not baked into the
maintenance plans, so I suspect it won't see a lot of user either.
Let's look at how it works. I started with a standard full backup of
Adventureworks on one of the machines in our classroom.
backup database adventureworks to disk='c:\advtest1.bak' Processed 2 pages for database 'adventureworks', file 'AdventureWorks_Log' on file 1. BACKUP DATABASE successfully processed 20938 pages in 8.294 seconds (20.680 MB/sec).
To get an idea of how much the network would add to the process, I did the
same backup to another machine in the classroom:
backup database adventureworks to disk='\\train4\backup\advtest1.bak' Processed 20936 pages for database 'adventureworks', file 'AdventureWorks_Data' on file 1. Processed 2 pages for database 'adventureworks', file 'AdventureWorks_Log' on file 1. BACKUP DATABASE successfully processed 20938 pages in 21.017 seconds (8.161 MB/sec).
The time is probably off just a little due to the log backup at the end, but
clearly we can see that it's taking almost twice as long. I received a similar
time when I just copied the file across via xp_cmdshell and the copy command,
but it might be possible to do it faster with xcopy or robocopy, and I didn't
test larger files where any differences might have been more visible. Now let's
try a mirrored backup:
backup database adventureworks to disk='c:\advtest3.bak' mirror to disk='\\train4\backup\advtest3.bak' with format, description='test'
The syntax is easy enough. Gotchas? The syntax requires the 'with format'
clause to work, and you can only mirror to devices of the same type (all disk or
all tape). Now let's look at the results:
Processed 20936 pages for database 'adventureworks', file 'AdventureWorks_Data' on file 1. Processed 2 pages for database 'adventureworks', file 'AdventureWorks_Log' on file 1. BACKUP DATABASE successfully processed 20938 pages in 21.179 seconds (8.098 MB/sec).
As you can see the total time for the backup was equal to the time it took to
do a single backup across the network. BOL states that the backups are
synchronous. That makes sense because you want to only run through the database
pages one time so you'd have to successfully copy the page to all locations
(essentially a distributed transaction) before moving to the next page. I was
curious to see what would happen during a failure so I ran the test again, but
disconnected the network cable as soon as the backup had started. After a couple
seconds I received this error:
Msg 3202, Level 16, State 2, Line 1
Write on "\\train4\backup\advtest9.bak" failed: 64(The specified network name is
no longer available.)
Msg 3271, Level 16, State 1, Line 1
A nonrecoverable I/O error occurred on file "\\train4\backup\advtest9.bak:"
64(The specified network name is no longer available.).
Msg 3013, Level 16, State 1, Line 1
BACKUP DATABASE is terminating abnormally.
At this point I had no BAK file on the local drive, but I did have a 37k file
on the network share (full file would have been about 168k). It makes sense that
the file got left behind as SQL had no way of knowing that the network would
drop and at that point the command had terminated. Worth remembering to have a
check for old files on remote shares if you do decide to use mirroring. The
bigger problem here is that I wanted to add mirroring to accomplish what I used
to do with copy/FTP but it has one big difference; if the copy/FTP fails I still
have the original backup, but if the mirror backup fails I have no backup! The
synchronous nature of the mirroring makes for an all or none experience.
It is worth the risk of not having any backup the one time the job fails
compared to just writing a few lines of code in a job to copy the file after the
backup? There are a few variables to consider; how critical is one missing
backup, would you notice and take action if the job failed, are you mirroring to
local drives or across a slow/fast network. I think it's hard to recommend using
it, especially since it's not a standard part of the maintenance plans and it's
not hard to implement the copy after backup functionality.
Changes I'd like to see:
- Add a switch that would let me set a configurable timeout so that if one
mirror was taking an abnormally long time it could be abandoned and the rest
of the job would continue
- Continue the backup if any mirror fails (network connection, out of
space)
- Add it to the maintenance plans (along with the option to copy after
instead of mirror)
Are you using backup mirroring, either the native SQL one or perhaps one in a
third party tool? I look forward to your comments.
Andy is a trainer with End to End Training (along with fellow SSC founders
Steve Jones & Brian Knight) focusing on beginning to intermediate level students
in performance tuning, replication, and administration. He is also a frequent
speaker at the PASS Summit and Code Camps in Florida. Visit the web site at
http://www.endtoendtraining.com.