Need Help in troubleshooting backup/restore issues

Question


Lori Kontny

SSC Enthusiast

Points: 107
August 17, 2005 at 2:24 pm

#110589

I'm hoping that there is someone out there to provide some possible insight into various SQL backup problems that I'm trying to resolve.
Problem 1: SQL backup/restore times vary dramatically for the same database. For a week or more, the restore of a 20 GB file will run in < 1 hour. Then, suddenly, overnight, the same restore will begin taking > 2 hours.
Problem 2: SQL backups will run for multiple weeks, completing successfully, then suddenly start failing with the following error:
"Write on [backup device] failed, status = 64. See the SQL Server error log for more details. [SQLSTATE 42000] (Error 3202) Write on [backup device] failed, status = 64. See the SQL Server error log for more details. [SQLSTATE 42000] (Error 3202) BACKUP DATABASE is terminating abnormally. [SQLSTATE 42000] (Error 3013). The step failed."
Here are the particulars of the situation.
There are 20 instances of SQL Server running, all of them backing up to a centralized DFS location. This location is accessed by UNC path, not by drive letter. The SQL servers experiencing the problem are running either Windows 2000 SP4 or Windows 2003 SP1, SQL Server SP3a.
I have worked with the server group at my company to try to evenly spread our SQL backup jobs so that there aren't periods of higher contention. In doing this I have changed the scheduled time of some of the jobs. Initially this appeared to help the problem, however, now it appears that the problem has returned.
If anyone has any troubleshooting tips I would greatly appreciate hearing them!
Thanks for your time.


David Scotland-132255 SSCertifiable Points: 5599 · Answer 1

I had a similar issue, as I too back up all our SQL Server databases to a central location on the network accessed by a UNC path.

I initially did the same as you have done, and at first it did seem to solve the problem, but like you I found that it returned.

What seems to have been causing the problem for us was the overnight backups on the network, specifically when the SQL servers were trying to back up to the central area on the network while Veritas was also trying to back up the server on which that area was located. I have since made sure that the SQL backups are carried out at a time that does not clash with the Veritas backup, and I have not had the issue for 5 months now.
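For anyone wanting to check their own schedules for this kind of clash, here is a minimal sketch in Python. It only compares time windows; the job names, times, and the shape of the schedule data are all hypothetical, and you would need to fill them in from your SQL Agent job history and your tape backup schedule.

```python
from datetime import datetime, timedelta


def overlaps(start_a, end_a, start_b, end_b):
    """Return True if two half-open time windows [start, end) intersect."""
    return start_a < end_b and start_b < end_a


def find_clashes(sql_jobs, tape_window):
    """Return the names of SQL backup jobs whose window intersects the tape window."""
    tape_start, tape_end = tape_window
    return [
        name
        for name, (start, end) in sql_jobs.items()
        if overlaps(start, end, tape_start, tape_end)
    ]


# Hypothetical schedule for one night; every name and time here is made up.
night = datetime(2005, 8, 17)
sql_jobs = {
    "SalesDB_full": (night + timedelta(hours=20), night + timedelta(hours=21)),
    "HRDB_full": (
        night + timedelta(hours=22, minutes=30),
        night + timedelta(hours=23, minutes=30),
    ),
}
# Assumed tape backup window: 23:00 until 02:00 the next morning.
tape_window = (night + timedelta(hours=23), night + timedelta(hours=26))

print(find_clashes(sql_jobs, tape_window))
```

With the made-up times above, only the job that is still running at 23:00 is reported. Using actual start and end times from job history, rather than the scheduled times, would catch backups that overrun into the tape window.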

hth

David

Yermom SSC Rookie Points: 40 · Answer 2

Had similar issue and would experience "delayed write" errors on either the SQL server OR the backup destination server. It seemed that the disk I/O could not keep up with the network I/O on our gigabit network. Rearranging the backup times/orders would clear it up for a while but the problem would inevitably return. Upgrading NIC drivers cleared it up -- they are apparently more forgiving and queue up better during times of potential disk contention bottlenecks.

Jake

Lori Kontny SSC Enthusiast Points: 107 · Answer 3

Thank you both for your responses. I verified that the NIC drivers are updated on all involved servers to the most recent drivers.

I'm still working with the server team to resolve any potential conflicts with Veritas. Our current suspicion is that the Veritas backup of the server file system is occurring while the SQL backup/restore is running. This could be causing contention for disk reads/writes on the servers involved.

If changing the file system backup time doesn't improve times, I still need more ideas on where I can look for solutions!

CJohnson-232084 SSCommitted Points: 1512 · Answer 4

You have probably already considered this, but most of the times I have received the 3202 error in this context, the destination disk was full. A couple of times it was a problem with the RAID. The RAID problem seemed to resolve itself (never did figure out exactly what happened.)
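One way to narrow this down: as far as I know, the "status" value in the 3202 message is the Windows system error code, where 112 is the disk-full code and 64 means the network name is no longer available. The Python sketch below decodes those two codes and checks free space on a destination. The lookup table is deliberately tiny, the helper names are hypothetical, and the path is a stand-in for your own backup share.

```python
import shutil

# Small, non-exhaustive table of Windows system error codes that may appear
# as the "status" value in error 3202 (assumption: status is the OS error code).
WINDOWS_ERRORS = {
    64: "ERROR_NETNAME_DELETED: The specified network name is no longer available.",
    112: "ERROR_DISK_FULL: There is not enough space on the disk.",
}


def explain_status(status):
    """Return a description for a known status code, or a lookup hint otherwise."""
    return WINDOWS_ERRORS.get(
        status, f"Unknown status {status}; try 'net helpmsg {status}' on Windows."
    )


def has_room(path, needed_bytes):
    """Return True if the volume holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


print(explain_status(64))

# "." is used so the sketch runs anywhere; in practice this would be the
# backup destination, with needed_bytes set to the expected backup size.
print(has_room(".", 1))
```

If the status really is 64 rather than 112, that points at the connection to the share dropping mid-write rather than at disk space.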

Lori Kontny SSC Enthusiast Points: 107 · Answer 5

Thank you for the idea. However, I'm certain that there is disk space available. Last we looked there was > 1 TB free.

sswords SSCrazy Eights Points: 8207 · Answer 6

If you're looking for a consensus, I would tend to agree that Veritas is most likely your problem, and I've seen similar situations with other third-party backup tools causing resource contention. Ideally, what we try to do is have all of the SQL Server native backups (at least the full DB ones) completed before the tape backups kick off. Unfortunately, this still doesn't prevent the occasional log backup failure. You may also want to ask your network guys if they're using the latest client and patches for Veritas. We use another product and used to see this quite a bit. It has virtually disappeared now that we are running the later backup clients and patches on the servers. Hope this helps.

My hovercraft is full of eels.