March 26, 2009 at 4:17 am
Hi all
I'm fairly comfortable with most facets of Windows server, but SQL Server is unfortunately one of my weaker areas, so forgive me if this appears to be unrelated to SQL itself, but at the moment I'm not sure if it's a SQL issue or network-related.
Our nightly backup software is BackupAssist, and is essentially just a user-friendly front-end for NTBackup. It runs on one of our DC's, and each night part of its job is to back up a series of SQL databases on two separate 2003 servers, across the network. As I understand it, the databases are are exported out of SQL, and copied over to a share on the DC into a .bak file, where they are then added to the tape backup routine.
One of the SQL servers runs SQL 2000 and contains 34 databases, the other runs SQL 2005 and holds 11 databases. Recently, we've been getting errors in the backup logs along the lines of these:
Unhandled exception in the MSSQL Engine
BACKUP DATABASE is terminating abnormally. Processed 186000 pages for database 'STS_SERVER1_1', file 'STS_SERVER1_1' on file 1. Processed 1 pages for database 'STS_SERVER1_1', file 'STS_SERVER1_1_log' on file 1.
and
A nonrecoverable I/O error occurred on file "\\PRIMARYDC\BA_SQL_1_140\SQLSERVER1-series6-20090325.211423.359-FULL.bak:" 64(error not found). BACKUP DATABASE is terminating abnormally. Processed 295552 pages for database 'series6', file 'series6' on file 1. Processed 118 pages for database 'series6', file 'series6_log' on file 1.
It started about a week ago and is occurring every night now, but usually only with one or two databases on each server; the rest back up fine. It tends to be the same 3 or 4 databases which generate these exceptions, but not every night - e.g. database STS_SERVER1_1 might error Monday night, backup fine Tuesday night, then error again Wednesday night.
I've tried backing up these databases manually from SQL Enterprise Manager and it backs them up ok.
Also it may be a red herring, but in case it's relevant I've also run chkdsk (in read-only mode at this stage) on both servers and both report several hundred large file records which need fixing.
Are there any integrity checks etc. I can run on these databases to see if they are corrupt, or any other fixes I can try? If the evidence points to this being more of a network issue, I will have to start checking our network infrastructure, but we back up over 1Tb of data across the network from other servers with no problems whatsoever (including other SQL data), so I'm inclined to think the problem does lie with these 2 servers.
Anyway I would really appreciate any pointers. Thanks 🙂
March 26, 2009 at 4:38 am
run DBCC CHECKDB on the database to check for any errors.
DBCC checkdb WITH ALL_ERRORMSGS , NO_INFOMSGS
http://msdn.microsoft.com/en-us/library/aa258278(SQL.80).aspx"> http://msdn.microsoft.com/en-us/library/aa258278(SQL.80).aspx
March 26, 2009 at 5:23 am
thanks for the reply Steve. Apologies but you're talking to an SQL novice here - would I run that command in Query Analyzer? Also can I specify by name the database I want to check?
thanks
March 26, 2009 at 5:26 am
Yes this T-SQL should be run in Query Analyser
connect to the DB server and run the code, you can specify the db name with this code
DBCC checkdb (YourDBName) WITH ALL_ERRORMSGS , NO_INFOMSGS
March 26, 2009 at 9:20 am
hmm, they came up with no errors at all.
I also noticed that the 3 databases which are intermittently giving these errors are the 3 largest databases, all are > 1.4gb , so I guess this would point to some sort of network dropout when they're copied over the network.
Oh well, at least that seems to have eliminated corrupt databases as a possible cause, time for some more network testing I think. Thanks for your help Steve.
March 26, 2009 at 9:45 am
That is good news that there isno corruption, I/O errors do usually mean that then network or disk was unavailable.
You may want to do restore with verify on some of your backups to check that they are recoverable in case of a disaster.
March 26, 2009 at 12:08 pm
This is not a problem with the databases - it is a problem with backing up across the network. If there are any issues with the network during the backup, the backup is going to fail. SQL Server backups do not handle network latency issues very well - if at all.
There are a couple of things you can try - some might help, might not.
One setting that people have used to correct some of these types of issues is SessTimeout for Lanmanworkstation. This is a registry setting change that you can find in the following document (http://support.microsoft.com/kb/961657). There are other setting changes here that you can try also.
Another setting is the ScavengerTimeLimit value in the same registry key. Here is a link to that document http://support.microsoft.com/kb/890553.
Another option is to reduce the throughput value in your backup utility. I use SQL Litespeed and dropping from 1MB to 64K was recommended.
Another recommendation is to use a mapped drive - but I am not sure how you can get that done with a utility like this.
And finally, you could backup to a local drive and then copy (or backup) from there.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
March 27, 2009 at 9:53 am
thank you, I'll try those kb suggestions too. Just one question - the recommended scavenger time limit value of 30, I presume that's in decimal, not hex? It's not very clear from the kb article.
March 27, 2009 at 1:19 pm
I am not really sure if they meant hex or decimal.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply