Need help--very urgent..

RPSQL

Hall of Fame

Points: 3229
May 11, 2009 at 5:27 pm

#204136

Hi All,
Here's the issue: I am not able to back up one of our production databases. This is the error it gives:
BackupIoRequest::WaitForIoCompletion: read failure on backup device 'E:\Database File\abc.mdf'. Operating system error 23(Data error (cyclic redundancy check).).
What can I do? This is an urgent production issue. Please let us know your views ASAP.

Suresh B. SSC-Insane Points: 22986 · Answer 1

RPSql (5/11/2009)

read failure on backup device 'E:\Database File\abc.mdf'. Operating system error 23(Data error (cyclic redundancy check).).

It looks like the .mdf file is corrupted. Run DBCC CHECKDB.

Jonathan Kehayias One Orange Chip Points: 26778 · Answer 2

I would agree that this points to a physical disk problem on your server. Your only recourse at this point may be to recover from your previous backup. Check your SQL Server error log for 824 and 825 errors. Run drive diagnostics and then CHECKDB to find the extent of the damage.
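As a rough sketch of that error-log check (not a complete diagnostic): the first query lists pages SQL Server has already recorded as suspect, and the two procedure calls search the current error log for the message text of errors 824 and 825. The search strings are my assumption about the wording of those messages, and sp_readerrorlog is an undocumented procedure, so adjust as needed for your version.

```sql
-- Pages that SQL Server has recorded as suspect.
-- event_type: 1 = 823/824 error (other than bad checksum or torn page),
--             2 = bad checksum, 3 = torn page
SELECT DB_NAME(database_id) AS database_name,
       file_id,
       page_id,
       event_type,
       error_count,
       last_update_date
FROM msdb.dbo.suspect_pages
ORDER BY last_update_date DESC;

-- Search the current SQL Server error log (log 0, type 1) for the text of
-- message 824 and message 825. The search strings are assumed wording.
EXEC sys.sp_readerrorlog 0, 1, N'logical consistency-based I/O error';
EXEC sys.sp_readerrorlog 0, 1, N'succeeded after failing';
```

An empty result from either one does not prove the disk is healthy. It only means SQL Server has not logged those particular errors in the current log.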

Gail Shaw SSC Guru Points: 1004504 · Answer 3

Please run the following and post all the results here.

DBCC CHECKDB (<Database Name>) WITH NO_INFOMSGS, ALL_ERRORMSGS

Take a look at this article. http://www.sqlservercentral.com/articles/65804/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

RPSQL Hall of Fame Points: 3229 · Answer 4

The database size is 320 GB. I started running DBCC CHECKDB on that database yesterday, and it has been running for 20 hours. Any idea why it is taking so long and has not completed yet?

Activity Monitor is showing the wait type FCB_REPLICA_WRITE. What can I do now?

Please answer as it is a production issue and our application is down currently.

Thanks in advance.

Jonathan Kehayias One Orange Chip Points: 26778 · Answer 5

Unless you are normally running CHECKDB (a recommended practice to catch these problems early on and reduce downtime/risk of data loss) and can look at the historical runs for how long it takes to run, there isn't a whole lot that you can do except wait it out. If you stop it, you won't get the information that you need to help resolve the problems.

RPSQL Hall of Fame Points: 3229 · Answer 6

Thanks for your reply. I was a little bit worried because this database is not very large, yet the process has already been running for a day. The thread in Activity Monitor is suspended and the wait type is 'FCB_REPLICA_WRITE'.

Is this normal?

Jonathan Kehayias One Orange Chip Points: 26778 · Answer 7

Per Books Online, that wait type signals the following:

Occurs when the pushing or pulling of a page to a snapshot (or a temporary snapshot created by DBCC) sparse file is synchronized.

http://msdn.microsoft.com/en-us/library/ms179984.aspx

Based on that I'd say yes it is normal.
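If you want to watch it while you wait, here is a minimal sketch that reads the running DBCC request from sys.dm_exec_requests. It assumes the command column starts with 'DBCC' for the CHECKDB session, and the percent_complete and estimated finish values are only estimates, particularly when the run has dropped into extra checks.

```sql
-- Progress of any running DBCC command (run from a separate session).
SELECT r.session_id,
       r.command,
       r.status,
       r.wait_type,
       r.percent_complete,
       r.start_time,
       -- estimated_completion_time is in milliseconds; convert to seconds
       DATEADD(second, r.estimated_completion_time / 1000, GETDATE()) AS estimated_finish
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE N'DBCC%';
```

If percent_complete keeps moving between checks, even slowly, the run is still making progress.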

Gail Shaw SSC Guru Points: 1004504 · Answer 8

RPSql (5/12/2009)
It has been running for 20 hours. Any idea why it is taking so long and has not completed yet?

Maybe because there is corruption.

The CHECKDB algorithms are written in such a way that they can tell quickly whether there is corruption, but if there is, SQL Server has to go back and do extra detailed searches. It's called a 'deep-dive' and it can make the CHECKDB time go up massively.

Wait until it's finished. To tell what's wrong we need the results. If you stop it now you're just going to have to run it to completion later.

BLADE-1043594 SSCrazy Points: 2576 · Answer 9

The file might be corrupted.

+++BLADE+++

David Benoit SSC-Dedicated Points: 34562 · Answer 10

Just a thought too, but while the DBCC continues you might want to start looking at backups and ensuring that you have one available and ready to go if the corruption is such that it can't be fixed. So, find the most recent good backup and get it available (if on tape, get it off tape). This can save you some time at the end of this process and allow you to get back online faster if you do have to perform a restore.

Hopefully you won't have to use it though.... 🙂
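To find that most recent backup, something along these lines against the msdb backup history should work. It is only a sketch: 'YourDatabase' is a placeholder since the database name isn't given in this thread, and a history row only shows that a backup finished, not that it can be restored.

```sql
DECLARE @db sysname;
SET @db = N'YourDatabase';  -- placeholder name, replace with the real one

-- Five most recent backups recorded for the database, newest first.
SELECT TOP (5)
       bs.database_name,
       bs.type,                  -- D = full, I = differential, L = log
       bs.backup_start_date,
       bs.backup_finish_date,
       bmf.physical_device_name  -- where the backup was written
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
  ON bmf.media_set_id = bs.media_set_id
WHERE bs.database_name = @db
ORDER BY bs.backup_finish_date DESC;
```

For backups taken by a third-party tool, the device name is usually an identifier generated by that tool rather than a file path, so you may need the tool's own catalog to locate the media.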

David

@SQLTentmaker

“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

RPSQL Hall of Fame Points: 3229 · Answer 11

Hi Gilamonster,

Thanks for your help. I will wait until this process completes, and so will our application team. I will post the output here; please help if you can.

One more thing: the last backup I have of this database is four days old. We didn't get notified because we use IBM Tivoli to take SQL backups directly to tape. Our storage people only notified us yesterday, which was after 3 days!

Thanks again..

Jonathan Kehayias One Orange Chip Points: 26778 · Answer 12

David Benoit (5/12/2009)
Just a thought too, but while the DBCC continues you might want to start looking at backups and ensuring that you have one available and ready to go if the corruption is such that it can't be fixed. So, find the most recent good backup and get it available (if on tape, get it off tape). This can save you some time at the end of this process and allow you to get back online faster if you do have to perform a restore.
Hopefully you won't have to use it though.... 🙂

You might look at replacement hardware as well. CRC failures generally indicate physical disk failure, and rectifying the problem requires replacing the bad disk(s).

David Benoit SSC-Dedicated Points: 34562 · Answer 13

Yeah, I have a report that looks at the date of the last backup of each database to avoid things like that. Hopefully you won't have to use the backup. Regardless, have them make sure the tape is available.
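Not my actual report, but a minimal sketch of the idea: list every database whose last full backup in msdb is missing or older than a threshold. The one-day threshold and excluding tempdb are assumptions, so pick whatever matches your backup schedule.

```sql
-- Databases with no full backup on record, or none in the last day.
SELECT d.name AS database_name,
       MAX(bs.backup_finish_date) AS last_full_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS bs
  ON bs.database_name = d.name
 AND bs.type = 'D'                      -- full backups only
WHERE d.name <> N'tempdb'               -- tempdb cannot be backed up
GROUP BY d.name
HAVING MAX(bs.backup_finish_date) IS NULL
    OR MAX(bs.backup_finish_date) < DATEADD(day, -1, GETDATE())
ORDER BY last_full_backup;
```

Scheduled as a job that emails the result, this would flag a missed backup the next morning instead of days later.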

Gail Shaw SSC Guru Points: 1004504 · Answer 14

David Benoit (5/12/2009)
Just a thought too, but while the DBCC continues you might want to start looking at backups and ensuring that you have one available and ready to go if the corruption is such that it can't be fixed. So, find the most recent good backup and get it available (if on tape, get it off tape). This can save you some time at the end of this process and allow you to get back online faster if you do have to perform a restore.

I'd also start checking system event logs, RAID controller/SAN logs, etc. Corruption's usually an IO problem.
