database backups fail with BackupIoRequest::WaitForIoCompletion: read failure on backup device

  • My backups started to fail periodically with the following errors:

    Nonrecoverable I/O error occurred on file 'd:\Microsoft SQL Server\MSSQL\Data\commodity\commodity_data.mdf'.

    [Microsoft][ODBC SQL Server Driver][SQL Server]BACKUP LOG is terminating abnormally.

    Or

    [Microsoft SQL-DMO (ODBC SQLState: 42000)] Error 3202: [Microsoft][ODBC SQL Server Driver][SQL Server]Write on 'D:\Microsoft SQL Server\MSSQL\BACKUP\bx01t\bx01t_db_200901261049.DAT' failed, status = 2. See the SQL Server error log for more details.

    [Microsoft][ODBC SQL Server Driver][SQL Server]BACKUP DATABASE is terminating abnormally.

    Error logs show this:

    2009-01-26 10:13:38.35 spid6 Error: 823, Severity: 24, State: 3

    2009-01-26 10:13:38.35 spid6 I/O error 2(The system cannot find the file specified.) detected during write at offset 0x00000059594000 in file 'd:\Microsoft SQL Server\MSSQL\Data\commodity\commodity_data.mdf'..

    2009-01-26 10:13:42.59 spid53 LogEvent: Failed to report the current event. Operating system error = 31(A device attached to the system is not functioning.).

    When I ran DBCC CHECKDB, it reported 0 errors.

    Any help will be greatly appreciated.
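
The 823 message in the error log above gives a byte offset into the data file. As a rough sketch (not part of the original post), the offset can be turned into a page number by dividing by the 8 KB SQL Server page size. The regular expression and the `parse_io_error` helper are hypothetical and only fitted to the message shape quoted above; other versions may word the message differently.

```python
import re

PAGE_SIZE = 8192  # SQL Server data pages are 8 KB

# Error-log line copied from the post above.
LOG_LINE = (
    "2009-01-26 10:13:38.35 spid6 I/O error 2(The system cannot find the file "
    "specified.) detected during write at offset 0x00000059594000 in file "
    "'d:\\Microsoft SQL Server\\MSSQL\\Data\\commodity\\commodity_data.mdf'.."
)

# Hypothetical pattern matching only the message shape shown in this thread.
IO_ERROR = re.compile(
    r"I/O error (?P<os_error>\d+)\((?P<os_text>.*?)\) detected during "
    r"(?P<op>read|write) at offset (?P<offset>0x[0-9A-Fa-f]+) "
    r"in file '(?P<file>[^']+)'"
)


def parse_io_error(line):
    """Return OS error, operation, file, byte offset and page number, or None."""
    m = IO_ERROR.search(line)
    if m is None:
        return None
    offset = int(m.group("offset"), 16)
    return {
        "os_error": int(m.group("os_error")),
        "operation": m.group("op"),
        "file": m.group("file"),
        "offset": offset,
        "page": offset // PAGE_SIZE,  # page number within this data file
    }


info = parse_io_error(LOG_LINE)
print(info["operation"], info["page"])  # write 182986
```

Knowing the page number makes it easier to tell whether repeated 823 errors keep hitting the same region of the file or are scattered across it.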

  • That looks like problems with the IO subsystem. Possibly intermittent right now.

    Check the Windows event log for anything indicating hardware problems, check the SQL error log, check any logs from your drives.

    Consider getting a copy of that database elsewhere. Hard IO errors and no backups are a very risky combination.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thank you. I checked Event Viewer and found these warnings:

    Event Type:Warning

    Event Source:Disk

    Event Category:None

    Event ID:51

    An error was detected on device \Device\Harddisk1 during a paging operation.

    I also found that disk D has problems. But when our IT ran chkdsk on it, it reported 0 errors.

  • llokshin (1/26/2009)

    But when our IT ran chkdsk on it, it reported 0 errors.

    Still, if it's giving errors there's a problem. You don't want to find the problem when the drive fails completely. It could be an intermittent IO problem at the moment.

    Is this part of a RAID array, or is it just a single drive? Can you move the database elsewhere?

  • I just looked at other logs per your advice, and in the RAID log I saw this:

    January 26, 2009 1:23:25 AM EST WRN 338:A01C-S--L-- bfcdb01 Periodic scan found one or more critical logical drives: controller 1. Repair as soon as possible to avoid data loss.

    January 26, 2009 1:23:58 AM EST WRN 215:A01C-S--L-- bfcdb01 One or more logical drives contain a bad stripe: controller 1

    January 26, 2009 9:23:36 AM EST WRN 338:A01C-S--L-- bfcdb01 Periodic scan found one or more critical logical drives: controller 1. Repair as soon as possible to avoid data loss.

    January 26, 2009 9:24:10 AM EST WRN 215:A01C-S--L-- bfcdb01 One or more logical drives contain a bad stripe: controller 1

    January 26, 2009 10:20:14 AM EST WRN 405:A01C1S00L-- bfcdb01 PFA detected for drive: controller 1, channel 1, SCSI ID 0 (FRU Part # 34L5429)

    January 26, 2009 10:20:14 AM EST WRN 405:A01C1S04L-- bfcdb01 PFA detected for drive: controller 1, channel 1, SCSI ID 4 (FRU Part # 34L5429)

    January 26, 2009 10:20:14 AM EST WRN 301:A01C-S--L02 bfcdb01 Logical drive is critical: controller 1, logical drive 2

    January 26, 2009 10:20:14 AM EST WRN 504:A01C-S--L-- bfcdb01 Enclosure fan 2 is malfunctioning: controller 1, channel 1

    January 26, 2009 10:20:14 AM EST WRN 510:A01C-S--L-- bfcdb01 Enclosure power supply 1 is malfunctioning: controller 1, channel 1

    January 26, 2009 10:20:14 AM EST ERR 404:A01C1S04L-- bfcdb01 Defunct drive - SCSI error: controller 1, channel 1, SCSI ID 4 (FRU Part # 34L5429)

    January 26, 2009 10:20:30 AM EST WRN 215:A01C-S--L-- bfcdb01 One or more logical drives contain a bad stripe: controller 1

    I asked our IT where I can copy my databases from drive D. They don't have anything available yet. But in your opinion there is no way to repair the drive?
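
A RAID log like the one above can be summarized quickly to see which drives are flagged. This is only a sketch under assumptions: the line layout (`<timestamp> <WRN|ERR> <code>:<location> <host> <message>`) is inferred from the lines quoted in this post, and the names `ENTRY`, `SCSI_ID`, and `parse_entries` are made up for illustration.

```python
import re
from collections import Counter

# A subset of the RAID log lines quoted in the post above.
RAID_LOG = """\
January 26, 2009 1:23:25 AM EST WRN 338:A01C-S--L-- bfcdb01 Periodic scan found one or more critical logical drives: controller 1. Repair as soon as possible to avoid data loss.
January 26, 2009 10:20:14 AM EST WRN 405:A01C1S00L-- bfcdb01 PFA detected for drive: controller 1, channel 1, SCSI ID 0 (FRU Part # 34L5429)
January 26, 2009 10:20:14 AM EST WRN 405:A01C1S04L-- bfcdb01 PFA detected for drive: controller 1, channel 1, SCSI ID 4 (FRU Part # 34L5429)
January 26, 2009 10:20:14 AM EST WRN 301:A01C-S--L02 bfcdb01 Logical drive is critical: controller 1, logical drive 2
January 26, 2009 10:20:14 AM EST ERR 404:A01C1S04L-- bfcdb01 Defunct drive - SCSI error: controller 1, channel 1, SCSI ID 4 (FRU Part # 34L5429)
"""

# Assumed layout, inferred from the lines above, not from vendor documentation.
ENTRY = re.compile(
    r"^(?P<when>.+?) (?P<level>WRN|ERR) (?P<code>\d+):(?P<where>\S+) "
    r"(?P<host>\S+) (?P<message>.+)$"
)
SCSI_ID = re.compile(r"SCSI ID (\d+)")


def parse_entries(log_text):
    """Parse each recognizable log line into a dict; skip anything else."""
    entries = []
    for line in log_text.splitlines():
        m = ENTRY.match(line.strip())
        if m is not None:
            entries.append(m.groupdict())
    return entries


def scsi_ids(entries):
    """Sorted SCSI IDs mentioned in the given entries."""
    found = set()
    for entry in entries:
        found.update(int(n) for n in SCSI_ID.findall(entry["message"]))
    return sorted(found)


entries = parse_entries(RAID_LOG)
levels = Counter(e["level"] for e in entries)
flagged = scsi_ids(entries)  # drives mentioned in any entry
failed = scsi_ids([e for e in entries if e["level"] == "ERR"])  # ERR entries only

print(dict(levels), flagged, failed)
```

On the sample lines, only SCSI ID 4 appears in an ERR entry, while SCSI ID 0 shows up only in warnings.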

  • It started with one db1 backup failure, then it worked for a couple of days. Now db1 (full and transaction log) and another database, db2 (full), are failing. But some backups still succeed; even the transaction log backup for db2 is working.

  • llokshin (1/26/2009)

    But in your opinion there is no way to repair the drive?

    No idea. I know databases, not drives.

    That said, from those errors it appears that you have one drive that's completely failed, and maybe some problems with the controller or other drives. Get those disks replaced! It's pointless trying to fix backup errors when the underlying drives are failing.

    Do a backup somewhere else (removable drive, network, anywhere else), then get your IT people to fix those drives, before another drive fails and you lose the database completely.

  • Thank you, that's what I'm working on now: pushing IT to move faster and give me other options. Thank god it is only a testing environment.
