July 30, 2016 at 10:16 am
Following a SAN issue one database was marked Suspect. No other corruption was apparent on any of the system dbs or other user dbs. With CHECKDB showing extensive inconsistencies the recover was restoring from recent .bak & .trn backups. Following the restores CHECKDB across the whole instance was successful and the application was brought back online.
However since then the daily backup have continually failed, whilst CHECKDB is always happy across all the whole instance. The Full backups always produce the same error (have also included continue_after_error) – and always the backup gets to the same point 70% before terminating
Msg 3203, Level 16, State 1, Line 3
Read on "mydb.mdf" failed:
23(failed to retrieve text for this error. Reason 15105)
Msg 3203, Level 16, State 1, Line 3
BACKUP DATABASE is terminating abnormally.
I also see in the System Event log
The device, \Device\Harddisk2\DR2, has a bad block.
The environment is VMWare - Windows Server 2008 R2 SP1 running SQL 2008R2 SP3, Standard Edition and an 11GB database which is separated across 3 files – Primary, Data, Index.
The server itself was restarted a couple of days ago to try address the backup issue. SQL Server restarted OK and reports no errors in either the application or system event logs until the backup is tried. Log Backups are being taken and succeed but now there is no valid chain to recover from. The position is worsening with no valid backup now for over a week.
Other forums suggest this is due to file access/permissions or disk space for the backup to finish but this is not the case.
I’ve considered is detaching/stopping SQL and copying the mdf, ndf, ldf files to another server, but I don't want to stop the system again until I feel more confident with the data being protected. I’m pulling together a process to export all the DB objects and bulk copy all the data out into another database.
If the problem is with the Primary file (mdf) why is the CHECKDB not picking it up? I would welcome any thoughts, further checks I could run to help identify what is wrong with this data file and if there any workarounds to get the backups working.
Cheers Unsy.
July 30, 2016 at 12:30 pm
Could be it's in an unallocated page of an allocated extent (8 contiguous pages). CheckDB reads (afaik) allocated pages, backups copy out allocated extents
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 30, 2016 at 2:25 pm
Thanks Gail.
I presume then there's no easy way to confirm this? and do you think there is a chance that shrinking the primary file (which is showing the error) could possibly release the corrupt extent and resolve the issue.
I have no history about the system/database other than the problem which being cause by an issue with the SAN. I still plan to get a maintenance window and export all the data before attempting anything that might risk the whole database integrity
July 30, 2016 at 2:41 pm
Not corrupt extent. Unallocated but unreadable page in an allocated extent (otherwise CheckDB would fall over it), and it's a guess.
Not easy at all to confirm, shrink may work, more likely to fall over the damaged page. Rebuilding all indexes may fix, may fall over. Moving indexes into another filegroup may work too.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 30, 2016 at 2:55 pm
Thanks again Gail - what you've clarified is making more sense to me. Can I run by you my plan?
- agree maintenance window
- export all the data via an SSIS package
- stop the SQL service
- copy the current data files to another server
- attach DB on new server, attempt the index rebuilds, file shrink
- see if this allows backup to complete
Unsy
July 31, 2016 at 5:43 am
Do the last 3 steps one at a time and see if the backup works after each one (after the copy, after the rebuild)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 9, 2016 at 3:18 am
Just to say that we have concluded the mdf file is beyond repair. To share the scenario again
With the bad sector in the mdf file
* the T-Log backups succeeded
* the Database was still accessible/functioning
* and CHECKDB were appearing good
However
* Full & Diff backups failed
* the MDF file could not be copied when the DB was detached
* the DB could still be be reattached in situ
Due to some poor file management & the delayed identification of this whole issue
* the log chain become broken (due to limited log backups retention)
* the only solution was to restoring an old backup and painful copy out of the data
Thanks Gail for you responses.
September 18, 2018 at 10:09 am
Just came here to share I was able to restore the file from this situation.
Solution is not to just do "file shrink" -> "release unused space", but do as "file shrink" -> "reorganise pages before releasing unused space" (and choose some lower value in MB)
Hence, if you are lucky you can get your file recovered (try couple of times, with x/2 MB reduction). Apparently reorganise pages would be doing some copy/paste in the background before releasing which helped.
January 28, 2020 at 12:54 pm
Dear Sir,
The below issue i am also facing.The san drive only have 800 MB and database size is 5GB.
But i don't have backup because cleanup job is running.
Now when i take backup 80% is full after error : 1117 showing(i/o device error).
Kindly give any solutions.
With the bad sector in the mdf file
* the T-Log backups succeeded
* the Database was still accessible/functioning
* and CHECKDB were appearing good
However
* Full & Diff backups failed
* the MDF file could not be copied when the DB was detached
* the DB could still be be reattached in situ
Due to some poor file management & the delayed identification of this whole issue
* the log chain become broken (due to limited log backups retention)
* the only solution was to restoring an old backup and painful copy out of the data
January 28, 2020 at 6:01 pm
Fix the issues with the disks - it sounds like it's more hardware related so check the windows event logs and fix whatever the issue is with the disks, controller.
Sue
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply