May 8, 2015 at 7:26 am
Last night's backup failed with the message: BACKUP detected an error on page (1:1337093).
I ran CheckDB and got some errors. First, this right at the top:
DBCC results for 'NCVLive'.
Service Broker Msg 9675, State 1: Message Types analyzed: 14.
Service Broker Msg 9676, State 1: Service Contracts analyzed: 6.
Service Broker Msg 9667, State 1: Services analyzed: 3.
Service Broker Msg 9668, State 1: Service Queues analyzed: 3.
Service Broker Msg 9669, State 1: Conversation Endpoints analyzed: 0.
Service Broker Msg 9674, State 1: Conversation Groups analyzed: 0.
Service Broker Msg 9670, State 1: Remote Service Bindings analyzed: 0.
Service Broker Msg 9605, State 1: Conversation Priorities analyzed: 0.
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:1337094) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:1337095) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
CHECKDB found 0 allocation errors and 2 consistency errors not associated with any single object.
Then these consistency errors on one table:
DBCC results for 'ep_careplan'.
Msg 8928, Level 16, State 1, Line 1
Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data): Page (1:1337093) could not be processed. See other errors for details.
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data), page (1:1337093). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12716041 and -4.
Msg 8976, Level 16, State 1, Line 1
Table error: Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data). Page (1:1337093) was not seen in the scan although its parent (1:775941) and previous (1:1337092) refer to it. Check any previous errors.
Msg 8928, Level 16, State 1, Line 1
Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data): Page (1:1337094) could not be processed. See other errors for details.
Msg 8980, Level 16, State 1, Line 1
Table error: Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data). Index node page (1:775941), slot 169 refers to child page (1:1337094) and previous child (1:1337093), but they were not encountered.
Msg 8928, Level 16, State 1, Line 1
Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data): Page (1:1337095) could not be processed. See other errors for details.
Msg 8980, Level 16, State 1, Line 1
Table error: Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data). Index node page (1:775941), slot 170 refers to child page (1:1337095) and previous child (1:1337094), but they were not encountered.
Msg 8978, Level 16, State 1, Line 1
Table error: Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data). Page (1:1337096) is missing a reference from previous page (1:1337095). Possible chain linkage problem.
There are 2426647 rows in 38487 pages for object "ep_careplan".
CHECKDB found 0 allocation errors and 8 consistency errors in table 'ep_careplan' (object ID 122483515).
This: SELECT name FROM sys.indexes WHERE object_id = 122483515 and index_id = 1
tells me index id 1 is the Primary Key clustered index.
TX log backups are still running without error. My last good full is from two nights ago. The last clean DBCC run was 3 days ago. Can this be fixed? Should I restore the last backup and logs elsewhere and bring the table over? Can I even do that? There are foreign keys and a trigger on this table. If necessary, shouldn't I be able to restore the FULL and all the logs up to now and not expect the errors to be present?
Should I get the users off this application?
Thanks all.
May 8, 2015 at 7:39 am
Before you do anything, please run the following and post the full, complete, unedited output:
DBCC CheckDB('<database name>') WITH NO_INFOMSGS, ALL_ERRORMSGS
If you have the time, the best fix will probably be to restore over the damaged DB with the clean full (one taken BEFORE the last known good CheckDB), then restore all log backups (but please test that elsewhere first and make sure it really is clean). Two of the errors you posted don't relate to a table, so dropping the table and re-importing probably won't work, but I need to see the entire CheckDB output to be sure.
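A minimal T-SQL sketch of that test restore into a separate database. The NCVLive_Restore name, backup paths, and logical file names are placeholders; adjust them to your environment.
RESTORE DATABASE NCVLive_Restore
    FROM DISK = N'X:\Backups\NCVLive_full.bak'
    WITH MOVE N'NCVLive_Data' TO N'X:\Restore\NCVLive_Restore.mdf',
         MOVE N'NCVLive_Log'  TO N'X:\Restore\NCVLive_Restore.ldf',
         NORECOVERY, STATS = 5;
-- Apply every log backup in sequence, all WITH NORECOVERY.
RESTORE LOG NCVLive_Restore
    FROM DISK = N'X:\Backups\NCVLive_log_01.trn'
    WITH NORECOVERY;
-- ...repeat for each subsequent log backup...
RESTORE DATABASE NCVLive_Restore WITH RECOVERY;
-- Confirm the restored copy really is clean before touching the live database.
DBCC CHECKDB('NCVLive_Restore') WITH NO_INFOMSGS, ALL_ERRORMSGS;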
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 8, 2015 at 7:47 am
I started the checkdb. It should take <10 minutes. I just had our sysadmins add another drive letter to this VM so I can restore the last good backup elsewhere and run checkdb on it.
May 8, 2015 at 7:54 am
Here are the results:
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:1337094) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:1337095) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
CHECKDB found 0 allocation errors and 2 consistency errors not associated with any single object.
Msg 8928, Level 16, State 1, Line 1
Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data): Page (1:1337093) could not be processed. See other errors for details.
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data), page (1:1337093). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12716041 and -4.
Msg 8976, Level 16, State 1, Line 1
Table error: Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data). Page (1:1337093) was not seen in the scan although its parent (1:775941) and previous (1:1337092) refer to it. Check any previous errors.
Msg 8928, Level 16, State 1, Line 1
Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data): Page (1:1337094) could not be processed. See other errors for details.
Msg 8980, Level 16, State 1, Line 1
Table error: Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data). Index node page (1:775941), slot 169 refers to child page (1:1337094) and previous child (1:1337093), but they were not encountered.
Msg 8928, Level 16, State 1, Line 1
Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data): Page (1:1337095) could not be processed. See other errors for details.
Msg 8980, Level 16, State 1, Line 1
Table error: Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data). Index node page (1:775941), slot 170 refers to child page (1:1337095) and previous child (1:1337094), but they were not encountered.
Msg 8978, Level 16, State 1, Line 1
Table error: Object ID 122483515, index ID 1, partition ID 72057595327348736, alloc unit ID 72057595334688768 (type In-row data). Page (1:1337096) is missing a reference from previous page (1:1337095). Possible chain linkage problem.
CHECKDB found 0 allocation errors and 8 consistency errors in table 'ep_careplan' (object ID 122483515).
CHECKDB found 0 allocation errors and 10 consistency errors in database 'NCVLive'.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (NCVLive).
May 8, 2015 at 8:20 am
Cool.
Looking at that again, your plan of dropping (or truncating) the table and then reloading will work. You'll have to drop foreign keys and disable the trigger before truncating it (and please truncate, not delete from) and then recreate the foreign keys and re-enable the trigger after you reload the table.
Just check that the backup you're planning to restore as the source isn't damaged before you truncate the table in question.
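A hedged sketch of that drop / truncate / reload sequence, assuming a clean restored copy named NCVLive_Restore. The schema, foreign key, trigger, and column names (dbo, ep_careplan_child, FK_child_careplan, trg_ep_careplan, careplan_id) are purely illustrative; the real ones come from sys.foreign_keys and sys.triggers.
-- Drop the foreign keys that reference ep_careplan (TRUNCATE is not allowed while they exist,
-- even if disabled) and disable the table's trigger.
ALTER TABLE dbo.ep_careplan_child DROP CONSTRAINT FK_child_careplan;   -- hypothetical FK
DISABLE TRIGGER trg_ep_careplan ON dbo.ep_careplan;                    -- hypothetical trigger

-- TRUNCATE deallocates the pages outright instead of reading each row, which matters here
-- given the damaged pages.
TRUNCATE TABLE dbo.ep_careplan;

-- Reload from the clean restored copy. If the table has an identity column, list the columns
-- explicitly and wrap this in SET IDENTITY_INSERT dbo.ep_careplan ON/OFF.
INSERT INTO dbo.ep_careplan WITH (TABLOCK)
SELECT * FROM NCVLive_Restore.dbo.ep_careplan;

-- Recreate the foreign keys (WITH CHECK revalidates the data) and re-enable the trigger.
ALTER TABLE dbo.ep_careplan_child
    WITH CHECK ADD CONSTRAINT FK_child_careplan
    FOREIGN KEY (careplan_id) REFERENCES dbo.ep_careplan (careplan_id);
ENABLE TRIGGER trg_ep_careplan ON dbo.ep_careplan;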
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 8, 2015 at 8:28 am
I'm restoring now to a new database. I'll check its integrity as soon as I can.
May 8, 2015 at 8:50 am
So I restored the full and all the logs I have, and the DBCC ran clean. I just looked, and the problem table has more rows in the live DB than in my restored copy; the restored copy is current up to about an hour ago. So in order to replace just the table, I'd have to get the users off, take the last log backup, restore it all again, and then deal with moving the table's contents. It seems it would take less time to just restore over the existing database, and it's probably safer.
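For the row-count comparison, a query like this against sys.dm_db_partition_stats avoids scanning the damaged clustered index; run it in each database (the dbo schema and the NCVLive_Restore name are assumptions) and compare the totals.
USE NCVLive;  -- then repeat under the restored copy
SELECT SUM(row_count) AS approx_rows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID(N'dbo.ep_careplan')
  AND index_id IN (0, 1);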
May 8, 2015 at 9:02 am
That'll work fine too.
Take a final log backup before you start the restore: Backup log ... with norecovery. Also take a full backup, just in case; you'll probably have to use continue_after_error, but there's no such thing as too many backups.
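A hedged sketch of those final backups and the restore over the live database, with hypothetical paths. Note the safety-net full backup has to come before the tail-log backup, since BACKUP LOG ... WITH NORECOVERY leaves the database in the RESTORING state.
-- Safety-net full backup of the damaged database; CONTINUE_AFTER_ERROR lets it complete
-- despite the corrupt pages.
BACKUP DATABASE NCVLive
    TO DISK = N'X:\Backups\NCVLive_damaged.bak'
    WITH CONTINUE_AFTER_ERROR, INIT;

-- Tail-log backup; NORECOVERY puts the database into RESTORING so no further changes slip in.
BACKUP LOG NCVLive
    TO DISK = N'X:\Backups\NCVLive_tail.trn'
    WITH NORECOVERY, INIT;

-- Restore the clean full, every log backup in order, then the tail, then recover.
RESTORE DATABASE NCVLive FROM DISK = N'X:\Backups\NCVLive_full.bak' WITH NORECOVERY, REPLACE;
RESTORE LOG NCVLive FROM DISK = N'X:\Backups\NCVLive_log_01.trn' WITH NORECOVERY;
-- ...repeat for each subsequent log backup...
RESTORE LOG NCVLive FROM DISK = N'X:\Backups\NCVLive_tail.trn' WITH NORECOVERY;
RESTORE DATABASE NCVLive WITH RECOVERY;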
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 8, 2015 at 10:39 am
Back online and all better. Thanks much for your time.
May 8, 2015 at 11:02 am
\o/
Excellent. All too often, corruption problems posted here are unfixable because of a lack of backups, corruption that's been ignored too long, or similar.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 8, 2015 at 11:11 am
And now you need to go and do some root-cause analysis. Corruption is, in the vast majority of cases, an IO subsystem problem. Make sure drivers and firmware are up to date, that the batteries on any write controllers are still effective and look through logs to see what (if anything) reported an error, failure or problem.
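As a starting point on the SQL Server side, msdb.dbo.suspect_pages records pages that have thrown 823/824 or checksum errors, which can help pin down when and where the damage appeared; a minimal check:
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages
ORDER BY last_update_date DESC;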
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 8, 2015 at 1:51 pm
Yes indeed. But I have to rely on our SysAdmins for that. Besides my being completely ignorant about today's hardware, it's their domain. The last time I had database corruption, it was due to a fibre channel link throwing errors on a VM host. They were aware of the errors and migrated all the servers off the host, but not before I had a database throw 823 errors. I don't think the root cause of this morning's issue is as obvious.
May 8, 2015 at 4:00 pm
True and likely not to be as obvious, but it still needs doing.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability