January 28, 2009 at 4:15 pm
NOTE: This is cross-posted on sqlteam.com - if this is considered poor form, my apologies; feel free to delete the thread. In any event:
Usual sort of horror story - we've been accumulating 824 errors in this database since 2009-1-9 without knowing it, so we don't have a good backup any more. How we got to this point is a rant for another time.
Right now, I need to identify exactly how far up the creek we are; any assistance in parsing the DBCC CHECKDB results would be much appreciated. Those results, unfortunately, are too massive to post here (hundreds of pages worth), but break down into a few distinct types:
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID -8839967932860596224 (type Unknown), page (25856:1979740672). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12584969 and -1.
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID -5332261958806667264 (type Unknown), page (0:0). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12584969 and -1.
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID -5185783918850211840 (type Unknown), page ID (1:2260184) contains an incorrect page ID in its page header. The PageId in the page header = (8192:1912628480).
...
Msg 8928, Level 16, State 1, Line 1
Object ID 1484565205, index ID 2, partition ID 72057630310465536, alloc unit ID 72057683410812928 (type In-row data): Page (1:2259690) could not be processed. See other errors for details.
Msg 8928, Level 16, State 1, Line 1
Object ID 1484565205, index ID 2, partition ID 72057630310465536, alloc unit ID 72057683410812928 (type In-row data): Page (1:2259691) could not be processed. See other errors for details.
...
Msg 8979, Level 16, State 1, Line 1
Table error: Object ID 2023919360, index ID 1, partition ID 72057630667309056, alloc unit ID 72057684265861120 (type In-row data). Page (1:2262424) is missing references from parent (unknown) and previous (page (1:2262007)) nodes. Possible bad root entry in system catalog.
Msg 8977, Level 16, State 1, Line 1
Table error: Object ID 2023919360, index ID 1, partition ID 72057630667309056, alloc unit ID 72057684265861120 (type In-row data). Parent node for page (1:2262425) was not encountered.
As far as I can see, those errors (8939, 8909, 8928, 8979, 8977) constitute the entirety of the error output. The first batch of errors (all 8939 and 8909) is followed by a block that's more interpretable (and postable):
CHECKDB found 0 allocation errors and 2359 consistency errors not associated with any single object.
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 58, index ID 2, partition ID 562949957222400, alloc unit ID 72057594038059008 (type In-row data), page ID (1:2259769) contains an incorrect page ID in its page header. The PageId in the page header = (4978:-2086430897).
CHECKDB found 0 allocation errors and 1 consistency errors in table 'sys.sysbinobjs' (object ID 58).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 69, index ID 2, partition ID 562949957943296, alloc unit ID 72057594039762944 (type In-row data), page ID (1:2259486) contains an incorrect page ID in its page header. The PageId in the page header = (52216:-1).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 69, index ID 2, partition ID 562949957943296, alloc unit ID 72057594039762944 (type In-row data), page ID (1:2259845) contains an incorrect page ID in its page header. The PageId in the page header = (8448:33030147).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 69, index ID 2, partition ID 562949957943296, alloc unit ID 72057594039762944 (type In-row data), page ID (1:2259913) contains an incorrect page ID in its page header. The PageId in the page header = (52216:-1).
CHECKDB found 0 allocation errors and 3 consistency errors in table 'sys.sysrts' (object ID 69).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 74, index ID 1, partition ID 281474981560320, alloc unit ID 281474981560320 (type In-row data), page ID (1:2260131) contains an incorrect page ID in its page header. The PageId in the page header = (11776:905979648).
CHECKDB found 0 allocation errors and 1 consistency errors in table 'sys.syssingleobjrefs' (object ID 74).
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 93, index ID 2, partition ID 562949959516160, alloc unit ID 72057594039173120 (type In-row data), page (44032:20). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12584969 and -1.
CHECKDB found 0 allocation errors and 1 consistency errors in table 'sys.sysxmlplacement' (object ID 93).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 469576711, index ID 2, partition ID 72057594039107584, alloc unit ID 72057594043498496 (type In-row data), page ID (1:2259890) contains an incorrect page ID in its page header. The PageId in the page header = (9984:33030147).
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 517576882, index ID 2, partition ID 72057594039304192, alloc unit ID 72057594043695104 (type In-row data), page ID (1:2260040) contains an incorrect page ID in its page header. The PageId in the page header = (9984:33030147).
CHECKDB found 0 allocation errors and 1 consistency errors in table 'DatasetTypeSchemaTypeManagementPackVersion' (object ID 517576882).
There are more blocks of errors, but they're also (like the last several above) easily attributable to user tables. It's really the first batch of 2,359 errors not associated with any single object, along with the errors tied to the sys.* tables, that has me thinking we're completely out of luck.
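For what it's worth, the alloc unit IDs in those messages can at least be checked against the system catalog to see whether they still map to a real object. Something along these lines (just a sketch, using one of the IDs from the output above; for LOB data the join would be on partition_id rather than hobt_id):

-- Map an alloc unit ID from the CHECKDB output back to its owning object
SELECT OBJECT_NAME(p.object_id) AS object_name,
       p.index_id,
       au.type_desc
FROM sys.allocation_units AS au
JOIN sys.partitions AS p
    ON au.container_id = p.hobt_id    -- valid for in-row and row-overflow data
WHERE au.allocation_unit_id = 72057684265861120;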
Notably, the CHECKDB ends with the entirely inauspicious:
...
Msg 8928, Level 16, State 2, Line 1
Object ID 2050106344, index ID 4, partition ID 72057630786912256, alloc unit ID 72057684552712192 (type In-row data): Page (1:2264554) could not be processed. See other errors for details.
CHECKDB found 0 allocation errors and 233 consistency errors in table 'TypedManagedEntity' (object ID 2050106344).
CHECKDB found 0 allocation errors and 6913 consistency errors in database 'OperationsManagerDW'.
Rather than the normal "DBCC execution completed. If DBCC printed error messages, contact your system administrator."
Given our lack of known-good backups, I've already taken a backup of the broken database and tried running REPAIR_REBUILD (unsurprisingly ineffective) and REPAIR_ALLOW_DATA_LOSS, without success. I'm in the process now of trying to extract at least the logical structure of the DB, since reinstalling the software (MS SCOM, in this case) will be a giant PITA. So any input is much appreciated - even if it's just to confirm my "we're SOL" hypothesis, so I don't have to worry I'm missing a good fix.
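For reference, the repair attempts were roughly along these lines (a sketch, not the exact script I ran; the repair options require single-user mode):

ALTER DATABASE OperationsManagerDW SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB ('OperationsManagerDW', REPAIR_REBUILD);
DBCC CHECKDB ('OperationsManagerDW', REPAIR_ALLOW_DATA_LOSS);
ALTER DATABASE OperationsManagerDW SET MULTI_USER;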
Thanks!
______
Twitter: @Control_Group
January 28, 2009 at 5:07 pm
Yup - you're screwed. Without a backup to restore from, repair is your only option. If REPAIR_ALLOW_DATA_LOSS doesn't fix everything (it's not able to fix all corruptions) then your only option is to extract as much as you can into a new database. Live and learn I'm afraid.
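The extraction itself can be something as simple as the following per table, into a clean database (placeholder names; reads will simply fail wherever the underlying pages are damaged, so expect to work table by table and sometimes range by range):

-- NewDB and dbo.SomeTable are placeholders
SELECT *
INTO NewDB.dbo.SomeTable
FROM OperationsManagerDW.dbo.SomeTable;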
Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005
January 28, 2009 at 6:09 pm
Paul Randal (1/28/2009)
Yup - you're screwed. Without a backup to restore from, repair is your only option. If REPAIR_ALLOW_DATA_LOSS doesn't fix everything (it's not able to fix all corruptions) then your only option is to extract as much as you can into a new database. Live and learn I'm afraid.
Thanks for the confirmation.
The one bright side to this is it helps support my position that all SQL Server instances in the firm should at least fall under my purview in terms of basic maintenance.
Thanks again!
______
Twitter: @Control_Group
January 29, 2009 at 12:05 am
Matt Cherwin (1/28/2009)
The one bright side to this is it helps support my position that all SQL Server instances in the firm should at least fall under my purview in terms of basic maintenance.
Absolutely.
One other thing. You may want to have a look at the drives that this database was sitting on. 824 errors don't just happen. Something messed up the database. It's usually an IO issue. Check the windows event log, check any RAID or SAN logs that you may have. If possible, do some IO tests.
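If msdb on that instance is intact, the suspect_pages table is also worth a quick look; SQL Server logs 823/824 hits there as they happen, which can help narrow down when and where the damage started:

-- Pages flagged as suspect (event_type 1 = 823/unspecified 824, 2 = bad checksum, 3 = torn page)
SELECT DB_NAME(database_id) AS database_name,
       file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages
ORDER BY last_update_date DESC;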
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 29, 2009 at 7:35 am
GilaMonster (1/29/2009)
Matt Cherwin (1/28/2009)
The one bright side to this is it helps support my position that all SQL Server instances in the firm should at least fall under my purview in terms of basic maintenance.
Absolutely.
One other thing. You may want to have a look at the drives that this database was sitting on. 824 errors don't just happen. Something messed up the database. It's usually an IO issue. Check the windows event log, check any RAID or SAN logs that you may have. If possible, do some IO tests.
Indeed. I've been working with our storage guy on this - the drive in question is a raw-attached LUN presented to the VM hosting the DB. That's a problem because it's the only such setup in the firm, and none of us (myself most notably included) are really familiar with the approach.
The event log shows NTFS errors mounting the drive (which is almost certainly what caused the problem). But every time anyone actually logs onto the console, the drive exists and responds well to any tests we run. The storage architect has taken a look at the physical drives backing the LUN, and they appear to be working correctly.
We all know it's an IO failure of some kind, but we haven't been able to track down exactly where the failure is.
So rather than fixing the problem, we're moving the DB to a physical server I had available, with the data stored on local disk.
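For the move, a plain backup/restore onto the new box's local disk should do it; something like the following (the paths and logical file names are placeholders, not our real ones):

BACKUP DATABASE OperationsManagerDW TO DISK = N'D:\Backups\OperationsManagerDW.bak';

-- on the new physical server
RESTORE DATABASE OperationsManagerDW
FROM DISK = N'D:\Backups\OperationsManagerDW.bak'
WITH MOVE 'OperationsManagerDW' TO N'D:\Data\OperationsManagerDW.mdf',
     MOVE 'OperationsManagerDW_log' TO N'D:\Logs\OperationsManagerDW_log.ldf';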
______
Twitter: @Control_Group
January 30, 2009 at 11:46 am
Something to play with - I ran into a similar situation when, during a rollback on a 6.5 database, the client couldn't wait and told operations to reboot the server. They had no backups and repair allow data loss wouldn't work.
Take the tables in question and drop all indexes.
BCP out the data - you'll run into errors, but you can skip over the bad rows (rough commands sketched below).
BCP the data back in.
Validate integrity.
Add the indexes back.
Deal with the lost data.
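Roughly what those bcp steps look like from a command prompt (server, database, and table names below are placeholders):

rem export in native format (if the export dies on a damaged page, narrow the range with a query and "queryout")
bcp BrokenDB.dbo.SomeTable out SomeTable.dat -n -S MYSERVER -T

rem import into the clean database, tolerating bad rows and logging rejects
bcp NewDB.dbo.SomeTable in SomeTable.dat -n -S MYSERVER -T -m 1000 -e SomeTable_in.err -b 10000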
Zach
John Zacharkan