September 7, 2023 at 10:55 am
Hi SqlServerCentral members,
I have a client that has reported consistency errors in a db after running DBCC CHECKDB, the client hasn't seen any issues when processing data but they would like the db fixed.
I've run repair_rebuild but the errors still remain.
below is a example of few of the errors being reported. As my sql skills are limited I'm tempted to archive the db and start them off on a new db.
any help would be greatly appreciated.
kind Regards,
Shell_Scrape
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID -9223372036854775808 (type Unknown), page ID (1:2269805) contains an incorrect page ID in its page header. The PageId in the page header = (268:65536).
Repairing this error requires other errors to be corrected first.
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID -9223372036854775808 (type Unknown), page ID (1:2270433) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
Repairing this error requires other errors to be corrected first.
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID -9223372036854775808 (type Unknown), page ID (1:2270468) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
Repairing this error requires other errors to be corrected first.
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID -9223372036854775808 (type Unknown), page ID (1:2270572) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
Repairing this error requires other errors to be corrected first.
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID -9223336577571225600 (type Unknown), page (4:4327424). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 2057 and -4.
Repairing this error requires other errors to be corrected first.
Msg 8929, Level 16, State 1, Line 1
Object ID 741577680, index ID 1, partition ID 72057594041204736, alloc unit ID 72057594046709760 (type In-row data): Errors found in off-row data with ID 583860224 owned by data record identified by RID = (1:234380:69)
Repairing this error requires other errors to be corrected first.
sg 8928, Level 16, State 1, Line 1
Object ID 741577680, index ID 1, partition ID 72057594041204736, alloc unit ID 72057594046775296 (type LOB data): Page (1:2269631) could not be processed. See other errors for details.
Repairing this error requires other errors to be corrected first.
Table error: Object ID 741577680, index ID 1, partition ID 72057594041204736, alloc unit ID 72057594046775296 (type LOB data). The off-row data node at page (1:2269631), slot 0, text ID 203751424 is referenced by page (1:2269615), slot 0, but was not seen in the scan.
Repairing this error requires other errors to be corrected first.
Msg 8928, Level 16, State 1, Line 1Msg 8965, Level 16, State 1, Line 1
September 7, 2023 at 4:33 pm
Data corruption is a tricky thing to correct and can result in larger problems down the road.
I'd either hire a DBA consultant who specializes in data recovery to resolve the issue IF the current data is important, and if the current data is not important, I'd start over like you said. But the very first thing I'd do is a backup of the current state and then stop users from accessing the database to reduce the chance of more corruption.
With multiple places showing signs of corruption, I would also be concerned about what caused it. It MAY be a bug in SQL Server, but that is not very likely. The more likely culprit, in my opinion, is hardware failure with that much corruption. I would be doing a chkdsk (again AFTER backing up the database to a different disk as you don't want to risk corrupting your backups) and running SMART tests on the disk (if possible).
Now, if you REALLY don't care about the data, you can try to repair it with data loss, but that will cause data loss. What data will be lost is impossible to say without having access to the database itself.
If you REALLY want to try to repair the corruption yourself (note it can be a lot of work), step 1 is to restore the backup to a test system, then it depends on who you ask, but I like to do a row count on all tables so I can use that to see where data was potentially lost. Next, I run the checkdb with repair allow data loss option and get the test system to have a clean checkdb record. At this point, I KNOW I have lost data in the test system, but it MAY still be there in live. Next, I'd do a row count again and then compare the number before and after. Using this, I can determine which tables lost data and I can decide (ie work with the end users) to determine how hard I need to work to recover that data. IF the data is critical, then I'd hire someone in to fix it as doing it yourself without any experience can be challenging, but if you like a challenge, hex editors on the database (assuming it isn't encrypted) can be fun ways to poke at the data and you can find the rows which were corrupt and you MAY be able to do a best guess to recover them. OR if you are SUPER lucky, the clustered index/heap may be corrupt BUT you may have a valid non-clustered index in which case you can recover the data from there. Mind you, if you rebuild the non-clustered indexes after the corruption occurred, you likely blew out any chance of getting the data out of the NCI.
But step 1 before running checkdb with ANY recovery method is to make sure you have a backup. I REALLY REALLY REALLY hope you have a backup before you ran checkdb with recover rebuild...
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
September 11, 2023 at 8:40 am
Hi Brian,
Thank for for you're response. I had a feeling this issue might be difficult to resolve by fixing the db.
I checked on line regarding the causes to corruption and came to the same conclusion that the cause might be a hardware issue, I raised this with our client and they got quite defensive as they claim other databases haven't been affected. It seems they want to push the issue in our direction rather than resolve any internal IT problems.
I made a copy of the database and ran repair_rebuild on the copy before I attempt anything on the live database. I always backup/copy before doing anything with db's even if only minor.
We'll give them the option of starting again with a blank db.
thanks again Brian, much appreciated.
Dan
September 11, 2023 at 3:07 pm
My opinion, I would ask them to do basic troubleshooting to validate that the disk is good. Just because other systems are running fine now, doesn't mean they will continue to run fine.
I'd ask them to at the very least run a check disk on the system to ensure that it comes back clean. It will very very likely result in needing some downtime which they may not be thrilled about, but best case, they wasted some time during a scheduled maintenance window and they know their disks are good. Worst case, they find out their disks are showing signs of failure and they should be replacing them before more systems fail.
Last thing they would want (I'm sure) is for the disks to fail and all of their systems to go down.
And probably a dumb question, but have they ran a checkdb on other databases to ensure there isn't corruption in other places?
If I was you, I wouldn't be building up a blank database on the SAME disk until I knew the disk was good. You know what will happen if corruption pops up again due to hardware failure - they will blame you for it and claim you set things up wrong or the tools are broken or something along those lines. At the very least, if you are going to blow out the DB and start fresh, I would ask them to put you on a different physical disk as you don't trust that disk and don't want to get blamed when corruption happens again as it is likely to happen again.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
September 15, 2023 at 2:37 pm
Hi Brian,
I'll try to implement your advise. They haven't mentioned corruption with any other db but they are now investigating internally why it's taken them this long to find the issue...
However, their ICT still doesn't believe a hardware issue can be the cause of the consistency corruption, i'll be on a uphill battle with this one.
thanks again
dan
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply