Msg 8909 & Msg 8998: Page allocation errors

Question

Msg 8909 & Msg 8998: Page allocation errors

abo_moaaz

Right there with Babe

Points: 765
More actions
February 3, 2010 at 4:45 am

#220252

I Recently ran dbcc checkdb on one of our SQLServer boxes (SS2005 Ent SP3, x64) and a DB generated this:
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:8088) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
Msg 8998, Level 16, State 2, Line 1
Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 77 pages from (1:8088) to (1:16175). See other errors for cause.
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:16176) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
Msg 8998, Level 16, State 2, Line 1
Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 77 pages from (1:16176) to (1:24263). See other errors for cause.
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:24264) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).
Msg 8998, Level 16, State 2, Line 1
Page errors on the GAM, SGAM, or PFS pages prevent allocation integrity checks in database ID 77 pages from (1:24264) to (1:32351). See other errors for cause.
CHECKDB found 6 allocation errors and 0 consistency errors not associated with any single object.
CHECKDB found 6 allocation errors and 0 consistency errors in database 'DbSSSS'
So far db is accessible by users and 'Select * from ...' works against all 91 tables. No repair attempted yet. I restored from prior backups as a different dbname on different SAN but same page errors. I appreciate your guidance to resolve this.
Thanks in advance..
Othman

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

You have what looks like errors in the allocation pages. Because of those errors, checkDB can't do a full check of the DB. There is no repair for this.

Either restore a clean backup of the database along with all log backups (if your backup strategy is good, you'll be able to do that without data loss), or, if you don't have a clean backup then you're going to need to script all objects, export all data and recreate the DB.

You may be lucky and the damaged allocation pages refer to empty sections of the DB, however that doesn't mean these are benign, they are a fairly serious form of corruption.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Paul Randal One Orange Chip Points: 29544 More actions · Answer 2

You've got corruption in two PFS pages - this isn't repairable, even with undocumented DBCC options. You need to restore, and also figure out why this happened. Select * can work fine sometimes depending on table structure when this happens.

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

I'm curious. How did you determine that they were PFS pages, not GAM/SGAM? Something about the page ID or is it the size of the range of pages that can't be checked?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Joie Andrew One Orange Chip Points: 27360 More actions · Answer 4

Have you checked for hardware errors on the server?

Error 8998

Joie Andrew
"Since 1982"

Paul Randal One Orange Chip Points: 29544 More actions · Answer 5

Interesting - that Books Online page is wrong (it's based on a document I wrote while inside MS). A PFS page maps 8088 pages, not 7990. I'll get it fixed.

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

Paul Randal One Orange Chip Points: 29544 More actions · Answer 6

Gail - you mean you don't have all the allocation bitmap page IDs and intervals burned into your brain? Lucky you 🙂

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 7

Paul Randal (2/3/2010)
Gail - you mean you don't have all the allocation bitmap page IDs and intervals burned into your brain? Lucky you 🙂

Not yet, I'm working on it though.

Can you tell a GAM from SGAM from PFS on the basis of page ID alone? For that matter, with an error like this, can you tell a GAM from an SGAM and, if so how?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Paul Randal One Orange Chip Points: 29544 More actions · Answer 8

I should do a blog post at some point, or you should do the MCM class 🙂

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 9

Paul Randal (2/4/2010)
I should do a blog post at some point

Please?

or you should do the MCM class 🙂

Would love to. As soon as I persuade someone to pay for it for me or I win the lottery.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

abo_moaaz Right there with Babe Points: 765 More actions · Answer 10

Gail / Paul ,

appreciate your followup.

As suggested:hehe:, I exported data to a new db but noticed that exporting VIEWs causes correponding TABLEs to be created so it seems they have to be scripted then explicitly created.

Few questions:

1)As DBCC CHECKDB wasn't run before, this page corruption seems to have been present since weeks ago:w00t:. Does this entail necessity of running CHECKDB on daily basis during afterhours to be aware of similar corruption at early stage?

2)If required diagnostics arranged to be run against the affected SAN drive, can it also 'fix' or will it only expose damaged area?

Thanks and Regards,

Othman

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 11

abo_moaaz (2/4/2010)
1)As DBCC CHECKDB wasn't run before, this page corruption seems to have been present since weeks ago:w00t:. Does this entail necessity of running CHECKDB on daily basis during afterhours to be aware of similar corruption at early stage?

Absolutely! Maybe not daily, but often enough that, should corruption be detected, restore from a clean backup is always an option. Hence your backup strategy will determine the minimum frequency for CheckDB.

For example, if you only retain 2 days of backups and hence cannot restore to further back than 2 days, checkDB must run at least every 2 days. Personally I'd recommend running it before you run your full backups. No point in backing up a corrupt DB.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

abo_moaaz Right there with Babe Points: 765 More actions · Answer 12

Hi ,

For further analysis :cool:(note that actual db with 8998 error was exported earlier with no loss of data and the old one was renamed ), with that said,

1)If I run backup with checksum on the corrupted db, it still finishes successfully does this confirm that the corrupted area has no data?

2) the storgae group informed me that as long as the SAN volume is accessible, from their side it's 'healthy'. Should we as a precaution, replace the volume by a new one ?

Thanks and Regards

Othman

Paul Randal One Orange Chip Points: 29544 More actions · Answer 13

1) No. It just means that the corrupt pages don't have page checksums on them for some reason. Was the database upgraded from 2000?

2) Your storage group are fools. Is a SAN volume with poor IO throughput 'healthy' just becase it's accessible? No. Is a SAN volume that has drives causing corruption 'healthy'? No. You need to educate the storage group of what healthy means - 'does not corrupt the data' is the number 1 requirement and it's not being met here. Just because a drive can be accessed in no way guarantees that there aren't integrity problems with it.

Thanks

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

abo_moaaz Right there with Babe Points: 765 More actions · Answer 14

Hi Paul,

just to let you know that i really enjoyed your teched video:-). As for 1) this was a pure SS2005 db (actually, one of the SharePorint DBs). For the storage group part, what i think they meant that their available tools may not tell them much other than accessibility or performance throughput. If it helps, i can mention the brand name of the storage.

Another insight, it was determined early this week that the installed multipath s/w for the SAN was not compatible with the OS (WS2008 ENT R2 to which we upgraded last October), so it was yesterday upgraded to the compatible level, but the checkdb would still show 8998 on that db.

Thanks and Regards

Othman