November 30, 2009 at 9:29 am
This morning we had an electrical failure that resulted in a hard take down of several of my SQL servers. When everything came back up my SQL 2000 replication monitor showed several (but not all) of my subscriptions had the following error message:
I/O error 21(The device is not ready.) detected during read at offset 0x00000000150000 in file 'D:\SQLData\distribution.MDF'. The step failed.
Since some of my subscriptions that use the same distribution server are running I decided to run dbcc checkdb on my distribution database and I got the following message:
Server: Msg 7995, Level 16, State 1, Line 1
Database 'distribution' consistency errors in sysobjects, sysindexes, syscolumns, or systypes prevent further CHECKDB processing.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
So after researching on the internet I found someone that mentioned that one of the system table my have inconsistencies so I tried running dbcc checktable on the sysobjects and sysindexes tables with no problems. But when I ran it on the syscolumns table I get the following:
Server: Msg 8966, Level 16, State 2, Line 1
Could not read and latch page (1:119) with latch type UP. ⟄⮞ failed.
Server: Msg 2533, Level 16, State 1, Line 1
Table error: Page (1:119) allocated to object ID 3, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header.
Server: Msg 8976, Level 16, State 1, Line 1
Table error: Object ID 3, index ID 1. Page (1:119) was not seen in the scan although its parent (1:17) and previous (1:118) refer to it. Check any previous errors.
Server: Msg 8978, Level 16, State 1, Line 1
Table error: Object ID 3, index ID 1. Page (1:145) is missing a reference from previous page (1:119). Possible chain linkage problem.
CHECKTABLE found 0 allocation errors and 1 consistency errors not associated with any single object.
DBCC results for 'syscolumns'.
There are 1753 rows in 34 pages for object 'syscolumns'.
CHECKTABLE found 0 allocation errors and 3 consistency errors in table 'syscolumns' (object ID 3).
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKTABLE (distribution.dbo.syscolumns ).
Basically I am trying to see what I can do to not make anything worse. Since some of my subscriptions are running correctly I do not want to mess them up and just fix the ones that are not. So I am taking suggestions, but I am wondering if it would be possible to not make things worse by just restoring my last distribution database backup to another server and DTS just the syscolumns table?
Thanks
Sherri
November 30, 2009 at 9:34 am
The best fix for this is to restore from backup. Don't try and restore then DTS stuff around. The system tables are not to be messed around with in that way, by doing so you could break it even more. Restore a clean backup of distribution.
Repairs aren't generally allowed on the system tables, I'm surprised at the minimum repair level specified. It may be worth taking a backup of this DB, restoring elsewhere and trying a repair, see if it works and, if so what it drops.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 30, 2009 at 9:52 am
Thanks Gail, I like that suggestion of restoring this database elsewhere so I can see what happens when I try to repair.
November 30, 2009 at 9:55 am
Gail - these are SQL 2000 corruptions (from the CHECKDB error messages). The rules for when a system table can be repaired are a little different between 2000 and 2005, but IIRC, even though I specify a repair level in the output, it won't actually do the repair if asked.
Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005
November 30, 2009 at 10:45 am
Gail,
I just tried to make the database backup of the Distribution database, but I got the original error from my first message as soon and I try to start the backup about I/O error 21 (The device is not ready).
So I think I am going to have the server guys take a look at the server and make sure it isn't a hardware issue before I try to restore the database.
But assuming that I get to the point of restoring the database in just a bit, do you know if there are any things I need to be aware of with my replication because of restoring the distribution database? This will be my first time ever restoring the distribution database so if you have any advice I would greatly appreciate it. (I am a rookie DBA, but I am the only DBA on the staff)
Thanks again.
Sherri
November 30, 2009 at 9:22 pm
Sherri Barkley (11/30/2009)
So I think I am going to have the server guys take a look at the server and make sure it isn't a hardware issue before I try to restore the database.
Very good idea.
But assuming that I get to the point of restoring the database in just a bit, do you know if there are any things I need to be aware of with my replication because of restoring the distribution database?
I don't, I'm afraid. I haven't worked that much with replication. Do you have a test environment where you can set up a similar situation (minus the corruption) and test out what happens when you restore an older copy of distribution?
Paul Randal (11/30/2009)
Gail - these are SQL 2000 corruptions (from the CHECKDB error messages). The rules for when a system table can be repaired are a little different between 2000 and 2005, but IIRC, even though I specify a repair level in the output, it won't actually do the repair if asked.
I noticed it was SQL 2000, just from the table names. Didn't realise the repair rules were different. Can you elaborate at all? Here or by email?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 1, 2009 at 9:43 am
Thanks Gail. I restored the distribution database and my replication was out of sync so I ended up reinitializing just to get everything back up and going. I do not currently have a test environment with replication set up...I hope to get that in the not too distant future so I can practice and see what happens. I am also going to continue to research replication to see if there is anything I could have done to not have to reinitialize.
Thanks again for everyone's responses to my issue.
Sherri
December 1, 2009 at 9:54 am
Good to hear it's fixed. If you haven't already, get the server admins to run some checks on the disk, make sure there are no lingering problems
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 1, 2009 at 9:58 am
They have looked at it and said all of the disks are healthy. So we are assuming that the hard shutdown due to the electrical issue in our data center must have caused the corruption.
Thanks again.
Sherri
December 1, 2009 at 6:44 pm
I went through something similar once. I was "lucky" (waiting for the opportunity to strike, actually :-D) because I also had the foresight to not only keep track of everyone's time trying to do such a fix, but was also able to make an estimate of everyone's time that was affected by the outage. From that, I was also able to make an estimate of what that time was worth dollar wise... it turned out to be almost 1000 times more than the small standalone UPS's I wanted (and had previously been denied several times) for the systems and I didn't even include an estimate on possible lost revenue nor what the cost would have been had we had to "rebuild" one of more servers.
I received them along with 2 spares by FedEx the day after I made my presentation. 😉
Heh... "Luck... where foresight and planning meet opportunity."
--Jeff Moden
Change is inevitable... Change for the better is not.
December 2, 2009 at 12:10 am
Sherri Barkley (12/1/2009)
They have looked at it and said all of the disks are healthy. So we are assuming that the hard shutdown due to the electrical issue in our data center must have caused the corruption.
Get the server guys to check the state of the disk write caches (if it has them). If the disks have write caches and they are set to be used, they must be battery-backed (and the battery should be working). Otherwise a sudden power failure can cause a write that SQL thought had been completed to disk to be lost.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply