What is the best DR to counteract a corrupt file / table?

  • we've had an issue where 2 tables were corrupt overnight and the database went into suspect mode. We do a nightly backup and translogs every 30mins and move ship them to 2 other DR servers.

    Can anyone confirm whether if a table becomes corrupt, will this be replicated to the DR boxes?

    We could not restore the database to the point in time where it was first corrupt due to the amount of processing done on that database since that point. Rebuilding the index and CheckDB did not work either...

    Research on the errors shows issues like this happening on sql server 2000, we're on 2005 (build 3239).

    I was just wondering if anyone has any ideas on what i can do to prevent this from happening again? What DR procedures could i put in place to keep downtime to a minimum? Would bcp ing the data to a new table, dropping original table and changing the new table name to original name work?

    I believe it became corrupt from the DBREINDEX job we run everynight, not got any hard evidence but seems coincidental that these 2 tables were reindexed overnight.

    any ideas greatly received....

    PS: did get it back up and running but want to know if there's a clear cut way to fix / prevent this issue??

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • Best way to recover with no data loss is always to use backups, log backups, and a final tail-of-the-log backup. You said that you couldn't use your backups because too much work had been done? What do you mean? Not much point taking backups if you didn't use them - do you mean it was taking too long to restore? That's a common problem and relates to not having a correctly designed backup strategy - you need to design a restore strategy first then use that to drive your backup strategy.

    Highly unlikely that rebuilding the indexes corrupted them - much more likely that your I/O subsystem has transient corruption problems. Check your firmware levels, run I/O diagnostics, run SQLIOSim, etc

    What were the exact errors from CHECKDB?

    Thanks

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • Swirl80 (11/13/2008)


    Can anyone confirm whether if a table becomes corrupt, will this be replicated to the DR boxes?

    No. What's shipped to the failover are the log backups. Corruption is typicaly caused by bad hardware and hence isn't logged

    Research on the errors shows issues like this happening on sql server 2000, we're on 2005 (build 3239).

    It can happen on any version of SQL

    Would bcp ing the data to a new table, dropping original table and changing the new table name to original name work?

    Probably not. If the table is corrupt then trying to BCP the data out will fail due to the unreadable pages.

    I believe it became corrupt from the DBREINDEX job we run everynight, not got any hard evidence but seems coincidental that these 2 tables were reindexed overnight.

    SQL doesn't corrupt its own database files. Corruption is generally a hardware problem

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • the problem with restoring a backup on this particular database is that it is involved in almost every part of our systems and links in one way or another to every other database that we have so when a job runs it will update several tables in this database as well as tables in another. Therefore it would take a lot more time to pull out / reverse all the processing done on the other databases to make the data match - don't ask why it is like this, was all designed and built this way long before i recently joined and we are looking to address this issue but it is a massive project which is currently being done in stages.

    This is the error from the log file:

    Unable to find index entry in index ID 1, of table 1479845230, in database 'DBNAME'. The indicated index is corrupt or there is a problem with the current update plan. Run DBCC CHECKDB or DBCC CHECKTABLE. If the problem persists, contact product support.

    The dbcc errors were similar to:

    Error: 644, Severity: 21, State: 5

    Could not find the index entry for RID '1613b1000000100' in index page (1:189), index ID 7, database 'TestDB'..

    Error: 8646, Severity: 21, State: 1

    The index entry for row ID was not found in index ID 7, of table 2009058193, in database 'test644'..

    and:

    Server: Msg 2511, Level 16, State 1, Line 1

    Table error: Object ID 2009058193, Index ID 7. Keys out of order on page (1:189), slots 184 and 185.

    i'll look into the i/o subsystem and see if the diagnostics brings back any issues...

    I fully agree with your comment about you need to design a restore strategy first then use that to drive your backup strategy, this issue will be addressed tomorrow.

    thanks for you thoughts.

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • Ah - from the error messages, you're on SQL 2000, not 2005, right? Note that there's a separate forum for 7.0/2000 problems.

    Looks like you may be hitting this bug - http://support.microsoft.com/kb/822747

    Thanks

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • Swirl80 (11/13/2008)


    the problem with restoring a backup on this particular database is that it is involved in almost every part of our systems and links in one way or another to every other database that we have so when a job runs it will update several tables in this database as well as tables in another. Therefore it would take a lot more time to pull out / reverse all the processing done on the other databases to make the data match

    But if you backup the tail of the log before restoring, you will lose no data and hence won't have to reverse any processing anywhere. That's the point of log backups.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Unfortuantely not, we're on sql server 2005 enterprise (v9.0.3239) which is why i'm baffled as to why we're getting errors which are / were specific to 2000 and supposedly fixed.....

    I've spoken to our Ops manager and looking to run some diagnostics on the SAN

    that knowledge base is something i'd read earlier which indicated that it was an issue in 2000.

    Gail: So in theory if i bring my Standby databases online on the DR server they won't have the corruption? Interesting, you said "Corruption is typicaly caused by bad hardware and hence isn't logged", but there is still a chance SQL could of corrupted the data? If this is the case then will this corruption be logged and shipped?

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • GilaMonster (11/13/2008)


    Swirl80 (11/13/2008)


    the problem with restoring a backup on this particular database is that it is involved in almost every part of our systems and links in one way or another to every other database that we have so when a job runs it will update several tables in this database as well as tables in another. Therefore it would take a lot more time to pull out / reverse all the processing done on the other databases to make the data match

    But if you backup the tail of the log before restoring, you will lose no data and hence won't have to reverse any processing anywhere. That's the point of log backups.

    but i was on the understanding that if i restored the database and all tlogs (incl the tail) then i'd just be restoring the database in the corrupt state.

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • Swirl80 (11/13/2008)


    but i was on the understanding that if i restored the database and all tlogs (incl the tail) then i'd just be restoring the database in the corrupt state.

    Only if the database backup is corrupt. If you restore a full backup from before the corruption occurred and then roll the log forward, the DB will be clean.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Swirl80 (11/13/2008)


    Gail: So in theory if i bring my Standby databases online on the DR server they won't have the corruption? Interesting, you said "Corruption is typicaly caused by bad hardware and hence isn't logged", but there is still a chance SQL could of corrupted the data? If this is the case then will this corruption be logged and shipped?

    Very, very, very slim. Paul can probably give more details, but I'd say the chance of SQL generating a corrupt page and logging that corruption in such a way that the log backups succeed and the subsequent log restores succeed all without an error are in the very slim to none range.

    Typically corruption is caused by the data (or log) been changed outside of SQL, either during the write operation (between SQL and the disk) or on the disk itself.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • could this be a potential issue?

    http://support.microsoft.com/kb/954734/

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • Yes, if the situation applies then this could be it. I didn't know about this problem - nice 🙁

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • I'm convinced this is the problem. I was confused yesterday when one of the errors referred to the unique clustered index beingcorrupt but that table did not have a unique index on it.

    I'll patch the test servers this weekend.

    So, in this instance, what would be the best DR I could put in place? I'm gonna be testing the restore of the backup file plus tlogs to see if it restores the corruption this weekend.

    I'll report back with my results

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

  • Cool (- that you found the problem).

    As far as DR to avoid this, there isn't any way to circumvent a bug like this. It'll be in your transaction log backups too, and hence will be log-shipped and mirrored if you had those setup.

    I'll need to blog about this one.

    Thanks

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • I think it's a good idea to get this blogged and make ppl more aware of this issue, especially if it could potentially catch u with it pants down!

    If u need any info on errors etc for it blog then let me know and I'll help all I can.

    _________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply