DBCC Question

  • So to summarise...

    Corruption appearing repeatedly in the same table across different databases on different servers?

    Corruption 'disappearing' between a maintenance plan running checkDB and a manual run of checkDB with no index rebuilds or other large page-deallocating operations between the two?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
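
    To compare like with like, a manual check can be run with the same options the maintenance plan uses, plus full error detail. A minimal sketch (database name `Foo` is taken from the plan output later in this thread):

    ```sql
    -- Manual integrity check with all error detail, for comparison with
    -- the maintenance plan's CHECKDB run.
    DBCC CHECKDB (N'Foo') WITH ALL_ERRORMSGS, NO_INFOMSGS;
    ```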
  • I would rather just focus on the one server that keeps generating the error messages. I haven't heard from them since they got a clean DBCC run, so once they post the maintenance plan results (again) I will let you know. Yes, it is the same table and same database, though it appears to be different indexes now.

    Thanks for your help on this! I already checked the system for disk-related errors in the event log, and the only event was when they swapped out disks on a different drive (they rotate backups).

    I also told them not to run chkdsk on the drive that SQL dbs are located on.

  • That's correct as well: no index rebuilds. The only thing defined is an index reorganize, but that only runs once a month, and I'm not sure when it last ran.

  • They just finished rerunning the plan, and it came back totally clean again. Very odd!

    Microsoft(R) Server Maintenance Utility (Unicode) Version 10.50.1600
    Report was generated on "ServerName".

    Maintenance Plan: SV MaintenancePlan

    Duration: 00:20:29

    Status: Succeeded.

    Details:

    Check Database Integrity (ServerName)

    Check Database Integrity on Local server connection. Databases that have a compatibility level of 70 (SQL Server version 7.0) will be skipped.

    Databases: All databases

    Include indexes

    Task start: 2012-04-16T13:05:42.

    Task end: 2012-04-16T13:26:07.

    Success

    Command: USE [master]
    GO
    DBCC CHECKDB(N'master') WITH NO_INFOMSGS
    GO
    USE [model]
    GO
    DBCC CHECKDB(N'model') WITH NO_INFOMSGS
    GO
    USE [msdb]
    GO
    DBCC CHECKDB(N'msdb') WITH NO_INFOMSGS
    GO
    USE [ReportServer]
    GO
    DBCC CHECKDB(N'ReportServer') WITH NO_INFOMSGS
    GO
    USE [ReportServerTempDB]
    GO
    DBCC CHECKDB(N'ReportServerTempDB') WITH NO_INFOMSGS
    GO
    USE [Foo]
    GO
    DBCC CHECKDB(N'Foo') WITH NO_INFOMSGS
    GO

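    One check that may help distinguish real, transient I/O errors from something else: SQL Server logs pages that fail checksum or torn-page verification to `msdb.dbo.suspect_pages`, and those rows persist even after a later CHECKDB comes back clean. A sketch:

    ```sql
    -- List any suspect pages this instance has recorded. Rows here persist
    -- across "clean" reruns, so they can show whether the earlier CHECKDB
    -- errors corresponded to real page-level failures.
    SELECT DB_NAME(database_id) AS database_name,
           file_id, page_id, event_type, error_count, last_update_date
    FROM msdb.dbo.suspect_pages
    ORDER BY last_update_date DESC;
    ```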
  • The application doesn't happen to drop and recreate that table as part of its operation, does it? Or is it possible a vendor tech got in, due to a report from a user, and handled it? It's a long shot, but it would explain this.

  • Heck no, no drop/creates; all maintenance is done outside the application, and their application tech messaged me about it because he didn't know what to do either. The last time any index was entirely rebuilt was 3/2011; since then it's just been index reorganization.

    And nothing was done (no reboots either) between the run that came back with errors and the repeated reruns that came back clean.

    I even had them run DBCC DROPCLEANBUFFERS prior to rerunning the maintenance plan. The only thing I can think of, even remotely, is something happening in memory. Even the first error message in post #1 is weird: it says REPAIR_ALLOW_DATA_LOSS, but it's index ID 2, a nonclustered index (just a rebuild would fix it!).
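
    Since index ID 2 or higher always denotes a nonclustered index, corruption confined to it can be fixed by rebuilding that index from the base table instead of running repair. A sketch, using placeholder object/index names (the actual IDs would come from the CHECKDB error message):

    ```sql
    -- Resolve the index name from the object_id/index_id reported by CHECKDB.
    -- The literal IDs below are placeholders, not values from this thread.
    SELECT o.name AS table_name, i.name AS index_name
    FROM sys.indexes AS i
    JOIN sys.objects AS o ON o.object_id = i.object_id
    WHERE i.object_id = 123456789 AND i.index_id = 2;

    -- Rebuild just that nonclustered index; no REPAIR_ALLOW_DATA_LOSS needed.
    ALTER INDEX [IX_SomeIndex] ON [dbo].[SomeTable] REBUILD;
    ```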

  • It has been suggested to me that the problem may lie in the SAN caches. Can you get the SAN admin to check that there are no errors ("dropped pages" is what I was told), and maybe see if the caches can be disabled for a while? (That may seriously impact performance, so take care.)

    Gail Shaw
  • This particular server isn't on a SAN; it's just your ho-hum normal DAS RAID (I think it's RAID 5, actually). It was a different server that was on a SAN (I've not received any messages from that server since).

  • I can check if write-caching has been enabled on the hardware RAID though.

  • Do so, and see if the read cache can be disabled for testing.

    Is it possible to shut that SQL Server down and stress-test the IO subsystem?

    Gail Shaw
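
    For the stress test itself, Microsoft's DiskSpd tool (or the older SQLIOSim) is a common choice once the instance is offline. An illustrative invocation; the flags and sizes here are assumptions to be tuned to the actual workload and volume:

    ```shell
    # 60 seconds of 8 KB random I/O, 25% writes, 4 threads, 8 outstanding
    # I/Os per thread, against a 10 GB test file on the suspect volume.
    diskspd.exe -b8K -d60 -o8 -t4 -r -w25 -c10G D:\iotest\testfile.dat
    ```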
  • It's not possible to shut down the server right now :/ It's one of those 24x7 applications. I have passed on the information and will let you know about the write-back cache; they are going to send a tech out to look at the drives anyway.

  • Make sure that the tech checks everything: drives, cache, controllers, etc. I doubt the actual physical disk is the problem, but it could easily be something else in the IO stack. Also check for new versions of firmware and drivers.

    Finally, I'd suggest seeing if you can get the DB onto alternate storage, even if just so you can stress test the DAS.

    Gail Shaw
  • Had something similar to this on an Exchange server some years ago; repeated corruption turned out to be a whacko RAID backplane/BIOS. Check and double-check all areas of the storage subsystem.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉
