April 16, 2012 at 12:06 pm
So to summarise...
Corruption appearing repeatedly in the same table across different databases on different servers?
Corruption 'disappearing' between a maintenance plan running CHECKDB and a manual run of CHECKDB, with no index rebuilds or other large page-deallocating operations between the two?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 16, 2012 at 12:28 pm
I would rather just focus on the one server that keeps generating the error messages. I haven't heard from them since they got a clean DBCC CHECKDB, so once they post the maintenance plan results (again) I will let you know. Yes, it is the same table and the same database, though it now appears to be different indexes.
Thanks for your help on this! I already checked the event log for disk-related errors, and the only event was from when they swap out disks on a different drive (they rotate backups).
I also told them not to run chkdsk on the drive the SQL Server databases are located on.
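Besides the Windows event log, SQL Server keeps its own record of pages that failed on read. A minimal sketch of checking it, assuming SQL Server 2005 or later (where msdb.dbo.suspect_pages exists). The error log search uses xp_readerrorlog, which is undocumented, so treat its parameter list as an assumption that may differ between versions.

```sql
-- Pages SQL Server has flagged after IO failures.
-- event_type: 1 = 823 error or non-checksum 824 error, 2 = bad checksum, 3 = torn page,
--             4 = restored, 5 = repaired by DBCC, 7 = deallocated by DBCC
SELECT DB_NAME(sp.database_id) AS database_name,
       sp.file_id,
       sp.page_id,
       sp.event_type,
       sp.error_count,
       sp.last_update_date
FROM msdb.dbo.suspect_pages AS sp
ORDER BY sp.last_update_date DESC;
GO

-- Search the current SQL Server error log (log 0, type 1) for IO-related errors.
-- xp_readerrorlog is undocumented: parameters are log number, log type, search string.
EXEC sys.xp_readerrorlog 0, 1, N'Error: 823';
EXEC sys.xp_readerrorlog 0, 1, N'Error: 824';
EXEC sys.xp_readerrorlog 0, 1, N'Error: 825';  -- 825 = read succeeded only after retrying
GO
```

An empty suspect_pages table does not prove the IO path is healthy, but any rows there (or 825 read-retry messages) would point strongly at storage rather than at SQL Server itself.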
April 16, 2012 at 12:28 pm
That's correct as well: no index rebuilds. The only index maintenance defined is an index reorganize, and that only runs once a month; I'm not sure when it last ran.
April 16, 2012 at 1:38 pm
They just finished rerunning the plan, and it came back totally clean again. Very odd!
Microsoft(R) Server Maintenance Utility (Unicode) Version 10.50.1600 Report was generated on "ServerName".
Maintenance Plan: SV MaintenancePlan
Duration: 00:20:29
Status: Succeeded.
Details:
Check Database Integrity (ServerName)
Check Database integrity on Local server connection
Databases that have a compatibility level of 70 (SQL Server version 7.0) will be skipped.
Databases: All databases
Include indexes
Task start: 2012-04-16T13:05:42.
Task end: 2012-04-16T13:26:07.
Success
Command:USE [master]
GO
DBCC CHECKDB(N''master'') WITH NO_INFOMSGS
GO
USE [model]
GO
DBCC CHECKDB(N''model'') WITH NO_INFOMSGS
GO
USE [msdb]
GO
DBCC CHECKDB(N''msdb'') WITH NO_INFOMSGS
GO
USE [ReportServer]
GO
DBCC CHECKDB(N''ReportServer'') WITH NO_INFOMSGS
GO
USE [ReportServerTempDB]
GO
DBCC CHECKDB(N''ReportServerTempDB'') WITH NO_INFOMSGS
GO
USE [Foo]
GO
DBCC CHECKDB(N''Foo'') WITH NO_INFOMSGS
GO
April 16, 2012 at 1:44 pm
The application doesn't happen to drop and recreate that table as part of its normal operation, does it? Or is it possible a vendor tech got in after a user reported a problem and handled it? It's a long shot, but it would explain this.
April 16, 2012 at 1:48 pm
Heck no, no drop/creates. All maintenance is done outside the application, and their application tech messaged me about it because he didn't know what to do either. The last time any index was fully rebuilt was in March 2011; ever since then it's just been index reorganization.
And nothing was done (no reboots either) between the run that came back with errors and the repeated reruns that came back clean.
I even had them run DBCC DROPCLEANBUFFERS before rerunning the maintenance plan. The only thing I can think of, even remotely, is something happening in memory. Even the first error message in post #1 is weird: it recommends REPAIR_ALLOW_DATA_LOSS, but the damage is in index id 2 (just a rebuild would fix that!).
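For reference, index id 0 is a heap, 1 is the clustered index, and anything higher is a nonclustered index, which holds only redundant copies of the table's data and can be rebuilt from the base table. A minimal sketch of confirming which index is damaged and rebuilding just that one, assuming [Foo] (the anonymized database name from the maintenance plan report) and the hypothetical placeholders dbo.SuspectTable and IX_SuspectTable_Example. Disabling the index first means the rebuild reads from the base table rather than possibly from the damaged index itself.

```sql
USE [Foo];
GO
-- Re-check only the suspect table, listing every error found.
-- dbo.SuspectTable is a hypothetical placeholder for the table named in the error.
DBCC CHECKTABLE (N'dbo.SuspectTable') WITH ALL_ERRORMSGS, NO_INFOMSGS;
GO

-- Map the index ids in the error output to index names.
SELECT i.index_id, i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.SuspectTable')
ORDER BY i.index_id;
GO

-- Only for a NONCLUSTERED index (index_id > 1): disabling a clustered index
-- would make the whole table inaccessible.
-- IX_SuspectTable_Example is hypothetical; use the name returned by the query above.
ALTER INDEX [IX_SuspectTable_Example] ON dbo.SuspectTable DISABLE;
ALTER INDEX [IX_SuspectTable_Example] ON dbo.SuspectTable REBUILD;  -- offline by default
GO
```

On a 24x7 system the index is unavailable to queries between the DISABLE and the end of the REBUILD, so this is best done in the quietest window available.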
April 16, 2012 at 2:05 pm
It has been suggested to me that the problem may lie in the SAN caches. Can you get the SAN admin to check that there are no errors (dropped pages is what I was told), and maybe see if the caches can be disabled for a while? That may seriously impact performance, though, so take care.
Gail Shaw
April 16, 2012 at 2:12 pm
This particular server isn't on a SAN; it's just your ho-hum normal DAS RAID (RAID 5, I think). It was a different server that was on a SAN (I've not got any messages from that server since).
April 16, 2012 at 2:13 pm
I can check whether write caching has been enabled on the hardware RAID, though.
April 16, 2012 at 2:45 pm
Do so, and see if the read cache can be disabled for testing.
Is it possible to shut that SQL Server down and stress-test the IO subsystem?
Gail Shaw
April 16, 2012 at 2:49 pm
It's not possible to shut down the server right now :/ It's one of those 24x7 applications. I have passed on the information and will let you know about the write-back cache; they are going to send a tech out to look at the drives anyway.
April 16, 2012 at 3:32 pm
Make sure that the tech checks everything: drives, cache, controllers, etc. I doubt the physical disks themselves are the problem, but it could easily be something else in the IO stack. Also check for new versions of firmware and drivers.
Finally, I'd suggest seeing if you can get the DB onto alternate storage, even if just so you can stress test the DAS.
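While the hardware is being checked, it may also be worth confirming that every database uses CHECKSUM page verification, so that damage introduced anywhere in the IO stack is reported (as error 824) the next time the page is read. A minimal sketch, again using [Foo] as a stand-in for the real database name:

```sql
-- How each database verifies pages on read.
-- Databases upgraded from SQL Server 2000 may still show TORN_PAGE_DETECTION.
SELECT name, page_verify_option_desc
FROM sys.databases
ORDER BY name;
GO

-- Switch to CHECKSUM. Existing pages only receive a checksum the next time
-- they are written, so coverage builds up gradually rather than all at once.
ALTER DATABASE [Foo] SET PAGE_VERIFY CHECKSUM;
GO
```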
Gail Shaw
April 18, 2012 at 2:15 pm
Had something similar to this on an Exchange server some years ago: repeated corruption that turned out to be a whacko RAID backplane/BIOS. Check and double-check all areas of the storage subsystem.
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉