Corrupt table?

  • I provide DBA support for a company with a medium-sized application database (3 GB) on SQL Server 2000. Recently, one of their scheduled tasks has been failing. It is a simple daily task that copies data from application tables into data warehouse tables for reporting.

    The first time, it failed with this error:

    Attempt to fetch logical page (1:440107) in database 'db_name' belongs to object 'X', not to object 'Y'. [SQLSTATE HY000] (Error 605).  The step failed.

    I ran DBCC CHECKDB, which reported no errors. I ran the task again and it worked!

    The next day, the task failed again at the scheduled time, only this time it also failed when I re-ran it. So I saved the data from table Y, dropped and re-created the table, then loaded the data back in. The task then ran successfully.

    The day after that, it failed again with a different error:

    I/O error (bad page ID) detected during read at offset 0x00000138b76000 in file 'X:\mssql\data\db_name\db_name_data.mdf'. [SQLSTATE HY000] (Error 823).  The step failed.

    Again I ran DBCC CHECKDB, which reported no errors. I ran the task again and it worked!

    Has anyone seen these symptoms before? Could it be tempdb corruption? DBCC CHECKDB finds no corruption in the application database, and I am reluctant to do a complete database rebuild because my customer runs 24/7.

    /john
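A side note on the two error messages above: SQL Server data files are made up of 8 KB (8192-byte) pages, so the byte offset in the 823 error can be converted to a page number and compared with the page named in the 605 error. The following is only a rough sketch of that arithmetic. The helper names are made up for illustration, and it assumes the .mdf file in the 823 message is file 1, the same file as in the 605 message.

```python
# SQL Server data files are a sequence of 8 KB pages, so a byte offset
# into the file maps directly to a page number within that file.
PAGE_SIZE = 8192  # bytes per SQL Server page


def offset_to_page(offset: int) -> int:
    """Return the number of the page that contains the given byte offset."""
    return offset // PAGE_SIZE


def page_to_offset(page: int) -> int:
    """Return the byte offset at which the given page starts."""
    return page * PAGE_SIZE


# Offset taken from the error 823 message in the post above.
offset_823 = 0x00000138b76000
# Page taken from the error 605 message in the post above (1:440107).
# Assumption: both messages refer to file 1 of the database.
page_605 = 440107

page_823 = offset_to_page(offset_823)

print(f"Error 823: offset {offset_823:#x} is page (1:{page_823})")
print(f"Error 605: page (1:{page_605}) starts at offset {page_to_offset(page_605):#x}")
print("Both errors hit the same page:", page_823 == page_605)
```

Under those assumptions the 823 offset works out to page 640443, which is not the page from the 605 error. Two different pages failing on different days, with a clean DBCC CHECKDB in between, would seem to fit a problem somewhere in the I/O path better than a single damaged page in the table, though this arithmetic alone does not prove that.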

  • It looks like some kind of "in-memory" corruption, i.e. the page is being corrupted while it is read from disk into RAM, either because of a RAM problem or because of something like a faulty or malicious kernel-level filter driver. Check whether any antivirus applications or other filter drivers are enabled.

    Also make sure that the write cache on the disk is disabled, and try relocating the files to another disk.

    Run DBCC DROPCLEANBUFFERS before running DBCC CHECKDB, so that CHECKDB reads the pages from disk rather than from the buffer cache.

    Go through the System and Application event logs for any disk, RAM, or application errors that might provide pointers.

    If the issue continues, contact Microsoft SQL Server Support.

     

    M.S. Reddy

  • I also usually restart the server(s) after disabling the write cache on the disk, if it was enabled.

  • I found the solution for this in another post.

    It is caused by stale cache data and is fixed by a firmware update.

    It is specific to a certain type of disk controller.

    If you search this site for `stale cache` you will find the details. Also see the following link:

    ftp://ftp.compaq.com/pub/products/storageworks/techdoc/msa1000/MSA10004.32_ReleaseNotes.pdf

    /John
