Unconsistent consistency errors...

  • Hi yall.

    This is my first post here so bear with me please.

    I'm currently working for a company as IT/IS Manager, but my main job has been improving/designing and incorporating procedures in the ERP software. So i guess i'm more of a IS guy than an IT guy. Neverthless, i have to run and manage the IT infrastructure aswell.

    We run an SQL2005 instance with one major database.

    Since the last few days CHECKDB has been reporting some consistency errors in a very unconsistent way. Everytime i run it i get a different number of errors and different levels of minimum repair type.

    We are running a custom built (budget is an issue...) server with a hdd for OS and a RAID 10 (4 HDD) where we sit the database files.

    I've ran a checkdsk on the RAID drive and it showed no errors. Neverthless the database keeps showing the same erratic pattern when running checkdb.

    Could this be from the OS drive?

    Thanks for your input.

    Regards.

    PS: sorry for any typos you may find

    ____________________________________________________________

    If you can't do things right at the first time, don't try skydiving. I won't.

  • No, it won't be the OS drive if the database files aren't on it.

    Errors please.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (11/28/2012)


    No, it won't be the OS drive if the database files aren't on it.

    Errors please.

    Hi Gail, thanks for the quick reply.

    On this run of CHECKDB WITH ALL_ERRORMSGS i got this one error, solved apparently by REPAIR_REBUILD

    DBCC results for 'fi'.

    Msg 8951, Level 16, State 1, Line 1

    Table error: Table 'fi' (ID 78099560). Data row does not have a matching index row in index 'in_fi_ref' (ID 7). Possible missing or invalid keys for the index row matching:

    Msg 8955, Level 16, State 1, Line 1

    Data row (1:1413363:2) identified by (HEAP RID = (1:1413363:2)) with index values (ref = 'XXXXXXXXX' and HEAP RID = (1:1413363:2)).

    There are 350297 rows in 87808 pages for object "fi".

    Previously i got 10 errors solved only by REPAIR_ALLOW_DATA_LOSS

    ____________________________________________________________

    If you can't do things right at the first time, don't try skydiving. I won't.

  • Are you doing index rebuilds right after the consistency checks?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (11/29/2012)


    Are you doing index rebuilds right after the consistency checks?

    These results were obtained running the command right from a query window.

    I do have a scheduled maintenance plan (runs everyday at 2:00AM) that performs the following (by this order):

    - checkdb

    - rebuild

    - reorganize

    - update

    - checkdb (again)

    but, like i said these tasks are not being run right now.

    Just finished another run of CHECKDB executed from the query window:

    Msg 8951, Level 16, State 1, Line 1

    Table error: Table 'sl' (ID 11208673). Data row does not have a matching index row in index 'in_sl_reftam' (ID 10). Possible missing or invalid keys for the index row matching:

    Msg 8955, Level 16, State 1, Line 1

    Data row (1:1412729:2) identified by (HEAP RID = (1:1412729:2)) with index values (ref = 'XXXXX ' and tam = ' ' and HEAP RID = (1:1412729:2)).

    CHECKDB found 0 allocation errors and 1 consistency errors in table 'sl' (object ID 11208673).

    Msg 8951, Level 16, State 1, Line 1

    Table error: Table 'pn' (ID 572994355). Data row does not have a matching index row in index 'in_pn_no' (ID 7). Possible missing or invalid keys for the index row matching:

    Msg 8955, Level 16, State 1, Line 1

    Data row (1:1413369:3) identified by (HEAP RID = (1:1413369:3)) with index values (no = 22006. and HEAP RID = (1:1413369:3)).

    Msg 8951, Level 16, State 1, Line 1

    Table error: Table 'pn' (ID 572994355). Data row does not have a matching index row in index 'in_pn_no' (ID 7). Possible missing or invalid keys for the index row matching:

    Msg 8955, Level 16, State 1, Line 1

    Data row (1:1413369:4) identified by (HEAP RID = (1:1413369:4)) with index values (no = 22006. and HEAP RID = (1:1413369:4)).

    Msg 8951, Level 16, State 1, Line 1

    Table error: Table 'pn' (ID 572994355). Data row does not have a matching index row in index 'in_pn_no' (ID 7). Possible missing or invalid keys for the index row matching:

    Msg 8955, Level 16, State 1, Line 1

    Data row (1:1413369:5) identified by (HEAP RID = (1:1413369:5)) with index values (no = 22006. and HEAP RID = (1:1413369:5)).

    CHECKDB found 0 allocation errors and 3 consistency errors in table 'pn' (object ID 572994355).

    CHECKDB found 0 allocation errors and 4 consistency errors in database 'YYYYYYYYYYYYY'.

    repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (YYYYYYYYY).

    Yet again, different results.

    I'm baffled.

    ____________________________________________________________

    If you can't do things right at the first time, don't try skydiving. I won't.

  • NTesla (11/29/2012)


    GilaMonster (11/29/2012)


    Are you doing index rebuilds right after the consistency checks?

    These results were obtained running the command right from a query window.

    I do have a scheduled maintenance plan (runs everyday at 2:00AM) that performs the following (by this order):

    - checkdb

    - rebuild

    - reorganize

    - update

    - checkdb (again)

    Ow, that's a waste of time. Ditch that plan entirely, go grab Ola's index maintenance script. Use that. Put checkDB (and your backups) in a separate plan

    At this point I would be looking at getting that DB onto different storage and running some diagnostics on your RAID array. Especially check that the write cache is either battery-backed or disabled.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (11/29/2012)


    Ow, that's a waste of time. Ditch that plan entirely, go grab Ola's index maintenance script. Use that. Put checkDB (and your backups) in a separate plan

    At this point I would be looking at getting that DB onto different storage and running some diagnostics on your RAID array. Especially check that the write cache is either battery-backed or disabled.

    That's a new ocean for me. I mean, Ola's index maintenance script.

    Anyway, i currently have 3 plans. The one stated above, and 2 other for backups. One runs every 6 hours to storage array A and the other runs another backup routine, once a day, to storage B.

    Maintenance plan execution times are not colliding.

    With your last sentence you just sentenced (no pun intended) this whole issue. The server is custom built and is not using those powerfull raid boards with battery and all that stuff. Which leads me to think that we need to invest on REAL hardware sooner than later. Operational environment is becoming more and more complex. More and more transactions are being issued but the hardware stayed the same... it's like wanting to have a beetle to race as fast as a Porsche.

    I now have some convincing to do...

    Thanks for your help and readiness. 🙂

    ____________________________________________________________

    If you can't do things right at the first time, don't try skydiving. I won't.

  • NTesla (11/29/2012)


    The server is custom built and is not using those powerfull raid boards with battery and all that stuff.

    Yeah, I got that. In that case make sure that any RAID write cache is disabled.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Just finished going through the procedure of running CHECKDB on another spin.

    The first run required REPAIR_REBUILD. After that i ran CHECKDB 3 times in a row, and the results were always the same (no errors).

    Went to check the raid driver and is indeed enabled to allow write caching. I will disable it in a few minutes, and in a few hours i'll have the first results.

    Boy... am I nervous...

    ____________________________________________________________

    If you can't do things right at the first time, don't try skydiving. I won't.

  • Machine getting frequently rebooted?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (11/29/2012)


    Machine getting frequently rebooted?

    Not really, no.

    Thanks for bringing that up, i'll now check if the write-caching gets enabled after each boot.

    ____________________________________________________________

    If you can't do things right at the first time, don't try skydiving. I won't.

  • Solved.

    RAID array had one black sheep.

    Too bad that the card is so cheap it doesn't give any warning...

    ____________________________________________________________

    If you can't do things right at the first time, don't try skydiving. I won't.

  • NTesla (11/29/2012)


    Solved.

    RAID array had one black sheep.

    Ah-ha!

    Too bad that the card is so cheap it doesn't give any warning...

    Well there's your motivation for better hardware...

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply