2008 R2 Corruption

  • Hey all,

    I have a client whose database has become corrupt. Their last clean backup was taken on 1/23, and the nightly backups have failed since then.

    The root cause was found: a bad drive in a RAID 5 array. The client's IT team performed a hot swap, and they report that they are no longer receiving any storage alerts.

    When I try to take a full database backup, it fails with the error:

    Read on "E:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\XXXX.mdf" failed: 1(Incorrect function.)

    DBCC CHECKDB runs, but returns the following errors:

    Msg 8966, Level 16, State 2, Line 1

    Unable to read and latch page (1:12437502) with latch type SH. 1(Incorrect function.) failed.

    Msg 8966, Level 16, State 2, Line 1

    Unable to read and latch page (1:12437503) with latch type SH. 1(Incorrect function.) failed.

    Msg 8966, Level 16, State 2, Line 1

    Unable to read and latch page (1:13329048) with latch type SH. 1(Incorrect function.) failed.

    Msg 8966, Level 16, State 2, Line 1

    Unable to read and latch page (1:15179883) with latch type SH. 1(Incorrect function.) failed.

    Msg 8966, Level 16, State 2, Line 1

    Unable to read and latch page (1:16028566) with latch type SH. 1(Incorrect function.) failed.

    Msg 8966, Level 16, State 2, Line 1

    Unable to read and latch page (1:16174196) with latch type SH. 1(Incorrect function.) failed.

    Msg 8966, Level 16, State 2, Line 1

    Unable to read and latch page (1:16184040) with latch type SH. 1(Incorrect function.) failed.

    CHECKDB found 0 allocation errors and 7 consistency errors not associated with any single object.

    Msg 2533, Level 16, State 1, Line 1

    Table error: page (1:13329048) allocated to object ID 1381579960, index ID 1, partition ID 72057598648254464, alloc unit ID 71866662285475840 (type LOB data) was not seen. The page may be invalid or may have an incorrect alloc unit ID in its header.

    Msg 8965, Level 16, State 1, Line 1

    Table error: Object ID 1381579960, index ID 1, partition ID 72057598648254464, alloc unit ID 71866662285475840 (type LOB data). The off-row data node at page (1:13329048), slot 0, text ID 3563562074112 is referenced by page (1:13328541), slot 0, but was not seen in the scan.

    Msg 8929, Level 16, State 1, Line 1

    Object ID 1381579960, index ID 1, partition ID 72057598648254464, alloc unit ID 72057598803836928 (type In-row data): Errors found in off-row data with ID 3563562074112 owned by data record identified by RID = (1:9868377:42)

    CHECKDB found 0 allocation errors and 3 consistency errors in table 'PPAttachment' (object ID 1381579960).

    Msg 2533, Level 16, State 1, Line 1

    Table error: page (1:16184040) allocated to object ID 1413580074, index ID 2, partition ID 72057600840433664, alloc unit ID 72057601141440512 (type In-row data) was not seen. The page may be invalid or may have an incorrect alloc unit ID in its header.

    Msg 8976, Level 16, State 1, Line 1

    Table error: Object ID 1413580074, index ID 2, partition ID 72057600840433664, alloc unit ID 72057601141440512 (type In-row data). Page (1:16184040) was not seen in the scan although its parent (1:17222540) and previous (1:16477040) refer to it. Check any previous errors.

    Msg 8978, Level 16, State 1, Line 1

    Table error: Object ID 1413580074, index ID 2, partition ID 72057600840433664, alloc unit ID 72057601141440512 (type In-row data). Page (1:17189713) is missing a reference from previous page (1:16184040). Possible chain linkage problem.

    CHECKDB found 0 allocation errors and 3 consistency errors in table 'PVAgg' (object ID 1413580074).

    Msg 2533, Level 16, State 1, Line 1

    Table error: page (1:12437502) allocated to object ID 1713296688, index ID 1, partition ID 72057595308933120, alloc unit ID 72057595308802048 (type In-row data) was not seen. The page may be invalid or may have an incorrect alloc unit ID in its header.

    Msg 8976, Level 16, State 1, Line 1

    Table error: Object ID 1713296688, index ID 1, partition ID 72057595308933120, alloc unit ID 72057595308802048 (type In-row data). Page (1:12437502) was not seen in the scan although its parent (1:12439872) and previous (1:12437501) refer to it. Check any previous errors.

    Msg 2533, Level 16, State 1, Line 1

    Table error: page (1:12437503) allocated to object ID 1713296688, index ID 1, partition ID 72057595308933120, alloc unit ID 72057595308802048 (type In-row data) was not seen. The page may be invalid or may have an incorrect alloc unit ID in its header.

    Msg 8980, Level 16, State 1, Line 1

    Table error: Object ID 1713296688, index ID 1, partition ID 72057595308933120, alloc unit ID 72057595308802048 (type In-row data). Index node page (1:12439872), slot 184 refers to child page (1:12437503) and previous child (1:12437502), but they were not encountered.

    Msg 8978, Level 16, State 1, Line 1

    Table error: Object ID 1713296688, index ID 1, partition ID 72057595308933120, alloc unit ID 72057595308802048 (type In-row data). Page (1:12437568) is missing a reference from previous page (1:12437503). Possible chain linkage problem.

    CHECKDB found 0 allocation errors and 5 consistency errors in table 'cusGET_195_3' (object ID 1713296688).

    Msg 2533, Level 16, State 1, Line 1

    Table error: page (1:16174196) allocated to object ID 1871449841, index ID 12, partition ID 72057600812187648, alloc unit ID 72057601111556096 (type In-row data) was not seen. The page may be invalid or may have an incorrect alloc unit ID in its header.

    Msg 8976, Level 16, State 1, Line 1

    Table error: Object ID 1871449841, index ID 12, partition ID 72057600812187648, alloc unit ID 72057601111556096 (type In-row data). Page (1:16174196) was not seen in the scan although its parent (1:16173020) and previous (1:16174195) refer to it. Check any previous errors.

    Msg 8978, Level 16, State 1, Line 1

    Table error: Object ID 1871449841, index ID 12, partition ID 72057600812187648, alloc unit ID 72057601111556096 (type In-row data). Page (1:16174197) is missing a reference from previous page (1:16174196). Possible chain linkage problem.

    CHECKDB found 0 allocation errors and 3 consistency errors in table 'PVI' (object ID 1871449841).

    Msg 2533, Level 16, State 1, Line 1

    Table error: page (1:15179883) allocated to object ID 1977058079, index ID 2, partition ID 72057601267597312, alloc unit ID 72057601591869440 (type In-row data) was not seen. The page may be invalid or may have an incorrect alloc unit ID in its header.

    Msg 8976, Level 16, State 1, Line 1

    Table error: Object ID 1977058079, index ID 2, partition ID 72057601267597312, alloc unit ID 72057601591869440 (type In-row data). Page (1:15179883) was not seen in the scan although its parent (1:15178531) and previous (1:14825998) refer to it. Check any previous errors.

    Msg 8978, Level 16, State 1, Line 1

    Table error: Object ID 1977058079, index ID 2, partition ID 72057601267597312, alloc unit ID 72057601591869440 (type In-row data). Page (1:15179884) is missing a reference from previous page (1:15179883). Possible chain linkage problem.

    Msg 8978, Level 16, State 1, Line 1

    Table error: Object ID 1977058079, index ID 6, partition ID 72057600840368128, alloc unit ID 72057601141374976 (type In-row data). Page (1:5586659) is missing a reference from previous page (1:16028566). Possible chain linkage problem.

    Msg 2533, Level 16, State 1, Line 1

    Table error: page (1:16028566) allocated to object ID 1977058079, index ID 6, partition ID 72057600840368128, alloc unit ID 72057601141374976 (type In-row data) was not seen. The page may be invalid or may have an incorrect alloc unit ID in its header.

    Msg 8976, Level 16, State 1, Line 1

    Table error: Object ID 1977058079, index ID 6, partition ID 72057600840368128, alloc unit ID 72057601141374976 (type In-row data). Page (1:16028566) was not seen in the scan although its parent (1:16010131) and previous (1:5586091) refer to it. Check any previous errors.

    CHECKDB found 0 allocation errors and 6 consistency errors in table 'ALog' (object ID 1977058079).

    CHECKDB found 0 allocation errors and 27 consistency errors in database 'XXXX'.

    repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (XXXX).

    Some of these errors were returned for non-clustered indexes; others are for tables that I can drop and rebuild. There is one table containing data where loss would be a problem, but even in that case the client considers data loss acceptable if we can get the database back online (without restoring from the 1/23 backup).
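
    For reference, here is roughly how I am mapping the object ID / index ID pairs from the CHECKDB output to actual index names, to confirm which of them are nonclustered (a quick sketch run against the live database; the object IDs are the ones reported above):

    -- Map the object IDs reported by CHECKDB to table/index names.
    -- type_desc shows whether each index is CLUSTERED or NONCLUSTERED.
    SELECT o.name AS table_name, i.index_id, i.name AS index_name, i.type_desc
    FROM sys.objects AS o
    JOIN sys.indexes AS i ON i.object_id = o.object_id
    WHERE o.object_id IN (1381579960, 1413580074, 1713296688, 1871449841, 1977058079);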

    I have not been able to find much on the errors above, and it is not clear to me whether it is possible to overcome this corruption.

    This database is in full recovery mode, but the client appears to only be attempting full nightly backups.

    I am hesitant to perform any further action until I understand the problem in greater detail.

    Any help would be appreciated!

    Mike

  • > This database is in full recovery mode, but the client appears to only be attempting full nightly backups.

    > I am hesitant to perform any further action until I understand the problem in greater detail.

    If it's in the Full recovery model, then stop reading this and take a log backup! Then set up a job to do that regularly. Even if the database is in SUSPECT mode and all the data files are missing, as long as the log is intact you can take that log backup and recover everything.
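
    A minimal sketch of that first log backup, assuming the database is named XXXX and that the backup folder actually exists on the server (adjust both):

    -- NO_TRUNCATE / CONTINUE_AFTER_ERROR let the tail of the log be backed up
    -- even though the data files are damaged.
    BACKUP LOG [XXXX]
    TO DISK = N'E:\Backups\XXXX_tail.trn'
    WITH NO_TRUNCATE, CONTINUE_AFTER_ERROR;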

    If they've never taken log backups, and it's running with Full recovery, that log backup plus any working full backup can put that database back together by restoring the whole thing or even just individual damaged files or pages.

    What's important is that you have a continuous chain of log backups starting before whatever full backup you use to start the recovery.

    If they haven't been taking log backups, how did they deal with transaction log growth? Either the active portion of the log is huge, the database rarely changes, or someone 'freed log space' by switching between Full and Simple recovery. Note: taking a full backup does NOT break the log chain. Switching recovery models does, though.
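
    If you want to see what the server itself says, something along these lines shows the current recovery model and whatever is holding the log open (database name assumed):

    -- Recovery model and the reason the log cannot currently be reused
    SELECT name, recovery_model_desc, log_reuse_wait_desc
    FROM sys.databases
    WHERE name = N'XXXX';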

    But first, get that log backup, then use RESTORE HEADERONLY commands on both your transaction log backup and the most recent full backup you can get to see if the FirstLSN of the log backup is before (less than) the LastLSN of the full backup (see RESTORE HEADERONLY in Books Online for more details).
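
    Something like this, with the paths swapped in for wherever those backup files actually live:

    -- Read the backup headers; compare FirstLSN of the log backup
    -- against LastLSN of the full backup.
    RESTORE HEADERONLY FROM DISK = N'E:\Backups\XXXX_full.bak';
    RESTORE HEADERONLY FROM DISK = N'E:\Backups\XXXX_tail.trn';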

    If the log backup and the full backup overlap, then you can restore the entire database and lose no data.

    You could also restore just the damaged pages (RESTORE DATABASE <database name> PAGE = '1:13329048, 1:16184040, <more pages>' ...), as detailed in Books Online in the "Performing Page Restores" topic.
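
    As a rough outline only (file names and paths are placeholders; the page IDs are the seven reported by your CHECKDB output; online page restore needs Enterprise Edition, other editions have to do it offline), the page restore sequence looks something like this:

    -- 1) Restore just the damaged pages from the last good full backup
    RESTORE DATABASE [XXXX]
        PAGE = '1:12437502, 1:12437503, 1:13329048, 1:15179883, 1:16028566, 1:16174196, 1:16184040'
        FROM DISK = N'E:\Backups\XXXX_full.bak'
        WITH NORECOVERY;

    -- 2) Apply every log backup taken since that full, oldest first
    RESTORE LOG [XXXX] FROM DISK = N'E:\Backups\XXXX_log.trn' WITH NORECOVERY;

    -- 3) Take a fresh tail-log backup, restore it, and the pages come back online
    BACKUP LOG [XXXX] TO DISK = N'E:\Backups\XXXX_pagerestore_tail.trn';
    RESTORE LOG [XXXX] FROM DISK = N'E:\Backups\XXXX_pagerestore_tail.trn' WITH RECOVERY;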

    Good luck,

    -Eddie

    Eddie Wuerch
    MCM: SQL

  • Eddie Wuerch (2/17/2016)

    > If it's in the Full recovery model, then stop reading this and take a log backup! [...]

    Eddie,

    Thank you for the help! I'm taking the log backup now. It is not clear to me what level of management they have had on this database. The client had stated that they were only taking full backups, but I can see some log backups that suggest they were at least experimenting with a t-log backup approach in the past.

    In the meantime, the client is restoring the last good full backup. I should know more soon! Now off to Books Online to learn.

    Thank you again for the help!

    Mike

  • Just a quick update on this.

    I think I have a sequential log chain from the last full backup to today.

    The client found a log backup that was taken immediately after the full, which appears to complete the log chain:

    FILE: XXXX_db_201601282028.BAK (last good full)
      LAST LSN:  905137000008343600001

    FILE: XXXX_tlog_201602802354.LOG (log backup that the client found)
      FIRST LSN: 905136000022514700001
      LAST LSN:  905137000016105600001

    FILE: tlogbak.bak (log backup taken today)
      FIRST LSN: 905137000016105600001

    I am restoring the last good backup with 'NORECOVERY' now.
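
    For the record, the sequence I'm running looks roughly like this (paths are placeholders, and MOVE/REPLACE options will depend on where it is being restored):

    -- Restore the chain in order, leaving the database in RESTORING until the last log is applied
    RESTORE DATABASE [XXXX]
        FROM DISK = N'E:\Backups\XXXX_db_201601282028.BAK'
        WITH NORECOVERY;

    RESTORE LOG [XXXX]
        FROM DISK = N'E:\Backups\XXXX_tlog_201602802354.LOG'
        WITH NORECOVERY;

    RESTORE LOG [XXXX]
        FROM DISK = N'E:\Backups\tlogbak.bak'
        WITH NORECOVERY;

    -- Once the last log backup is applied, bring the database online
    RESTORE DATABASE [XXXX] WITH RECOVERY;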
