SQL SERVER detected a logical consistency-based I/O error: torn page

Question

Post reply

SQL SERVER detected a logical consistency-based I/O error: torn page

Viewing 12 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply

James L Kerr Right there with Babe Points: 760 More actions · Answer 1

I have not yet repaired the msdb error ...

So, I started with

DBCC CHECKDB (msdb) WITH ALL_ERRORMSGS, NO_INFOMSGS

I recieve the following:

Msg 8967, Level 16, State 216, Line 1

An internal error occurred in DBCC that prevented further processing. Contact Customer Support Services.

Now, I executed DBCC CHECKTABLE ('backupmediafamily') to find out what table had the error.

Msg 8967, Level 16, State 216, Line 1

An internal error occurred in DBCC that prevented further processing. Contact Customer Support Services.

Then, I decided to attempt to SELECT * FROM dbo.backupmediafamily, but this actually worked properly.

I decided to try the second index, SELECT media_family_id FROM backupmediafamily, and BINGO!

Msg 824, Level 24, State 2, Line 1

SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:54080; actual 775:-653262834). It occurred during a read of page (1:54080) in database ID 4 at offset 0x0000001a680000 in file 'D:\MSSQL\Data\MSSQL.3\MSSQL\DATA\MSDBData.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

I think I have this tracked down to a physical hard drive issue on our SAN, but I have no way to verify that. The telling factor is that this same exact page was the issue on the 5th with a completely different table (we fixed that issue with the DBCC CHECKDB command in my previous post). According to MSDB.dbo.suspect_pages, I have 8 pages (1 64-KB block of physical data) that is suspect:

DB-ID File Page Event count last error ts

4154080713492009-03-05 14:39:26.057

41540857132009-03-05 14:39:26.090

4154081712009-03-05 14:39:26.057

4154082122009-03-22 00:30:02.980

4154085152009-03-23 17:45:03.477

4154083162009-03-24 12:00:14.117

4154086122009-03-22 16:00:09.950

4154084142009-03-24 02:30:04.110

41540801102009-03-25 13:04:39.470

41540873122009-03-25 11:33:23.530

As you can see, I have errors on the 5th and the 25th for the same pages 54080 and 54081. I have to now attempt to locate the actual physical address of these pages on the hard disk. Any ideas??

Paul Randal One Orange Chip Points: 29544 More actions · Answer 2

No idea how to tell what the physical address of the pages are on the hard disk - all I can say is that the offset of a page within the SQL data file is the page number x 8192 bytes. If NTFS has split the actual file over multiple contiguous chunks of disk, that makes things even harder, then you're dealing with how RAID is setup, drive geometries etc etc. But really all you want to know is which physical drive/array is the problem with - and then replace it.

The 8967 is a known bug, when you already have corruption (key point) - see CHECKDB bug that people are hitting - Msg 8967, Level 16, State 216

Regardless, you need to move the database onto different storage - the fact that the same errors occured on different days says that drive (or RAID array) is toast.

Thanks

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

James L Kerr Right there with Babe Points: 760 More actions · Answer 3

Paul, I have some really interesting information that I would like to get your opinion on. We currently have a support ticket open with Hitachi to verify integrity on the SAN, but I was digging a little deeper and thought this was most interesting.

DBCC TRACEON (3604, -1)

DBCC PAGE (msdb, 1, 54080, 3)

PAGE: (775:-653262834)

BUFFER:

BUF @0x00000000A1FD1200

bpage = 0x00000000A1448000 bhash = 0x0000000000000000 bpageno = (1:54080)

bdbid = 4 breferences = 0 bUse1 = 54470

bstat = 0x3c00809 blog = 0x21212159 bnext = 0x0000000000000000

PAGE HEADER:

Page @0x00000000A1448000

m_pageId = (775:-653262834) m_headerVersion = 6 m_type = 46

m_typeFlagBits = 0x0 m_level = 129 m_flagBits = 0x0

m_objId (AllocUnitId.idObj) = 67111936 m_indexId (AllocUnitId.idInd) = 768

Metadata: AllocUnitId = 216177180361621504 Metadata: PartitionId = 0

Metadata: IndexId = -1 Metadata: ObjectId = 0 m_prevPage = (0:132096)

m_nextPage = (2:-2105506296) pminlen = 4 m_slotCnt = 0

m_freeCnt = 57 m_freeData = 0 m_reservedCnt = 1536

m_lsn = (16782592:637544192:1792) m_xactReserved = 5377 m_xdesId = (0:66560)

m_ghostRecCnt = 22 m_tornBits = 1089300228

Allocation Status

GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED

PFS (1:48528) = 0x40 ALLOCATED 0_PCT_FULL DIFF (1:6) = NOT CHANGED

ML (1:7) = NOT MIN_LOGGED

Msg 2514, Level 16, State 5, Line 1

A DBCC PAGE error has occurred: Invalid page type - dump style 3 not possible.

I also tried dump style 2, but I am not going to post that ugly mess on here 🙂

What I found MOST interesting is the AllocUnitId ... I took the ID number and ran this:

SELECT * FROM msdb.sys.allocation_units WHERE allocation_unit_id = 216177180361621504

Empty Set ... There was no record of that allocation unit for 5 pages 54080, 54082, 54083, 54084, 54085 ... and they all appear to have the same issue. Page 54081 is in a different allocation unit, and it returned 144 rows properly using the dump style 3 flag.

I am waiting to hear from Hitachi if there is anything on the SAN that might be the root cause, and if not, I am going to start asking about testing the other components in the physical server and connecting the server to the SAN. Do you have any additional ideas or suggestions?

Paul Randal One Orange Chip Points: 29544 More actions · Answer 4

This looks just like I/O subsystem corruption - the page is full of garbage.

I'd like you to turn on page checksums, which will catch when the I/O subsystem is causing problems.

ALTER DATABASE yourdb SET PAGE_VERIFY CHECKSUM

See How to tell if the IO subsystem is causing corruptions?

Thanks

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

Ranga N SSChasing Mays Points: 642 More actions · Answer 5

hi Paul,

I had a database that checkdb reported errors, I did a checktable for all the tables and few tables reported problems. See the output from each step. Interestingly, the row count and page count remained the same after checktable with REPAIR_ALLOW_DATA_LOSS option was run, So, I am wondering were there any data loss ? Please share your thoughts.

Thanks,

Ranga

dbcc checktable ( 'dbo.CorruptTable', REPAIR_REBUILD )

------------------------------------------------------

Msg 8939, Level 16, State 5, Line 1

Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_headerVersion == HEADER_7_0) failed. Values are 6 and 1.

The repair level on the DBCC statement caused this repair to be bypassed.

Msg 8939, Level 16, State 7, Line 1

Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_freeData >= PAGEHEADSIZE && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8186.

Repairing this error requires other errors to be corrected first.

Msg 8939, Level 16, State 8, Line 1

Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_freeCnt = PAGEHEADSIZE && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8186.

The error has been repaired.

Msg 8939, Level 16, State 8, Line 1

Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_freeCnt <= PAGESIZE - PAGEHEADSIZE) failed. Values are 44123 and 8096.

The error has been repaired.

Msg 8909, Level 16, State 1, Line 1

Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:40465) contains an incorrect page ID in its page header. The PageId in the page header = (262:589825).

The error has been repaired.

Msg 8909, Level 16, State 1, Line 1

Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:40466) contains an incorrect page ID in its page header. The PageId in the page header = (262:262145).

The error has been repaired.

Msg 8909, Level 16, State 1, Line 1

Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:40467) contains an incorrect page ID in its page header. The PageId in the page header = (262:1441793).

The error has been repaired.

CHECKTABLE found 0 allocation errors and 6 consistency errors not associated with any single object.

CHECKTABLE fixed 0 allocation errors and 6 consistency errors not associated with any single object.

DBCC results for 'CorruptTable'.

Repair: The page (1:40464) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).

Repair: The page (1:40465) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).

Repair: The page (1:40466) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).

Repair: The page (1:40467) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).

Msg 8928, Level 16, State 1, Line 1

Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40464) could not be processed. See other errors for details.

The error has been repaired.

Msg 8928, Level 16, State 1, Line 1

Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40465) could not be processed. See other errors for details.

The error has been repaired.

Msg 8928, Level 16, State 1, Line 1

Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40466) could not be processed. See other errors for details.

The error has been repaired.

Msg 8928, Level 16, State 1, Line 1

Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40467) could not be processed. See other errors for details.

The error has been repaired.

There are 218534 rows in 2213 pages for object "CorruptTable".

CHECKTABLE found 0 allocation errors and 4 consistency errors in table 'CorruptTable' (object ID 789838126).

CHECKTABLE fixed 0 allocation errors and 4 consistency errors in table 'CorruptTable' (object ID 789838126).

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

dbcc checktable ( 'dbo.CorruptTable')

--------------------------------------

DBCC results for 'CorruptTable'.

There are 218534 rows in 2213 pages for object "CorruptTable".

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Paul Randal One Orange Chip Points: 29544 More actions · Answer 6

Please start a new thread rather than continuing an unrelated one.

In this case you were lucky - the pages were linked into the table (probably by the corruption) and didn't have any data on them. You need to figure out how the corruption happened and sort out your backup strategy so that you don't have to run repair next time corruption occurs.

Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

Ranga N SSChasing Mays Points: 642 More actions · Answer 7

Ranga N

SSChasing Mays

Points: 642

September 16, 2009 at 11:49 am

#1054644

Thank you.

nnss14 SSC Rookie Points: 49 More actions · Answer 8

nnss14

SSC Rookie

Points: 49

March 13, 2010 at 1:01 am

#1133162

thank you

abhi620 SSC Enthusiast Points: 126 More actions · Answer 9

abhi620

SSC Enthusiast

Points: 126

May 22, 2011 at 6:11 am

#1328177

thank you

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 10

Ninja's_RGR'us

SSC Guru

Points: 294069

May 22, 2011 at 9:54 am

#1328197

ya thank you :hehe:.

Akkare SSCrazy Points: 2848 More actions · Answer 11

I got the error for tempdb while executing a script.Pls advise.

Msg 824, Level 24, State 2, Line 74

SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0x55555555; actual signature: 0x85505bd1). It occurred during a read of page (1:1224) in database ID 2 at offset 0x00000000990000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\tempdb.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 12

Ninja's_RGR'us

SSC Guru

Points: 294069

October 18, 2011 at 8:18 am

#1395992

Please start a new thread.