March 25, 2009 at 12:11 pm
I have not yet repaired the msdb error ...
So, I started with
DBCC CHECKDB (msdb) WITH ALL_ERRORMSGS, NO_INFOMSGS
I recieve the following:
Msg 8967, Level 16, State 216, Line 1
An internal error occurred in DBCC that prevented further processing. Contact Customer Support Services.
Now, I executed DBCC CHECKTABLE ('backupmediafamily') to find out what table had the error.
Msg 8967, Level 16, State 216, Line 1
An internal error occurred in DBCC that prevented further processing. Contact Customer Support Services.
Then, I decided to attempt to SELECT * FROM dbo.backupmediafamily, but this actually worked properly.
I decided to try the second index, SELECT media_family_id FROM backupmediafamily, and BINGO!
Msg 824, Level 24, State 2, Line 1
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:54080; actual 775:-653262834). It occurred during a read of page (1:54080) in database ID 4 at offset 0x0000001a680000 in file 'D:\MSSQL\Data\MSSQL.3\MSSQL\DATA\MSDBData.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
I think I have this tracked down to a physical hard drive issue on our SAN, but I have no way to verify that. The telling factor is that this same exact page was the issue on the 5th with a completely different table (we fixed that issue with the DBCC CHECKDB command in my previous post). According to MSDB.dbo.suspect_pages, I have 8 pages (1 64-KB block of physical data) that is suspect:
DB-ID File Page Event count last error ts
4154080713492009-03-05 14:39:26.057
41540857132009-03-05 14:39:26.090
4154081712009-03-05 14:39:26.057
4154082122009-03-22 00:30:02.980
4154085152009-03-23 17:45:03.477
4154083162009-03-24 12:00:14.117
4154086122009-03-22 16:00:09.950
4154084142009-03-24 02:30:04.110
41540801102009-03-25 13:04:39.470
41540873122009-03-25 11:33:23.530
As you can see, I have errors on the 5th and the 25th for the same pages 54080 and 54081. I have to now attempt to locate the actual physical address of these pages on the hard disk. Any ideas??
March 25, 2009 at 2:08 pm
No idea how to tell what the physical address of the pages are on the hard disk - all I can say is that the offset of a page within the SQL data file is the page number x 8192 bytes. If NTFS has split the actual file over multiple contiguous chunks of disk, that makes things even harder, then you're dealing with how RAID is setup, drive geometries etc etc. But really all you want to know is which physical drive/array is the problem with - and then replace it.
The 8967 is a known bug, when you already have corruption (key point) - see CHECKDB bug that people are hitting - Msg 8967, Level 16, State 216
Regardless, you need to move the database onto different storage - the fact that the same errors occured on different days says that drive (or RAID array) is toast.
Thanks
Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005
March 26, 2009 at 10:48 am
Paul, I have some really interesting information that I would like to get your opinion on. We currently have a support ticket open with Hitachi to verify integrity on the SAN, but I was digging a little deeper and thought this was most interesting.
DBCC TRACEON (3604, -1)
DBCC PAGE (msdb, 1, 54080, 3)
PAGE: (775:-653262834)
BUFFER:
BUF @0x00000000A1FD1200
bpage = 0x00000000A1448000 bhash = 0x0000000000000000 bpageno = (1:54080)
bdbid = 4 breferences = 0 bUse1 = 54470
bstat = 0x3c00809 blog = 0x21212159 bnext = 0x0000000000000000
PAGE HEADER:
Page @0x00000000A1448000
m_pageId = (775:-653262834) m_headerVersion = 6 m_type = 46
m_typeFlagBits = 0x0 m_level = 129 m_flagBits = 0x0
m_objId (AllocUnitId.idObj) = 67111936 m_indexId (AllocUnitId.idInd) = 768
Metadata: AllocUnitId = 216177180361621504 Metadata: PartitionId = 0
Metadata: IndexId = -1 Metadata: ObjectId = 0 m_prevPage = (0:132096)
m_nextPage = (2:-2105506296) pminlen = 4 m_slotCnt = 0
m_freeCnt = 57 m_freeData = 0 m_reservedCnt = 1536
m_lsn = (16782592:637544192:1792) m_xactReserved = 5377 m_xdesId = (0:66560)
m_ghostRecCnt = 22 m_tornBits = 1089300228
Allocation Status
GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED
PFS (1:48528) = 0x40 ALLOCATED 0_PCT_FULL DIFF (1:6) = NOT CHANGED
ML (1:7) = NOT MIN_LOGGED
Msg 2514, Level 16, State 5, Line 1
A DBCC PAGE error has occurred: Invalid page type - dump style 3 not possible.
I also tried dump style 2, but I am not going to post that ugly mess on here 🙂
What I found MOST interesting is the AllocUnitId ... I took the ID number and ran this:
SELECT * FROM msdb.sys.allocation_units WHERE allocation_unit_id = 216177180361621504
Empty Set ... There was no record of that allocation unit for 5 pages 54080, 54082, 54083, 54084, 54085 ... and they all appear to have the same issue. Page 54081 is in a different allocation unit, and it returned 144 rows properly using the dump style 3 flag.
I am waiting to hear from Hitachi if there is anything on the SAN that might be the root cause, and if not, I am going to start asking about testing the other components in the physical server and connecting the server to the SAN. Do you have any additional ideas or suggestions?
March 26, 2009 at 12:46 pm
This looks just like I/O subsystem corruption - the page is full of garbage.
I'd like you to turn on page checksums, which will catch when the I/O subsystem is causing problems.
ALTER DATABASE yourdb SET PAGE_VERIFY CHECKSUM
See How to tell if the IO subsystem is causing corruptions?
Thanks
Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005
September 16, 2009 at 9:02 am
hi Paul,
I had a database that checkdb reported errors, I did a checktable for all the tables and few tables reported problems. See the output from each step. Interestingly, the row count and page count remained the same after checktable with REPAIR_ALLOW_DATA_LOSS option was run, So, I am wondering were there any data loss ? Please share your thoughts.
Thanks,
Ranga
dbcc checktable ( 'dbo.CorruptTable', REPAIR_REBUILD )
------------------------------------------------------
Msg 8939, Level 16, State 5, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_headerVersion == HEADER_7_0) failed. Values are 6 and 1.
The repair level on the DBCC statement caused this repair to be bypassed.
Msg 8939, Level 16, State 7, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_freeData >= PAGEHEADSIZE && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8186.
Repairing this error requires other errors to be corrected first.
Msg 8939, Level 16, State 8, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_freeCnt = PAGEHEADSIZE && m_freeData <= (UINT)PAGESIZE - m_slotCnt * sizeof (Slot)) failed. Values are 0 and 8186.
The error has been repaired.
Msg 8939, Level 16, State 8, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page (1:40464). Test (m_freeCnt <= PAGESIZE - PAGEHEADSIZE) failed. Values are 44123 and 8096.
The error has been repaired.
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:40465) contains an incorrect page ID in its page header. The PageId in the page header = (262:589825).
The error has been repaired.
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:40466) contains an incorrect page ID in its page header. The PageId in the page header = (262:262145).
The error has been repaired.
Msg 8909, Level 16, State 1, Line 1
Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:40467) contains an incorrect page ID in its page header. The PageId in the page header = (262:1441793).
The error has been repaired.
CHECKTABLE found 0 allocation errors and 6 consistency errors not associated with any single object.
CHECKTABLE fixed 0 allocation errors and 6 consistency errors not associated with any single object.
DBCC results for 'CorruptTable'.
Repair: The page (1:40464) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).
Repair: The page (1:40465) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).
Repair: The page (1:40466) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).
Repair: The page (1:40467) has been deallocated from object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data).
Msg 8928, Level 16, State 1, Line 1
Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40464) could not be processed. See other errors for details.
The error has been repaired.
Msg 8928, Level 16, State 1, Line 1
Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40465) could not be processed. See other errors for details.
The error has been repaired.
Msg 8928, Level 16, State 1, Line 1
Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40466) could not be processed. See other errors for details.
The error has been repaired.
Msg 8928, Level 16, State 1, Line 1
Object ID 789838126, index ID 0, partition ID 51762831425536, alloc unit ID 51762831425536 (type In-row data): Page (1:40467) could not be processed. See other errors for details.
The error has been repaired.
There are 218534 rows in 2213 pages for object "CorruptTable".
CHECKTABLE found 0 allocation errors and 4 consistency errors in table 'CorruptTable' (object ID 789838126).
CHECKTABLE fixed 0 allocation errors and 4 consistency errors in table 'CorruptTable' (object ID 789838126).
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
dbcc checktable ( 'dbo.CorruptTable')
--------------------------------------
DBCC results for 'CorruptTable'.
There are 218534 rows in 2213 pages for object "CorruptTable".
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
September 16, 2009 at 11:20 am
Please start a new thread rather than continuing an unrelated one.
In this case you were lucky - the pages were linked into the table (probably by the corruption) and didn't have any data on them. You need to figure out how the corruption happened and sort out your backup strategy so that you don't have to run repair next time corruption occurs.
Paul Randal
CEO, SQLskills.com: Check out SQLskills online training!
Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005
September 16, 2009 at 11:49 am
Thank you.
March 13, 2010 at 1:01 am
thank you
May 22, 2011 at 6:11 am
thank you
May 22, 2011 at 9:54 am
ya thank you :hehe:.
October 18, 2011 at 8:16 am
I got the error for tempdb while executing a script.Pls advise.
Msg 824, Level 24, State 2, Line 74
SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0x55555555; actual signature: 0x85505bd1). It occurred during a read of page (1:1224) in database ID 2 at offset 0x00000000990000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\tempdb.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
October 18, 2011 at 8:18 am
Please start a new thread.
Viewing 12 posts - 16 through 26 (of 26 total)
You must be logged in to reply to this topic. Login to reply