August 10, 2018 at 4:59 pm
Around 12:51 this morning I was paged because the mssql service had restarted unexpectedly. I investigate and found what appear to be a memory issue in a backup process:
SQLVDI: Loc=SignalAbort. Desc=Client initiates abort. ErrorCode=(0). Process=10756. Thread=24412. Client. Instance=SERVERNAME. VD=Global\//LOCATION/Database_Log_201808100052.safe_SQLVDIMemoryName_0.
The mssql service came back online and everything appear to be okay until later in the day when a simple process ran to update flags on a table...at which I noticed blocking...I did a few standard things and checked the stats, indexes, etc and then noticed there was a lingering transaction from 12:51am
Transaction information for database 'CORELIBRARY'.
Oldest active transaction:
SPID (server process ID): 518
UID (user ID) : -1
Name : user_transaction
LSN : (365886:1284741:19)
Start time : Aug 10 2018 12:51:39:473AM
SID : 0x010500000000000515000000be043e32bb0ada1107e53b2b5f130000
Replicated Transaction Information:
Oldest distributed LSN : (365887:3062532:5)
Oldest non-distributed LSN : (365886:1284741:19)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Any attempts to update the table or drop a non-clustered index causes blocking from a LCK_M_SCH_M wait type, and running any integrity check (CHECKDB or CHECK TABLE) results in the 7929 error.
I've dug around and found the transaction in the transaction log and believe it to be related to a non-committed replication command that never got comitted to the subscriber, but am unable to resolve the issue.
At this point, I don't know what else to do...
My last report plan for this evening is:
ALTER DATABASE CORELIBRARY SET EMERGENCY
ALTER DATABASE CORELIBRARY SET SINGLE_USER
DBCC CHECKTABLE ('achstlmtIntermediate') WITH NO_INFOMSGS, ALL_ERRORMSGS
--(Hope there's nothing out of the ordinary here)
DBCC CHECKDB ('corelibrary') WITH NO_INFOMSGS, ALL_ERRORMSGS
--(Pray its just a bad non-clustered index so I can rebuild it and be done with it)
ALTER DATABASE CORELIBRARY SET ONLINE
ALTER DATABASE CORELIBRARY SET MULTI_USER
If all goes well, enjoy Friday night...
Has anyone ran into this before? And other than a CHECKDB with allowing data loss, what steps did you take to resolve it?
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
August 11, 2018 at 1:00 am
How did you get on with this?
😎
Last time I ran into the 7929, I ended up restoring from the latest backups (5 min transaction log backup interval so not much of a risk).
August 11, 2018 at 9:09 am
We never did the maintenance (which is good because I'd like more time to research).
I'm going to try and restore a copy of the database from after the service restart in efforts to test against it first.
When I try to run an update against the table in question, it hangs and I immediately get a wait_type = LCK_M_U and a wait_resource = PAGE: 11:4:20851363. When I look at what's on the page:DBCC TRACEON(3604)
I get:
GO
DBCC PAGE(11,4,20851363)
GO
DBCC TRACEOFF(3604)
GO
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
PAGE: (4:20851363)
BUFFER:
BUF @0x000000D9A4B6A680
bpage = 0x000000D8E480C000 bhash = 0x0000000000000000 bpageno = (4:20851363)
bdbid = 11 breferences = 0 bcputicks = 0
bsampleCount = 0 bUse1 = 31922 bstat = 0x9
blog = 0x7a7a7a7a bnext = 0x0000000000000000
PAGE HEADER:
Page @0x000000D8E480C000
m_pageId = (4:20851363) m_headerVersion = 1 m_type = 2
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 2635 m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594210615296
Metadata: PartitionId = 72057594194165760 Metadata: IndexId = 53
Metadata: ObjectId = 933578364 m_prevPage = (4:20806697) m_nextPage = (4:7397261)
pminlen = 461 m_slotCnt = 13 m_freeCnt = 1309
m_freeData = 7905 m_reservedCnt = 0 m_lsn = (365887:2468408:9)
m_xactReserved = 0 m_xdesId = (1:-1848951422) m_ghostRecCnt = 0
m_tornBits = -1926036539 DB Frag ID = 1
Allocation Status
GAM (4:20449280) = ALLOCATED SGAM (4:20449281) = NOT ALLOCATED
PFS (4:20850864) = 0x40 ALLOCATED 0_PCT_FULL DIFF (4:20449286) = CHANGED
ML (4:20449287) = NOT MIN_LOGGED
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Kind of stuck from here...
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
August 11, 2018 at 9:43 am
MyDoggieJessie - Saturday, August 11, 2018 9:09 AMWe never did the maintenance (which is good because I'd like more time to research).
I'm going to try and restore a copy of the database from after the service restart in efforts to test against it first.When I try to run an update against the table in question, it hangs and I immediately get a wait_type = LCK_M_U and a wait_resource = PAGE: 11:4:20851363. When I look at what's on the page:
DBCC TRACEON(3604)
I get:
GO
DBCC PAGE(11,4,20851363)
GO
DBCC TRACEOFF(3604)
GO
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
PAGE: (4:20851363)
BUFFER:
BUF @0x000000D9A4B6A680bpage = 0x000000D8E480C000 bhash = 0x0000000000000000 bpageno = (4:20851363)
bdbid = 11 breferences = 0 bcputicks = 0
bsampleCount = 0 bUse1 = 31922 bstat = 0x9
blog = 0x7a7a7a7a bnext = 0x0000000000000000PAGE HEADER:
Page @0x000000D8E480C000m_pageId = (4:20851363) m_headerVersion = 1 m_type = 2
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 2635 m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594210615296
Metadata: PartitionId = 72057594194165760 Metadata: IndexId = 53
Metadata: ObjectId = 933578364 m_prevPage = (4:20806697) m_nextPage = (4:7397261)
pminlen = 461 m_slotCnt = 13 m_freeCnt = 1309
m_freeData = 7905 m_reservedCnt = 0 m_lsn = (365887:2468408:9)
m_xactReserved = 0 m_xdesId = (1:-1848951422) m_ghostRecCnt = 0
m_tornBits = -1926036539 DB Frag ID = 1Allocation Status
GAM (4:20449280) = ALLOCATED SGAM (4:20449281) = NOT ALLOCATED
PFS (4:20850864) = 0x40 ALLOCATED 0_PCT_FULL DIFF (4:20449286) = CHANGED
ML (4:20449287) = NOT MIN_LOGGEDDBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.Kind of stuck from here...
What is the object with the id of 933578364 ?
😎
August 11, 2018 at 12:45 pm
Oh, that's the actual table name.
I did try restoring a copy of the database at a point it time that was about 15 minutes after the database was brought back online (which completed successfully) - in checking that copy, the problem didn't exist. I'm not sure if that's because the uncommitted transaction rolled forward or back, but couldn't find anything specific in the SQL Server Error log other than it had restored successfully.
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
August 13, 2018 at 8:25 pm
Resolving this issue for me was simply a matter or placing the DB in EMERGENCY mode and bringing it back online:ALTER DATABASE PROBLEM_CHILD SET EMERGENCY
GO
Once it was in that state I ran a DBCC command against the table I found the issue with (which completed successfully), then brought it back online without issue (and noticed 2 transactions that were finally rolled back!
The error log showed:
Setting database option EMERGENCY to ON for database 'PROBLEM_CHILD'.
DBCC CHECKTABLE ('AVeryBadTable') WITH all_errormsgs, no_infomsgs executed by MyDoggieJessie found 0 errors and repaired 0 errors. Elapsed time: 0 hours 2 minutes 14 seconds.
Recovery of database 'PROBLEM_CHILD' (11) is 0% complete (approximately 45201 seconds remain).
Recovery of database 'PROBLEM_CHILD' (11) is 20% complete (approximately 156 seconds remain).
Recovery of database 'PROBLEM_CHILD' (11) is 30% complete (approximately 138 seconds remain).
....
Recovery of database 'PROBLEM_CHILD' (11) is 90% complete (approximately 16 seconds remain).
2 transactions rolled back in database 'PROBLEM_CHILD'
Recovery is writing a checkpoint in database 'PROBLEM_CHILD'
Recovery completed for database PROBLEM_CHILD(database ID 11) in 195 second(s) (analysis 5 ms, redo 177742 ms, undo 5986 ms.)
Once that all completed everything returned to normal.
I think i got very lucky....and I'll take that.
______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply