April 30, 2007 at 11:55 am
We've been facing application lockup issues with an app that was deployed a couple of years ago. We don't have access to the source code (it's a Over the Shelf package) so we really don't have a way to figure out what's going on. But we recently started monitoring the processes and found the following in the logs that we're writing out:
SYSPROCESSES SQLSERVER 134219767
spid status blocked open_tran waitresource waittype waittime cmd lastwaittype cpu physical_io memusage last_batch login_time net_address net_library dbid ecid kpid hostname hostprocess loginame program_name nt_domain nt_username uid sid sql_handle stmt_start stmt_end
------ ------------------------------ ------- --------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- ----------- ---------------- -------------------------------- ----------- -------------------- ----------- -------------------------- -------------------------- ------------ ------------ ------ ------ ------ -------------------------------------------------------------------------------------------------------------------------------- ----------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- ------ ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------ ----------- -----------
60 sleeping 0 1 0x0000 0 AWAITING COMMAND WRITELOG� 95 27 0 2007-04-30 11:19:55.360 2007-04-30 10:33:54.930 000000000000 TCP/IP 6 0 0 MDAV6010.MDMZA.dmz.hq 0 sysadm 0 0xC109598C984A5442BD4188E67477FCB000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0x0000000000000000000000000000000000000000 0 0
94 sleeping 60 2 KEY: 6:878626173:1 (01008cff27c2)� 0x0003 63922 INSERT LCK_M_S� 11767 871 5846 2007-04-30 11:19:56.830 2007-04-30 11:05:10.103 000000000000 TCP/IP 6 0 9180 MDAV6010.MDMZA.dmz.hq 0 sysadm 5 0xC109598C984A5442BD4188E67477FCB000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0x01000600534D8A2CC000BA560000000000000000 0 -1
ESP 0
SYSPROC FIRST PASS
spid ecid waittype
------ ------ --------
94 0 0x0003
Blocking via locks at 2007-04-30 11:21:00.770
SPIDs at the head of blocking chains
spid
------
60
We're deducing from the above that process 94 was blocked by process 60. However the DBCC buffer for process 60 indicates that the following statment was excuting when the blocking happened:
*********************************************************************
Print out DBCC Input buffer for all blocked or blocking spids.
*********************************************************************
DBCC INPUTBUFFER FOR SPID 60
EventType Parameters EventInfo
--------- ---------- ----------------
RPC Event 0 sp_cursorclose;1
The application typically locks up completely after 10-15 mins and here's the list of processes that were listed the last time it locked up :
SYSPROCESSES SQLSERVER 134219767
spid status blocked open_tran waitresource waittype waittime cmd lastwaittype cpu physical_io memusage last_batch login_time net_address net_library dbid ecid kpid hostname hostprocess loginame program_name nt_domain nt_username uid sid sql_handle stmt_start stmt_end
------ ------------------------------ ------- --------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- ----------- ---------------- -------------------------------- ----------- -------------------- ----------- -------------------------- -------------------------- ------------ ------------ ------ ------ ------ -------------------------------------------------------------------------------------------------------------------------------- ----------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- ------ ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------ ----------- -----------
60 sleeping 0 1 0x0000 0 AWAITING COMMAND WRITELOG� 95 27 0 2007-04-30 11:19:55.360 2007-04-30 10:33:54.930 000000000000 TCP/IP 6 0 0 MDAV6010.MDMZA.dmz.hq 0 sysadm 0 0xC109598C984A5442BD4188E67477FCB000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0x0000000000000000000000000000000000000000 0 0
67 sleeping 94 2 KEY: 6:878626173:1 (01008cff27c2)� 0x0003 1136687 INSERT LCK_M_S� 16592 612 7590 2007-04-30 11:22:03.957 2007-04-30 09:36:37.390 000000000000 TCP/IP 6 0 7876 MDAV6010.MDMZA.dmz.hq 0 sysadm 5 0xC109598C984A5442BD4188E67477FCB000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0x01000600E68BDC07C0C06D570000000000000000 0 -1
94 sleeping 60 2 KEY: 6:878626173:1 (01008cff27c2)� 0x0003 1263812 INSERT LCK_M_S� 11767 871 5846 2007-04-30 11:19:56.830 2007-04-30 11:05:10.103 000000000000 TCP/IP 6 0 9180 MDAV6010.MDMZA.dmz.hq 0 sysadm 5 0xC109598C984A5442BD4188E67477FCB000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0x01000600534D8A2CC000BA560000000000000000 0 -1
105 sleeping 94 0 KEY: 6:878626173:1 (01008cff27c2)� 0x0003 1107484 EXECUTE LCK_M_S� 16 0 4 2007-04-30 11:22:33.177 2007-04-30 11:22:02.050 000000000000 TCP/IP 6 0 9760 MDAV6010.MDMZA.dmz.hq 0 sysadm 5 0xC109598C984A5442BD4188E67477FCB000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0x030001007974E51D010000000000000000000000 0 -1
ESP 0
SYSPROC FIRST PASS
spid ecid waittype
------ ------ --------
94 0 0x0003
67 0 0x0003
105 0 0x0003
Blocking via locks at 2007-04-30 11:41:00.690
SPIDs at the head of blocking chains
spid
------
60
This gives us reason to believe that process #60 never finished and held on to critical resources that the application needed to complete the other transactions. When ever this happens, we have to manually go in and kill the 'root' process (process #60 in the instance above) and shutdown the applications instance and restart it. The problem is that this is a 'mission critical' application and a lot of people hear about it when it needs to be reset manually. It there any way this can be fixed at the DB Level ? Does anyone know of any related issues on the JDBC, DTC and/or SQL server side that could be causing this ?
And b.t.w., the application is a J2EE based system that uses JDBC and XA Transactions. We've got JDBC for SQL Server SP3 with JBoss 3.2.3 as the application server.
Any help will be truely appreciated.
Thanks
Gurvinder
July 23, 2007 at 12:12 am
Hi
In our application we are facing the same issue. By any chance did you find any known issues regarding this?
Thanks,
Sugi
July 23, 2007 at 12:20 am
Unfortunately we have not been able to find a suitable solution. We ended up writing a SQLAgent job that runs every 10 mins and kills any processes that are hung or have open ended transactions. So fare we've not run into any issues. We'll have to wait and see. We have managed to stablize the application quite a bit with this process running. We not hang maybe once in a couple of weeks. Earlier we had to recover 2-3 times a day ! So it's on a back burner for now !
July 24, 2007 at 12:55 pm
Please could you share the script that kills any hung or open ended transactions.
We are getting alerts and outages once in 2 days.
this will be of great help.
Regards,
Sugeetha
July 25, 2007 at 9:39 am
Have you guys tried to debug the issue with the software vendor?
July 25, 2007 at 11:35 am
To add it we are using Microsoft JDBC type 4 drivers and they seem to have isssues when managing transactions when the resource involved is 2 databases present in two different servers.
Does anybody know about known issues with MS JDBC driver?
When i did a search on these driver I saw that they run into issues when using these fire hose cursors.
But what bothers me is that it locks up the whole SQL server. We are not able to access any databases in that server. The SQL server fully closes up.
I am still trying to contact micorosoft support on that.
July 25, 2007 at 11:36 am
Have you managed to come up with a method of approaching what content is hanging? For all you know, it's poorly edited input from a particular end-user. Train the end-user to do it right, and the problem may go away.
Sometimes software is NOT the answer. But it can sure help to improve the quality of the question.
Also, there may have been system changes - the monthly Tuesday updates from our friends in Redmond - that adversely impacted the application. Appropriate question to ask the vendor, if the vendor is still in business. If the vendor is no longer in business, and no one is supporting the product, how much longer will you use the product? This could end up being a rear-guard action that requires no further effort.
July 27, 2007 at 7:57 am
I had a similar problem and eventually found a SP on the web which I extracted into a Job (agent). It's interactive but it does sort out the immediate issue;
/*
block-sniper.sql
Function: This script identifies the blocking locks at the top of the blocking chain and kills them.
Specifically, it reports what the top blocking spids are doing, kills them, waits three seconds and then reports on current blocking status.
Questions: See header above.
Updates: http://www.integer.org/cjm/files/block-sniper.sql
copyright (C) 2003 Cameron Michelis copying and redistribution of this file is permitted provided this notice and the above comments are preserved.
*/
use master
go
select getdate()
go
-- Report on blocking spids to be terminated.
select blocked as "Blocker"
,count(*) as "Victim Count"
from master..sysprocesses
where blocked > 0
group by blocked
go
DECLARE @blocker varchar(75)
DECLARE blocker_cursor CURSOR FOR select distinct blocked from sysprocesses where blocked !=0
OPEN blocker_cursor
FETCH NEXT FROM blocker_cursor INTO @blocker
WHILE (@@fetch_status <> -1)
BEGIN
IF (@@fetch_status = -2)
BEGIN
FETCH NEXT FROM blocker_cursor INTO @blocker
CONTINUE
END
-- Only kill process if it is the leader and more than 5 seconds old.
IF @blocker not in (select spid from sysprocesses where blocked != 0)
AND (select datediff(ss,last_batch,getdate()) FROM sysprocesses where blocked = @blocker) > 5
BEGIN
print 'killing head blocker, spid #' + @blocker + ' who is blocking connections with this command: '
print ''
exec ('dbcc inputbuffer (' + @blocker + ')')
exec ('kill ' + @blocker ) -- kill
FETCH NEXT FROM blocker_cursor INTO @blocker
-- Display Remaining Blocking Spids
waitfor delay '00:00:03' --wait three seconds for the blocks to clear, then display status
print ''
print 'Remaining Blockers...'
print ''
select blocked as "Blocker"
,count(*) as "Victim Count"
from master..sysprocesses
where blocked > 0
group by blocked
END
ELSE
FETCH NEXT FROM blocker_cursor INTO @blocker
END
DEALLOCATE blocker_cursor
print 'Process Complete...'
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply