October 22, 2020 at 9:20 pm
Well, I'm confused... the query I gave should have shown you the rollback status on the query. Since sysprocesses and sp_who2 are both saying it is rolling back, my next thought is that it MIGHT be related to the linked server.
If I read your comments right, the job calls a stored procedure which calls stored procedures on other servers over a linked server connection. On these other servers, have you killed the SPIDs there that were called by this one? What I expect (I've not killed a SPID that called a stored procedure across a linked server before) is that you will need to kill the SPIDs on the linked servers as well before this one will be able to complete the rollback. I am actually not sure how SQL handles that with a rollback across a linked server, but I imagine it would be quite messy to manage and need a lot of "working" room, hence the tempdb growth. Especially if it has 43 hours of data changes to chase down and undo.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
October 22, 2020 at 9:28 pm
thanks again Brian for your continued assistance,
there is currently nothing processing on the linked server that i can see as being called from this server,
what i think is strange is that the code that the spid appears to have been running referrences xp_LogInfo and teh last batch being " insert #nt select distinct domain+N'\'+name, sid, sidtype from OpenRowset(TABLE NETUSERGETGROUPS, @acctnam" which is what originally made me think it was related to the job owner resetting there password as the activity monitor didnt even seem to make show it making it to step 1
it was a job that was set to run every half hour, usually only for a minute or so, it ran fine on the 19th at 15:00 but then just stuck on step zero on the 15:30 scheduled run until i bounced the server agent service at 19:00 today, just over 3 hours ago
the spid was killed at 11:24 on the 21st
---------------------------------------------------------------------------------------
The more you know, the more you know that you dont know
October 22, 2020 at 9:54 pm
Sorry I couldn't be more help, but I am stumped on this one.
I am confident that restarting the SQL Server service will fix the issue, but I am confused why it isn't rolling back the way I expect it to (the query I sent you should have given you something for that rollback). And I wouldn't recommend restarting the service except as a last resort or if you have some other reason to restart it (such as patching), so I am hoping someone else will jump into this thread with a suggestion before you go the service restart route.
If you have some other maintenance window coming up (Windows/SQL patches) the reboot from those should fix the issue. Although not sure how long a failover will take or how long it will take to come back online.
What about if you run:
KILL 205 WITH STATUSONLY
I'm thinking it won't tell you much, but maybe give you an idea on the rollback status. I'm thinking it iwll say 0% and 0 seconds remianing, but doesn't hurt to try that one out. STATUSONLY won't kill the SPID but will tell you how much longer for the rollback to complete and the rollback percent.
I am wondering though if SQL might have just gotten stuck and will need a service restart to fix that.
Yesterday was a weird day for SQL for us too. An SSRS report that normally gets its data in 200 ms took almost 2 hours to get the data creating a lot of blocking for us.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
October 22, 2020 at 9:58 pm
no worries Brian it has been good to get somebody else looking at with me,
yeah the kill with status only returns "SPID 205: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds" as you suspected,
our next patching window is a couple of weeks away and the way the tempdb log file is growing i will be out of space in less than 1 day,
looks like i might have to restart the service and hope for the best...
will let you know how i get on and thanks again for your suggestions
---------------------------------------------------------------------------------------
The more you know, the more you know that you dont know
October 23, 2020 at 9:07 am
Hi Guys,
Thanks for all of your assistance,
I have just arranged for the SQL Server to be rebooted,
Due to some systems we have i cannot restart the services without other servers being rebooted in a controlled manner,
The work order had literally just been approved when suddenly my TempDB log cleared and the killed SPID disappeared,
Looks like it may have been doing work in the background but for whatever reason was not showing that way on any queries that were run to try and gauge the progress,
I will now let the scheduled patching runs take care of shrinking the log file down and cancel the reboots,
Brian, i would like to thank you personally for your ideas and being a sounding board on this one, as a lone DBA at my shop i was going a bit nuts as the clock approached Midnight and i hadn't got anywhere, having somebody, albeit virtually, by my side was a great help in keeping me (somewhat) sane :),
Thanks again
Shayn
---------------------------------------------------------------------------------------
The more you know, the more you know that you dont know
October 23, 2020 at 12:59 pm
Gotta love it when SQL can't guess how long the rollback will take so it tells you 0% and 0 seconds. I've seen that before and there is that panic moment when your query has been rolling back for hours (or in your case over a day) and you see it is 0% complete and think it MIGHT take over 100 days to complete at this rate when suddenly it just goes to 100% and things start working.
I'd also like to say thanks to Jeff for trying too. Had it been an xp_cmdshell related thing, he had some thoughts on how to correct that.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
October 23, 2020 at 1:03 pm
aye its a cracking feature of SQL haha
yeah of course thanks to Jeff and Cebisa too 🙂
---------------------------------------------------------------------------------------
The more you know, the more you know that you dont know
October 23, 2020 at 2:15 pm
Gotta love it when SQL can't guess how long the rollback will take so it tells you 0% and 0 seconds. I've seen that before and there is that panic moment when your query has been rolling back for hours (or in your case over a day) and you see it is 0% complete and think it MIGHT take over 100 days to complete at this rate when suddenly it just goes to 100% and things start working.
I'd also like to say thanks to Jeff for trying too. Had it been an xp_cmdshell related thing, he had some thoughts on how to correct that.
Actually, SQL Server is correct at 0% and 0 seconds and has done as much as it can do to kill the spid. But it's only looking at what the SPID needs to do and not the operating system. I strongly suspect either the job or the xp_LoginInfo has a "hook" to the operating system (actually, I'm pretty sure that xp_LoginInfo does because it keeps up with Active Directory changes). Killing a SPID doesn't kill the connection to the operating system if there is one. You have to find the PID (with no "S" and is quite different from a SPID) and kill the "tree" for that PID at the operating system level. It's pretty easy to find and kill an xp_CmdShell process (just look for CMD.exe in Task Manager) but it's not so easy for some of the other stuff like xp_LoginInfo.
Hopefully, when the op says he's made arrangements for the SQL Server to be rebooted, he means the whole thing and not just bouncing the SQL Server Service. If you only do the latter, it will certainly break the connection of the SPID to the PID, but it will not stop the PID. If it's still running and using resources, you'll still have the same problem but won't be aware of it.
I actually have code that can find and kill a bad PID for xp_CmdShell that's written in T-SQL with (ironically) a call to xp_CmdShell to find and kill the bad PID. I just don't know what to look for for xp_LoginInfo.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 8 posts - 16 through 22 (of 22 total)
You must be logged in to reply to this topic. Login to reply