April 11, 2012 at 2:38 am
Hello,
I hope somebody can help me, I am getting the followng error on my SQL Server 2008 R2 64bit machine.
Time out occurred while waiting for buffer latch -- type 2, bp 00000007BEFFE000, page 1:1396234, stat 0x4c00309, database id: 7, allocation unit id: 72057594080591872, task 0x0000000007A31828 : 1, waittime 300, flags 0x1a, owning task 0x0000000004231AC8. Continuing to wait.
This error follows a stack dump, I believe the transaction logs were full at the time this happened. If I look at sp_who2, I notice that there is a lot of Suspended DELETE commands for the database
, there are about 20 under the same SPID number as well as 5 others under another SPID number. Should I kill these processes?
I have tried running a DBCC CheckDB for this database but it fails, saying it could not get exclusive access and failed to take a snapshot.
Not sure what to do here, google is coming up with similar messages, but they do not apply. Any help or advice would be very appreciated.
UPDATE: I notice there is a spid number in the BlkBY column in SP_WHO2 and it appears to be blocking several others, is it safe to kill the blocking spid? The spid also appears 3 times in the SPID column and is a suspended DELETE command.
Regards,
D.
April 11, 2012 at 5:57 am
Ok, but if I do that how will that help the production one currenlty being used? I think I MIGHT know what the issue is, there appears to be a lead block blocking a few other SPIDS, its has 2 transactions open and appears to be blocking itself. I have run a small SQL script and found that there are two spids with the same number both trying the same delete command. I am tempted to kill the one that has the two open transactions.
What do you think?
Regards,
D.
April 12, 2012 at 2:57 am
I could not do the backup because I could not get a lock on the database. So I have taken the decision to kill the process, currently it is in a Killed/Rollback state, so I am going to give it a day or so to rollback and go from there. If nothing has happened after that I shall have to consider restarting the server. My only question mark over this is that Kill 53 with Statusonly shows...
SPID 53: transaction rollback in progress. Estimated rollback completion: 0%. Estimated time remaining: 0 seconds.
But it is still showing up in SP_WHO2, and still appears to be blocking other SPIDS.
Regards,
Paul.
April 12, 2012 at 10:04 am
Hello,
I think the SPID is blocking up the Tempdb database, I'm wondering if perhaps I need to just restart the instance to get round this. If I run DBCC Opentran('tempdb') I get the following result
Transaction information for database 'tempdb'.
Oldest active transaction:
SPID (server process ID): 53
UID (user ID) : -1
Name : user_transaction
LSN : (2739:376:118)
Start time : Apr 8 2012 9:59:08:557AM
SID : 0xdc7cc23d680c324cb43a0882cb1a3a92
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
I notice the TempDB log is now getting quite large. Can I clear this by restarting the instance?
The latch timeout refers to a user database, but this does not have any open transactions on it, this leads me to think the TempDB might be the root of the problem.
Regards,
D.
April 12, 2012 at 11:43 pm
Duran (4/12/2012)
Hello,I think the SPID is blocking up the Tempdb database, I'm wondering if perhaps I need to just restart the instance to get round this. If I run DBCC Opentran('tempdb') I get the following result
Transaction information for database 'tempdb'.
Oldest active transaction:
SPID (server process ID): 53
UID (user ID) : -1
Name : user_transaction
LSN : (2739:376:118)
Start time : Apr 8 2012 9:59:08:557AM
SID : 0xdc7cc23d680c324cb43a0882cb1a3a92
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
I notice the TempDB log is now getting quite large. Can I clear this by restarting the instance?
The latch timeout refers to a user database, but this does not have any open transactions on it, this leads me to think the TempDB might be the root of the problem.
Regards,
D.
I found a similar post on SSC check http://www.sqlservercentral.com/Forums/Topic828267-266-1.aspx#bm828438. I see you have not mentioned the results of running DBCC Checkdb on a restored copy of the database. If you did run the dbcc checkdb on the restored copy of the database please let me know what were the results. You may want to be sure if your database is corrupt before you restart the server. If you have a twitter account you can use #sqlhelp, you may get lucky and maybe Paul Randal, Brent Ozar or any other MVP will help you with this.
April 13, 2012 at 5:25 am
Hello,
Thanks for getting back, I cannot get a restore done that quickly, but its in motion, I'll check out the link now.
Regards,
D.
April 16, 2012 at 4:53 am
Hello,
In case anyone out there has the same issue, I dont want them to get to the end of this thread and find no resolution, so I'll just say that after waiting a few days for a rollback to complete I decided that as it was friday afternoon, I'd chance the reboot because the blocking SPID 53 just did not go away, I also ran the profiler on the database (which I totally recommend before making any decisions), I restarted the engine, not the whole server. The service struggled to stop and brought up the warning that states it did not stop in a timely manner. But after waiting a few more minutes it did indeed stop. After restart, the user database went into a restoring state for about 4 minutes and completed successfully, I took a backup straight away and ran DBCC CHECKDB and it all came back good. According to the logs, something like just over 8000 transactions were rolled back. Things now appear to be back to normal.
I believe the problem was caused by too much I/O on the database, at the time of the stack dump, there was a reindex job going and a transaction log backup, so I shall make sure these do not clash in the future. It was not an I/O error hardware wise, first because the other databases in the instance were fine, and the error was pagelatch_sh and not pageIOlatch_sh, maybe I'm wrong about that but this is how I read it. I THINK the user database was being affected indirectly via tempdb as there were no open transactions on the user database, but two on the tempdb and it all seemed to match up a bit too much to be a coincidence of any kind. I may have this wrong but as there was a reindex going on at the time the issue occurred it made sense to me, if I am wrong in thinking this, please let me know.
I also noticed that the tempdb transaction log was very large (approaching 2gb), when normally its not even 50mb, so I felt that was another clue.
Anyway, there you go. Thank you any responses pre and post resolution.
Regards,
D.
April 16, 2012 at 7:38 am
Duran (4/16/2012)
Hello,In case anyone out there has the same issue, I dont want them to get to the end of this thread and find no resolution, so I'll just say that after waiting a few days for a rollback to complete I decided that as it was friday afternoon, I'd chance the reboot because the blocking SPID 53 just did not go away, I also ran the profiler on the database (which I totally recommend before making any decisions), I restarted the engine, not the whole server. The service struggled to stop and brought up the warning that states it did not stop in a timely manner. But after waiting a few more minutes it did indeed stop. After restart, the user database went into a restoring state for about 4 minutes and completed successfully, I took a backup straight away and ran DBCC CHECKDB and it all came back good. According to the logs, something like just over 8000 transactions were rolled back. Things now appear to be back to normal.
I believe the problem was caused by too much I/O on the database, at the time of the stack dump, there was a reindex job going and a transaction log backup, so I shall make sure these do not clash in the future. It was not an I/O error hardware wise, first because the other databases in the instance were fine, and the error was pagelatch_sh and not pageIOlatch_sh, maybe I'm wrong about that but this is how I read it. I THINK the user database was being affected indirectly via tempdb as there were no open transactions on the user database, but two on the tempdb and it all seemed to match up a bit too much to be a coincidence of any kind. I may have this wrong but as there was a reindex going on at the time the issue occurred it made sense to me, if I am wrong in thinking this, please let me know.
I also noticed that the tempdb transaction log was very large (approaching 2gb), when normally its not even 50mb, so I felt that was another clue.
Anyway, there you go. Thank you any responses pre and post resolution.
Regards,
D.
First of all I want to thank you for posting what you did in order to solve your problem. I still would like to know what went exactly wrong. So if anyone has any ideas/tips, I will be more than happy to hear them.
April 17, 2012 at 8:20 am
If I understand this correctly deletes were going on when you were reindexing? Is that correct? You really need to find a time to reindex where there are not heavy transactions going on especially deletes.
I got into an issue the same sort of thing with SQL2008. My issue was similar but different. From time to time I'd run a db backup and it would go so far and then hault. It would never end. Come to find out our AntiVirus software did not exclude the .mdf and .ldf files. Once they were excluded the problem went away. So, if you have AV running on your SQL Server make sure it is excluding your data files!
September 27, 2013 at 12:25 pm
Check this article
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply