Time-out occurred while waiting for buffer latch type 4; tempdb (also posted on ServerFault)

Question

Time-out occurred while waiting for buffer latch type 4; tempdb (also posted on ServerFault)

aurato

SSCrazy

Points: 2575
More actions
July 29, 2011 at 6:36 am

#390086

Three separate processes have failed in the last two days with very similar errors:
Time-out occurred while waiting for buffer latch type 4 for page (1:1189832),
database ID 2.
Time-out occurred while waiting for buffer latch type 4 for page (1:1189827),
database ID 2.
Time-out occurred while waiting for buffer latch type 4 for page (1:1189827),
database ID 2.
Two of them are exactly the same, the other is accessing a slightly different pagenum. Database #2 is tempdb. I tried to run DBCC PAGE(tempdb, 1, 1189827, 1) but the output is just gibberish to me. I'm fairly lost.
Does anyone know what the root cause of these timeouts might be? (I've seen dozens of suggestions on other forums, among which was memory corruption)
Any help would be greatly appreciated. If you would like to read the DBCC PAGE output, ask and I will post it.

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply

aurato SSCrazy Points: 2575 More actions · Answer 1

Link to serverfault question:

http://serverfault.com/questions/295479/mssql-2005-tempdb-time-out-occurred-while-waiting-for-buffer-latch-type-4-pagenu

Reply email from my boss:

TempDB is deleted and automatically created each time the system is restarted.

When it is created, it is created as a copy of Master.

Please adjust master and make sure it is as large as a standard tempdb

Please make sure that at least 10% free space will remain on the drive where tempdb is located after the new size and next reboot.

Please schedule a reboot of <server> for this weekend.

HowardW One Orange Chip Points: 29109 More actions · Answer 2

It's recreated from Model, not Master.

The area I would be looking at is the disk subsystem. Do you run anything like Diskeeper, or run on virtual infrastructure that could block acess to an I/O request for a long period?

If the server hasn't been restarted, it's probably a good first step, followed by disk checking tools, physically moving tempdb somewhere else on disk etc.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 3

aurato (7/29/2011)
Link to serverfault question:
http://serverfault.com/questions/295479/mssql-2005-tempdb-time-out-occurred-while-waiting-for-buffer-latch-type-4-pagenu
Reply email from my boss:
TempDB is deleted and automatically created each time the system is restarted.

Correct so far (didn't read the link)

aurato (7/29/2011)
When it is created, it is created as a copy of Master.

No it is built from model (all databases are). That's where you put logins, objects that you automatically want to have created in all new dbs.

aurato (7/29/2011)
Please adjust ~~master~~ model and make sure it is as large as a standard tempdb

No again, you can set the size of tempdb and it'll use that every time and grow if this is not enough.

Here's a sample script you can run right away and the resize will be immediate.

USE [master]

GO

ALTER DATABASE [tempdb] MODIFY FILE ( NAME = N'tempdev', SIZE = 8388608KB )

GO

aurato (7/29/2011)
Please make sure that at least ~~10%~~ 15% (and that is very little space) free space will remain on the drive where tempdb is located after the new size and next reboot.
Please schedule a reboot of <server> for this weekend.

If by schedule he means get back in the office and run that manually then I agree. But NEVER do a restart without being availble in case something goes wrong.

aurato SSCrazy Points: 2575 More actions · Answer 4

HowardW (7/29/2011)
It's recreated from Model, not Master.
The area I would be looking at is the disk subsystem. Do you run anything like Diskeeper, or run on virtual infrastructure that could block acess to an I/O request for a long period?
If the server hasn't been restarted, it's probably a good first step, followed by disk checking tools, physically moving tempdb somewhere else on disk etc.

I forwarded the question to our sysadmins along with the suggestions. I don't know offhand.

aurato SSCrazy Points: 2575 More actions · Answer 5

aurato (7/29/2011)
Please adjust ~~master~~ model and make sure it is as large as a standard tempdb

No again, you can set the size of tempdb and it'll use that every time and grow if this is not enough.

Here's a sample script you can run right away and the resize will be immediate.

USE [master]

GO

ALTER DATABASE [tempdb] MODIFY FILE ( NAME = N'tempdev', SIZE = 8388608KB )

GO

aurato (7/29/2011)
Please make sure that at least ~~10%~~ 15% (and that is very little space) free space will remain on the drive where tempdb is located after the new size and next reboot.
Please schedule a reboot of <server> for this weekend.

If by schedule he means get back in the office and run that manually then I agree. But NEVER do a restart without being availble in case something goes wrong.

By schedule he means have our sysadmins run it off-hours, watching it the whole time for any issue.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 6

My main point is don't waste the space on model if not absolutely necessary (especially since you seem short on HD space).

And you don't need a restart to grow the file of tempdb.

That being that won't fix the error and I would too assume HD failure of some sort (sorry nothing more usefull).

If you want to go the option to insert a new drive in the server you can move the tempdv files over there (alter location in sql server, then restart the machine). Make sure the server has access to those folders first or the server won't be able to restart and that's a PITA to fix.

aurato SSCrazy Points: 2575 More actions · Answer 7

Ninja's_RGR'us (7/29/2011)
My main point is don't waste the space on model if not absolutely necessary (especially since you seem short on HD space).
And you don't need a restart to grow the file of tempdb.
That being that won't fix the error and I would too assume HD failure of some sort (sorry nothing more usefull).
If you want to go the option to insert a new drive in the server you can move the tempdv files over there (alter location in sql server, then restart the machine). Make sure the server has access to those folders first or the server won't be able to restart and that's a PITA to fix.

Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 8

No it doesn't

either the server is under immense stress or the hardware is failing.

Can't tell which without lost more investigation...

HowardW One Orange Chip Points: 29109 More actions · Answer 9

aurato (7/29/2011)
Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.

Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 10

HowardW (7/29/2011)
aurato (7/29/2011)
Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.
Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.

Got a solid link about this Howard? I'd love to get more info on the topic.

aurato SSCrazy Points: 2575 More actions · Answer 11

HowardW (7/29/2011)
aurato (7/29/2011)
Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.
Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.

Truth is I don't know. I just knew that weird things were happening with the server in management studio (timeouts when opening a new query, taking 2 minutes to run sp_who2) and my boss looked and found heavy page file use.

I've been collecting perfmon statistics for just about the entire month of July (planning to end the gathering today, wanted a full work-month) with the exception of when we had to reboot and haven't had the chance to glance at them yet.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 12

aurato (7/29/2011)
HowardW (7/29/2011)
aurato (7/29/2011)
Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.
Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.
Truth is I don't know. I just knew that weird things were happening with the server in management studio (timeouts when opening a new query, taking 2 minutes to run sp_who2) and my boss looked and found heavy page file use.
I've been collecting perfmon statistics for just about the entire month of July (planning to end the gathering today, wanted a full work-month) with the exception of when we had to reboot and haven't had the chance to glance at them yet.

Trace this as well a few times a minute. You might want to include the excution plan as well. That could tell you what queries are killing tempdb (1 single bad query can kill the server).

SELECT

SPID = s.session_id,

s.[host_name],

s.[program_name],

s.status,

s.memory_usage,

granted_memory = CONVERT(INT, r.granted_query_memory*8.00),

t.text,

sourcedb = DB_NAME(r.database_id),

workdb = DB_NAME(dt.database_id),

mg.*,

su.*

FROM sys.dm_exec_sessions s

INNER JOIN sys.dm_db_session_space_usage su

ON s.session_id = su.session_id

AND su.database_id = DB_ID('tempdb')

INNER JOIN sys.dm_exec_connections c

ON s.session_id = c.most_recent_session_id

LEFT OUTER JOIN sys.dm_exec_requests r

ON r.session_id = s.session_id

LEFT OUTER JOIN (

SELECT

session_id,

database_id

FROM sys.dm_tran_session_transactions t

INNER JOIN sys.dm_tran_database_transactions dt

ON t.transaction_id = dt.transaction_id

WHERE dt.database_id = DB_ID('tempdb')

GROUP BY session_id, database_id

) dt

ON s.session_id = dt.session_id

CROSS APPLY sys.dm_exec_sql_text(COALESCE(r.sql_handle,

c.most_recent_sql_handle)) t

LEFT OUTER JOIN sys.dm_exec_query_memory_grants mg

ON s.session_id = mg.session_id

WHERE (r.database_id = DB_ID('tempdb')

OR dt.database_id = DB_ID('tempdb'))

AND s.status = 'running'

AND s.session_id <> @@SPID

ORDER BY SPID;

HowardW One Orange Chip Points: 29109 More actions · Answer 13

Ninja's_RGR'us (7/29/2011)
Got a solid link about this Howard? I'd love to get more info on the topic.

About performance issues with SQL Server and Paging, SQL Server Memory Management or specifically in relation to this error? Don't have anything for the latter, but it would make sense as a symptom.

SQL Server expects the Buffer Cache to be in memory, so I suspect it has much more aggressive timeouts for this than it would do from disk. If part of the Buffer Cache has been paged, as far as SQL Server's concerned, the page is still in memory so access should be fast, but has suddenly moved into the slowest subsystem at the O/S level.

Still doesn't quite explain that it was the same page number twice in a row, but it is something to be fixed (and is quite an easy fix)

aurato SSCrazy Points: 2575 More actions · Answer 14

HowardW (7/29/2011)
Ninja's_RGR'us (7/29/2011)
Got a solid link about this Howard? I'd love to get more info on the topic.
About performance issues with SQL Server and Paging, SQL Server Memory Management or specifically in relation to this error? Don't have anything for the latter, but it would make sense as a symptom.
SQL Server expects the Buffer Cache to be in memory, so I suspect it has much more aggressive timeouts for this than it would do from disk. If part of the Buffer Cache has been paged, as far as SQL Server's concerned, the page is still in memory so access should be fast, but has suddenly moved into the slowest subsystem at the O/S level.
Still doesn't quite explain that it was the same page number twice in a row, but it is something to be fixed (and is quite an easy fix)

Sorry, what's the fix? I believe you said to lock pages in memory, is that some sort of option that I have to set?