Time-out occurred while waiting for buffer latch type 4; tempdb (also posted on ServerFault)

  • Three separate processes have failed in the last two days with very similar errors:

    Time-out occurred while waiting for buffer latch type 4 for page (1:1189832),

    database ID 2.

    Time-out occurred while waiting for buffer latch type 4 for page (1:1189827),

    database ID 2.

    Time-out occurred while waiting for buffer latch type 4 for page (1:1189827),

    database ID 2.

    Two of them are exactly the same, the other is accessing a slightly different pagenum. Database #2 is tempdb. I tried to run DBCC PAGE(tempdb, 1, 1189827, 1) but the output is just gibberish to me. I'm fairly lost.

    Does anyone know what the root cause of these timeouts might be? (I've seen dozens of suggestions on other forums, among which was memory corruption)

    Any help would be greatly appreciated. If you would like to read the DBCC PAGE output, ask and I will post it.

  • Link to serverfault question:

    http://serverfault.com/questions/295479/mssql-2005-tempdb-time-out-occurred-while-waiting-for-buffer-latch-type-4-pagenu

    Reply email from my boss:

    TempDB is deleted and automatically created each time the system is restarted.

    When it is created, it is created as a copy of Master.

    Please adjust master and make sure it is as large as a standard tempdb

    Please make sure that at least 10% free space will remain on the drive where tempdb is located after the new size and next reboot.

    Please schedule a reboot of <server> for this weekend.

  • It's recreated from Model, not Master.

    The area I would be looking at is the disk subsystem. Do you run anything like Diskeeper, or run on virtual infrastructure that could block acess to an I/O request for a long period?

    If the server hasn't been restarted, it's probably a good first step, followed by disk checking tools, physically moving tempdb somewhere else on disk etc.

  • aurato (7/29/2011)


    Link to serverfault question:

    http://serverfault.com/questions/295479/mssql-2005-tempdb-time-out-occurred-while-waiting-for-buffer-latch-type-4-pagenu

    Reply email from my boss:

    TempDB is deleted and automatically created each time the system is restarted.

    Correct so far (didn't read the link)

    aurato (7/29/2011)


    When it is created, it is created as a copy of Master.

    No it is built from model (all databases are). That's where you put logins, objects that you automatically want to have created in all new dbs.

    aurato (7/29/2011)


    Please adjust master model and make sure it is as large as a standard tempdb

    No again, you can set the size of tempdb and it'll use that every time and grow if this is not enough.

    Here's a sample script you can run right away and the resize will be immediate.

    USE [master]

    GO

    ALTER DATABASE [tempdb] MODIFY FILE ( NAME = N'tempdev', SIZE = 8388608KB )

    GO

    aurato (7/29/2011)


    Please make sure that at least 10% 15% (and that is very little space) free space will remain on the drive where tempdb is located after the new size and next reboot.

    Please schedule a reboot of <server> for this weekend.

    If by schedule he means get back in the office and run that manually then I agree. But NEVER do a restart without being availble in case something goes wrong.

  • HowardW (7/29/2011)


    It's recreated from Model, not Master.

    The area I would be looking at is the disk subsystem. Do you run anything like Diskeeper, or run on virtual infrastructure that could block acess to an I/O request for a long period?

    If the server hasn't been restarted, it's probably a good first step, followed by disk checking tools, physically moving tempdb somewhere else on disk etc.

    I forwarded the question to our sysadmins along with the suggestions. I don't know offhand.

  • aurato (7/29/2011)


    Please adjust master model and make sure it is as large as a standard tempdb

    No again, you can set the size of tempdb and it'll use that every time and grow if this is not enough.

    Here's a sample script you can run right away and the resize will be immediate.

    USE [master]

    GO

    ALTER DATABASE [tempdb] MODIFY FILE ( NAME = N'tempdev', SIZE = 8388608KB )

    GO

    aurato (7/29/2011)


    Please make sure that at least 10% 15% (and that is very little space) free space will remain on the drive where tempdb is located after the new size and next reboot.

    Please schedule a reboot of <server> for this weekend.

    If by schedule he means get back in the office and run that manually then I agree. But NEVER do a restart without being availble in case something goes wrong.

    By schedule he means have our sysadmins run it off-hours, watching it the whole time for any issue.

  • My main point is don't waste the space on model if not absolutely necessary (especially since you seem short on HD space).

    And you don't need a restart to grow the file of tempdb.

    That being that won't fix the error and I would too assume HD failure of some sort (sorry nothing more usefull).

    If you want to go the option to insert a new drive in the server you can move the tempdv files over there (alter location in sql server, then restart the machine). Make sure the server has access to those folders first or the server won't be able to restart and that's a PITA to fix.

  • Ninja's_RGR'us (7/29/2011)


    My main point is don't waste the space on model if not absolutely necessary (especially since you seem short on HD space).

    And you don't need a restart to grow the file of tempdb.

    That being that won't fix the error and I would too assume HD failure of some sort (sorry nothing more usefull).

    If you want to go the option to insert a new drive in the server you can move the tempdv files over there (alter location in sql server, then restart the machine). Make sure the server has access to those folders first or the server won't be able to restart and that's a PITA to fix.

    Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.

  • No it doesn't

    either the server is under immense stress or the hardware is failing.

    Can't tell which without lost more investigation...

  • aurato (7/29/2011)


    Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.

    Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.

  • HowardW (7/29/2011)


    aurato (7/29/2011)


    Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.

    Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.

    Got a solid link about this Howard? I'd love to get more info on the topic.

  • HowardW (7/29/2011)


    aurato (7/29/2011)


    Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.

    Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.

    Truth is I don't know. I just knew that weird things were happening with the server in management studio (timeouts when opening a new query, taking 2 minutes to run sp_who2) and my boss looked and found heavy page file use.

    I've been collecting perfmon statistics for just about the entire month of July (planning to end the gathering today, wanted a full work-month) with the exception of when we had to reboot and haven't had the chance to glance at them yet.

  • aurato (7/29/2011)


    HowardW (7/29/2011)


    aurato (7/29/2011)


    Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.

    Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.

    Truth is I don't know. I just knew that weird things were happening with the server in management studio (timeouts when opening a new query, taking 2 minutes to run sp_who2) and my boss looked and found heavy page file use.

    I've been collecting perfmon statistics for just about the entire month of July (planning to end the gathering today, wanted a full work-month) with the exception of when we had to reboot and haven't had the chance to glance at them yet.

    Trace this as well a few times a minute. You might want to include the excution plan as well. That could tell you what queries are killing tempdb (1 single bad query can kill the server).

    SELECT

    SPID = s.session_id,

    s.[host_name],

    s.[program_name],

    s.status,

    s.memory_usage,

    granted_memory = CONVERT(INT, r.granted_query_memory*8.00),

    t.text,

    sourcedb = DB_NAME(r.database_id),

    workdb = DB_NAME(dt.database_id),

    mg.*,

    su.*

    FROM sys.dm_exec_sessions s

    INNER JOIN sys.dm_db_session_space_usage su

    ON s.session_id = su.session_id

    AND su.database_id = DB_ID('tempdb')

    INNER JOIN sys.dm_exec_connections c

    ON s.session_id = c.most_recent_session_id

    LEFT OUTER JOIN sys.dm_exec_requests r

    ON r.session_id = s.session_id

    LEFT OUTER JOIN (

    SELECT

    session_id,

    database_id

    FROM sys.dm_tran_session_transactions t

    INNER JOIN sys.dm_tran_database_transactions dt

    ON t.transaction_id = dt.transaction_id

    WHERE dt.database_id = DB_ID('tempdb')

    GROUP BY session_id, database_id

    ) dt

    ON s.session_id = dt.session_id

    CROSS APPLY sys.dm_exec_sql_text(COALESCE(r.sql_handle,

    c.most_recent_sql_handle)) t

    LEFT OUTER JOIN sys.dm_exec_query_memory_grants mg

    ON s.session_id = mg.session_id

    WHERE (r.database_id = DB_ID('tempdb')

    OR dt.database_id = DB_ID('tempdb'))

    AND s.status = 'running'

    AND s.session_id <> @@SPID

    ORDER BY SPID;

  • Ninja's_RGR'us (7/29/2011)


    Got a solid link about this Howard? I'd love to get more info on the topic.

    About performance issues with SQL Server and Paging, SQL Server Memory Management or specifically in relation to this error? Don't have anything for the latter, but it would make sense as a symptom.

    SQL Server expects the Buffer Cache to be in memory, so I suspect it has much more aggressive timeouts for this than it would do from disk. If part of the Buffer Cache has been paged, as far as SQL Server's concerned, the page is still in memory so access should be fast, but has suddenly moved into the slowest subsystem at the O/S level.

    Still doesn't quite explain that it was the same page number twice in a row, but it is something to be fixed (and is quite an easy fix)

  • HowardW (7/29/2011)


    Ninja's_RGR'us (7/29/2011)


    Got a solid link about this Howard? I'd love to get more info on the topic.

    About performance issues with SQL Server and Paging, SQL Server Memory Management or specifically in relation to this error? Don't have anything for the latter, but it would make sense as a symptom.

    SQL Server expects the Buffer Cache to be in memory, so I suspect it has much more aggressive timeouts for this than it would do from disk. If part of the Buffer Cache has been paged, as far as SQL Server's concerned, the page is still in memory so access should be fast, but has suddenly moved into the slowest subsystem at the O/S level.

    Still doesn't quite explain that it was the same page number twice in a row, but it is something to be fixed (and is quite an easy fix)

    Sorry, what's the fix? I believe you said to lock pages in memory, is that some sort of option that I have to set?

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply