July 29, 2011 at 6:36 am
Three separate processes have failed in the last two days with very similar errors:
Time-out occurred while waiting for buffer latch type 4 for page (1:1189832),
database ID 2.
Time-out occurred while waiting for buffer latch type 4 for page (1:1189827),
database ID 2.
Time-out occurred while waiting for buffer latch type 4 for page (1:1189827),
database ID 2.
Two of them are exactly the same, the other is accessing a slightly different pagenum. Database #2 is tempdb. I tried to run DBCC PAGE(tempdb, 1, 1189827, 1) but the output is just gibberish to me. I'm fairly lost.
Does anyone know what the root cause of these timeouts might be? (I've seen dozens of suggestions on other forums, among which was memory corruption)
Any help would be greatly appreciated. If you would like to read the DBCC PAGE output, ask and I will post it.
July 29, 2011 at 6:40 am
Link to serverfault question:
Reply email from my boss:
TempDB is deleted and automatically created each time the system is restarted.
When it is created, it is created as a copy of Master.
Please adjust master and make sure it is as large as a standard tempdb
Please make sure that at least 10% free space will remain on the drive where tempdb is located after the new size and next reboot.
Please schedule a reboot of <server> for this weekend.
July 29, 2011 at 6:49 am
It's recreated from Model, not Master.
The area I would be looking at is the disk subsystem. Do you run anything like Diskeeper, or run on virtual infrastructure that could block acess to an I/O request for a long period?
If the server hasn't been restarted, it's probably a good first step, followed by disk checking tools, physically moving tempdb somewhere else on disk etc.
July 29, 2011 at 6:50 am
aurato (7/29/2011)
Link to serverfault question:Reply email from my boss:
TempDB is deleted and automatically created each time the system is restarted.
Correct so far (didn't read the link)
aurato (7/29/2011)
When it is created, it is created as a copy of Master.
No it is built from model (all databases are). That's where you put logins, objects that you automatically want to have created in all new dbs.
aurato (7/29/2011)
Please adjustmastermodel and make sure it is as large as a standard tempdb
No again, you can set the size of tempdb and it'll use that every time and grow if this is not enough.
Here's a sample script you can run right away and the resize will be immediate.
USE [master]
GO
ALTER DATABASE [tempdb] MODIFY FILE ( NAME = N'tempdev', SIZE = 8388608KB )
GO
aurato (7/29/2011)
Please make sure that at least10%15% (and that is very little space) free space will remain on the drive where tempdb is located after the new size and next reboot.Please schedule a reboot of <server> for this weekend.
If by schedule he means get back in the office and run that manually then I agree. But NEVER do a restart without being availble in case something goes wrong.
July 29, 2011 at 6:53 am
HowardW (7/29/2011)
It's recreated from Model, not Master.The area I would be looking at is the disk subsystem. Do you run anything like Diskeeper, or run on virtual infrastructure that could block acess to an I/O request for a long period?
If the server hasn't been restarted, it's probably a good first step, followed by disk checking tools, physically moving tempdb somewhere else on disk etc.
I forwarded the question to our sysadmins along with the suggestions. I don't know offhand.
July 29, 2011 at 6:56 am
aurato (7/29/2011)
Please adjustmastermodel and make sure it is as large as a standard tempdb
No again, you can set the size of tempdb and it'll use that every time and grow if this is not enough.
Here's a sample script you can run right away and the resize will be immediate.
USE [master]
GO
ALTER DATABASE [tempdb] MODIFY FILE ( NAME = N'tempdev', SIZE = 8388608KB )
GO
aurato (7/29/2011)
Please make sure that at least10%15% (and that is very little space) free space will remain on the drive where tempdb is located after the new size and next reboot.Please schedule a reboot of <server> for this weekend.
If by schedule he means get back in the office and run that manually then I agree. But NEVER do a restart without being availble in case something goes wrong.
By schedule he means have our sysadmins run it off-hours, watching it the whole time for any issue.
July 29, 2011 at 7:02 am
My main point is don't waste the space on model if not absolutely necessary (especially since you seem short on HD space).
And you don't need a restart to grow the file of tempdb.
That being that won't fix the error and I would too assume HD failure of some sort (sorry nothing more usefull).
If you want to go the option to insert a new drive in the server you can move the tempdv files over there (alter location in sql server, then restart the machine). Make sure the server has access to those folders first or the server won't be able to restart and that's a PITA to fix.
July 29, 2011 at 7:06 am
Ninja's_RGR'us (7/29/2011)
My main point is don't waste the space on model if not absolutely necessary (especially since you seem short on HD space).And you don't need a restart to grow the file of tempdb.
That being that won't fix the error and I would too assume HD failure of some sort (sorry nothing more usefull).
If you want to go the option to insert a new drive in the server you can move the tempdv files over there (alter location in sql server, then restart the machine). Make sure the server has access to those folders first or the server won't be able to restart and that's a PITA to fix.
Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.
July 29, 2011 at 7:09 am
No it doesn't
either the server is under immense stress or the hardware is failing.
Can't tell which without lost more investigation...
July 29, 2011 at 7:09 am
aurato (7/29/2011)
Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.
Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.
July 29, 2011 at 7:13 am
HowardW (7/29/2011)
aurato (7/29/2011)
Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.
Got a solid link about this Howard? I'd love to get more info on the topic.
July 29, 2011 at 7:14 am
HowardW (7/29/2011)
aurato (7/29/2011)
Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.
Truth is I don't know. I just knew that weird things were happening with the server in management studio (timeouts when opening a new query, taking 2 minutes to run sp_who2) and my boss looked and found heavy page file use.
I've been collecting perfmon statistics for just about the entire month of July (planning to end the gathering today, wanted a full work-month) with the exception of when we had to reboot and haven't had the chance to glance at them yet.
July 29, 2011 at 7:25 am
aurato (7/29/2011)
HowardW (7/29/2011)
aurato (7/29/2011)
Yeah the reason I posted the email was to see if you all here disagreed with the recommendation. I read it and wasn't convinced it was addressing the problem. The server was actually rebooted 2 days ago (because it was going to the page file like crazy) and then the problems began.Have you fixed the paging problem? (e.g. locked pages in memory) If not, this would be a likely cause. If you're spilling SQL Server memory to page file, you'll have no end of performance problems and I wouldn't be surprised by this type of message.
Truth is I don't know. I just knew that weird things were happening with the server in management studio (timeouts when opening a new query, taking 2 minutes to run sp_who2) and my boss looked and found heavy page file use.
I've been collecting perfmon statistics for just about the entire month of July (planning to end the gathering today, wanted a full work-month) with the exception of when we had to reboot and haven't had the chance to glance at them yet.
Trace this as well a few times a minute. You might want to include the excution plan as well. That could tell you what queries are killing tempdb (1 single bad query can kill the server).
SELECT
SPID = s.session_id,
s.[host_name],
s.[program_name],
s.status,
s.memory_usage,
granted_memory = CONVERT(INT, r.granted_query_memory*8.00),
t.text,
sourcedb = DB_NAME(r.database_id),
workdb = DB_NAME(dt.database_id),
mg.*,
su.*
FROM sys.dm_exec_sessions s
INNER JOIN sys.dm_db_session_space_usage su
ON s.session_id = su.session_id
AND su.database_id = DB_ID('tempdb')
INNER JOIN sys.dm_exec_connections c
ON s.session_id = c.most_recent_session_id
LEFT OUTER JOIN sys.dm_exec_requests r
ON r.session_id = s.session_id
LEFT OUTER JOIN (
SELECT
session_id,
database_id
FROM sys.dm_tran_session_transactions t
INNER JOIN sys.dm_tran_database_transactions dt
ON t.transaction_id = dt.transaction_id
WHERE dt.database_id = DB_ID('tempdb')
GROUP BY session_id, database_id
) dt
ON s.session_id = dt.session_id
CROSS APPLY sys.dm_exec_sql_text(COALESCE(r.sql_handle,
c.most_recent_sql_handle)) t
LEFT OUTER JOIN sys.dm_exec_query_memory_grants mg
ON s.session_id = mg.session_id
WHERE (r.database_id = DB_ID('tempdb')
OR dt.database_id = DB_ID('tempdb'))
AND s.status = 'running'
AND s.session_id <> @@SPID
ORDER BY SPID;
July 29, 2011 at 7:27 am
Ninja's_RGR'us (7/29/2011)
Got a solid link about this Howard? I'd love to get more info on the topic.
About performance issues with SQL Server and Paging, SQL Server Memory Management or specifically in relation to this error? Don't have anything for the latter, but it would make sense as a symptom.
SQL Server expects the Buffer Cache to be in memory, so I suspect it has much more aggressive timeouts for this than it would do from disk. If part of the Buffer Cache has been paged, as far as SQL Server's concerned, the page is still in memory so access should be fast, but has suddenly moved into the slowest subsystem at the O/S level.
Still doesn't quite explain that it was the same page number twice in a row, but it is something to be fixed (and is quite an easy fix)
July 29, 2011 at 7:31 am
HowardW (7/29/2011)
Ninja's_RGR'us (7/29/2011)
Got a solid link about this Howard? I'd love to get more info on the topic.About performance issues with SQL Server and Paging, SQL Server Memory Management or specifically in relation to this error? Don't have anything for the latter, but it would make sense as a symptom.
SQL Server expects the Buffer Cache to be in memory, so I suspect it has much more aggressive timeouts for this than it would do from disk. If part of the Buffer Cache has been paged, as far as SQL Server's concerned, the page is still in memory so access should be fast, but has suddenly moved into the slowest subsystem at the O/S level.
Still doesn't quite explain that it was the same page number twice in a row, but it is something to be fixed (and is quite an easy fix)
Sorry, what's the fix? I believe you said to lock pages in memory, is that some sort of option that I have to set?
Viewing 15 posts - 1 through 15 (of 29 total)
You must be logged in to reply to this topic. Login to reply