May 26, 2009 at 12:54 pm
I'm getting a SQL Error 845 error with a heavy-weight SSIS (via SQL Agent Job) with lots of data, lots of I/O, lots of Calcs, etc. The error most frequently happens after a server reboot (hardware changes, software updates, etc.). After multiple retries, eventually the job completes....and over time, the problem occurs less and less until not at all (at least until the next reboot). I don't know what the correlation is between the frequency of the error and the reboots, but I started wondering if maybe the size of TempDB has something to do with it. Particularly since TempDB's size is reset after a reboot...and then the observation that my job actually made more progress after the TempDB's size increased (due to a maint plan job that ran after hours) recently.
Then I read this: http://bytes.com/groups/ms-sql/78442-error-845-buffer-latch-timeout
Anyone know if there could be a correlation? I'm hesitant to take the above post as much more than anecdotal evidence since I've seen very little documentation to support the claim/correlation, but I'm running out of ideas.
Microsoft SQL Server 2005 - 9.00.3077.00 (Intel X86)
SAN (that's all I know about our disk/storage structure...that and it's relatively new)
May 26, 2009 at 12:59 pm
What's the full message of the error?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 26, 2009 at 1:33 pm
Not sure if this helps, but....
The job history doesn't show anything useful....just a line similar to:
Executed as user: CABLE\!SrvSqlAgent. ...0.3042.00 for 32-bit Copyright (C) Microsoft Corp 1984-2005. All rights reserved. Started: 9:01:51 AM Error: 2009-05-26 09:13:29.87 Code: 0xC002F210 Source: DailyProcessAreaStats Description: Executing the query "EXECUTE sprocDailyProcessAreaStats...(an exceprt of the SQL in the sproc)...The package execution fa... The step failed."
The SQL Server Log is a little more informative, but not really saying much more than that there seems to be a lot of I/O Stress followed by SQL Error 845:
05/26/2009 09:38:11,spid10s,Unknown,Error: 845 Severity: 17 State: 1.
05/26/2009 09:38:11,spid10s,Unknown,A time-out occurred while waiting for buffer latch -- type 3 bp 03B54530 page 1:788941 stat 0x4c0040d database id: 16 allocation unit Id: 5080394837374009344 task 0x00BD4208 : 0 waittime 300 flags 0x4003a owning task 0x00C6BA68. Not continuing to wait.
05/26/2009 09:37:47,spid117,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 03B54530 page 1:788941 stat 0x4c0040d database id: 16 allocation unit Id: 5080394837374009344 task 0x00C6BA68 : 0 waittime 300 flags 0x4001a owning task 0x00C6BA68. Not continuing to wait.
05/26/2009 09:36:14,spid2s,Unknown,SQL Server has encountered 60 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [J:\MSSQL DB\STAT_AREA1.MDF] in database [STAT_AREA1] (7). The OS file handle is 0x000008A8. The offset of the latest long I/O is: 0x000012de16a000
05/26/2009 09:35:53,spid65,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036D2D5C page 1:870238 stat 0x1c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00D973D8 : 4 waittime 300 flags 0x3a owning task 0x00929978. Not continuing to wait.
05/26/2009 09:35:53,spid65,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036C9808 page 1:870046 stat 0x7c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00928988 : 2 waittime 300 flags 0x3a owning task 0x00DAC3E8. Not continuing to wait.
05/26/2009 09:35:53,spid65,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 03667A50 page 1:870110 stat 0xc0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00D803E8 : 1 waittime 300 flags 0x3a owning task 0x0089B6A8. Not continuing to wait.
05/26/2009 09:35:53,spid65,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036D01B8 page 1:869854 stat 0x5c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00BD5018 : 8 waittime 300 flags 0x3a owning task 0x00D81E28. Not continuing to wait.
05/26/2009 09:35:53,spid65,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036D619C page 1:870174 stat 0x3c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x0089B978 : 7 waittime 300 flags 0x3a owning task 0x00BD4898. Not continuing to wait.
05/26/2009 09:35:53,spid65,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036DA500 page 1:869918 stat 0x6c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00C6A7A8 : 3 waittime 300 flags 0x3a owning task 0x00D97C48. Not continuing to wait.
05/26/2009 09:35:53,spid65,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036BC3C4 page 1:869982 stat 0x4c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x0092E7A8 : 5 waittime 300 flags 0x3a owning task 0x00C6B798. Not continuing to wait.
05/26/2009 09:35:02,spid2s,Unknown,SQL Server has encountered 2 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [J:\MSSQL DB\STAT_AREA2.mdf] in database [STAT_AREA2] (12). The OS file handle is 0x00000964. The offset of the latest long I/O is: 0x0000001a216000
05/26/2009 09:34:41,spid113,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036C9808 page 1:870046 stat 0x7c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x55354D48 : 5 waittime 300 flags 0x3a owning task 0x00DAC3E8. Not continuing to wait.
05/26/2009 09:34:41,spid113,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 03667A50 page 1:870110 stat 0xc0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00BD45C8 : 1 waittime 300 flags 0x3a owning task 0x0089B6A8. Not continuing to wait.
05/26/2009 09:34:41,spid113,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036D01B8 page 1:869854 stat 0x5c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00929888 : 2 waittime 300 flags 0x3a owning task 0x00D81E28. Not continuing to wait.
05/26/2009 09:34:41,spid113,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036BC3C4 page 1:869982 stat 0x4c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00D97B58 : 4 waittime 300 flags 0x3a owning task 0x00C6B798. Not continuing to wait.
05/26/2009 09:34:41,spid113,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036D2D5C page 1:870238 stat 0x1c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x0092F1F8 : 7 waittime 300 flags 0x3a owning task 0x00929978. Not continuing to wait.
05/26/2009 09:34:41,spid113,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036D619C page 1:870174 stat 0x3c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x0089AB68 : 8 waittime 300 flags 0x3a owning task 0x00BD4898. Not continuing to wait.
05/26/2009 09:34:41,spid113,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036DA500 page 1:869918 stat 0x6c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00C6A3E8 : 3 waittime 300 flags 0x3a owning task 0x00D97C48. Not continuing to wait.
05/26/2009 09:34:01,spid71,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036D619C page 1:870174 stat 0x3c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00929798 : 13 waittime 300 flags 0x3a owning task 0x00BD4898. Not continuing to wait.
05/26/2009 09:34:01,spid71,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036DA500 page 1:869918 stat 0x6c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00BD53D8 : 16 waittime 300 flags 0x3a owning task 0x00D97C48. Not continuing to wait.
05/26/2009 09:34:01,spid71,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 03667A50 page 1:870110 stat 0xc0040d database id: 15 allocation unit Id: 72057594512605184 task 0x0092E6B8 : 12 waittime 300 flags 0x3a owning task 0x0089B6A8. Not continuing to wait.
05/26/2009 09:34:01,spid71,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036D01B8 page 1:869854 stat 0x5c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x0089A3E8 : 11 waittime 300 flags 0x3a owning task 0x00D81E28. Not continuing to wait.
05/26/2009 09:34:01,spid71,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036C9808 page 1:870046 stat 0x7c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00C6B018 : 10 waittime 300 flags 0x3a owning task 0x00DAC3E8. Not continuing to wait.
05/26/2009 09:34:01,spid71,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036D2D5C page 1:870238 stat 0x1c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00DAD1F8 : 9 waittime 300 flags 0x3a owning task 0x00929978. Not continuing to wait.
05/26/2009 09:34:01,spid71,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036BC3C4 page 1:869982 stat 0x4c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00D96D48 : 14 waittime 300 flags 0x3a owning task 0x00C6B798. Not continuing to wait.
05/26/2009 09:33:59,spid124,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036D01B8 page 1:869854 stat 0x5c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x00BD5978 : 7 waittime 300 flags 0x3a owning task 0x00D81E28. Not continuing to wait.
05/26/2009 09:33:59,spid124,Unknown,A time-out occurred while waiting for buffer latch -- type 2 bp 036DA500 page 1:869918 stat 0x6c0040d database id: 15 allocation unit Id: 72057594512605184 task 0x009294C8 : 5 waittime 300 flags 0x3a owning task 0x00D97C48. Not continuing to wait.
05/26/2009 09:33:11,spid10s,Unknown,Time-out occurred while waiting for buffer latch type 3 for page (1:870110) database ID 15.
The only other thing I could add is that database ID 15 is one of the heaviest hit databases on the server, followed by database ID 7.
May 26, 2009 at 1:43 pm
Probably IO contention.
What's the layout of databases on LUNs?
What's the RAID level on those LUNs
Are the dedicated or shared LUNs?
What do the perfmon counters Physical Disk: Avg sec/read, Avg sec/write and % idle time look like for the drives with database files?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 26, 2009 at 1:48 pm
I checked the timing of the last failed attempt of the job and found the following in the SQL Server Log:
A time-out occurred while waiting for buffer latch -- type 2, bp 03B59D5C, page 1:4779903, stat 0x1c0040d, database id: 15, allocation unit Id: 72057594569687040, task 0x00928B68 : 9, waittime 300, flags 0x1a, owning task 0x00928B68. Not continuing to wait.
So I'm surprised I don't see an actual entry for SQL Error 845 at the same time as the job failure.
BTW: The job history entry for the error was the same as previously reported.
May 26, 2009 at 1:53 pm
What's the layout of databases on LUNs?
The MDFs are all on "J". The LDFs are all on "P"...for the most part. There is plenty of space on those "drives".
What's the RAID level on those LUNs
Not sure. I'll ask our server team.
Are the dedicated or shared LUNs?
What do the perfmon counters Physical Disk: Avg sec/read, Avg sec/write and % idle time look like for the drives with database files?
ADeveloperTurnedDBASaysWhat? 😀 I'll ask our server guys about this and post a reply.
May 26, 2009 at 2:00 pm
Uncle Moki (5/26/2009)
What's the layout of databases on LUNs?
The MDFs are all on "J". The LDFs are all on "P"...for the most part.
All of them? The heavily used and lightly used together?
Where's TempDB?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 26, 2009 at 2:00 pm
Also post back the layout of tempdb files? # data files, # log files.
Having two separate LUNS from the SAN isn't necessarily having 2 separate sets of spindles. If you consider for example that depending on the SAN, you can add storage in 6-8 drive sets, and you could be adding 300GB drives, depending on your RAID levels and configuration, usually 6, or 5 based on experience, that set of spindles can be roughly 2TB of space. From that multiple LUN's are carved and if you get 2 LUNS for SQL from the same set of disks, you aren't benefiting to much from having separate LUN's. That is why Gail asks specifics on configuration. Are the LUNS on dedicated disks, or are they shared as in the configuration I describe above? One of the unfortunate sides of SAN's is that they are often built for capacity and not speed. Ask your SAN admin the size of the physical disks that are in the array? If they are larger in size, you get less disks which means lower IOPS when compared with a similarly sized array made up of lots of smaller disks.+
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]
May 26, 2009 at 2:26 pm
GilaMonster (5/26/2009)
All of them? The heavily used and lightly used together?Where's TempDB?
With the exception of MASTER and MODEL, all DB data files, including TempDB, are on one drive (J). All log files are on another drive (P). MASTER and MODEL have their MDF and LDF files on one drive as well (C).
Yes, some are very heavily used DB's while others are not. I see DB's for ReportingServer (SSRS, I presume) and VMWare (not sure what a VMWare DB is for) as well so it looks like everything is on the same drive. I'm guessing that's not a good idea? In our server team's defense, this server has evolved into a beast in a very short time.
In all, I see about 20 databases (although more may be hidden from me).
Jonathan Kehayias (5/26/2009)
Also post back the layout of tempdb files? # data files, # log files
From what I can see, tempdb is one data file and one log file (see above).
May 26, 2009 at 2:31 pm
To add to what Jonathan has stated - find out if those LUN's are being presented from shared volumes. In other words, are the volumes (and disks) dedicated to SQL Server or are you sharing with the corporate file shares.
If you are sharing the storage - that is going to cause additional (and unpredictable) issues.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 26, 2009 at 2:48 pm
Uncle Moki (5/26/2009)
I'm guessing that's not a good idea?
That's an understatement. With all the data files on one drive, all of the DBs are fighting over what's probably quite limited IO 'bandwidth'. Same with the logs.
I need to see the perfmon counters to be sure, but I would guess that you need to spread that IO load out over multiple LUNs of the SAN (preferably dedicated LUNs)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 26, 2009 at 2:52 pm
Jeffrey Williams (5/26/2009)
To add to what Jonathan has stated - find out if those LUN's are being presented from shared volumes. In other words, are the volumes (and disks) dedicated to SQL Server or are you sharing with the corporate file shares.If you are sharing the storage - that is going to cause additional (and unpredictable) issues.
Sorry, I forget my sys admin terminology as far as "volumes" from "partitions" from "drives", nor do I have access to the server to see the physical drive configuration. However, I do know that the "J-drive" is dedicated to the SQL Data files and the "P-drive" is for the SQL Log files. Physically, I don't know if J and P are on the same physical drive (I doubt it) or if they span multiple drives or what kind of RAID is set-up or how many other partitions/volumes are on the physical drive.
Judging by the the direction that this thread is taking, should I gather that the size of TempDB is not a factor with my failed jobs and buffer latch errors?
May 26, 2009 at 2:57 pm
Uncle Moki (5/26/2009)
However, I do know that the "J-drive" is dedicated to the SQL Data files and the "P-drive" is for the SQL Log files. Physically, I don't know if J and P are on the same physical drive (I doubt it) or if they span multiple drives or what kind of RAID is set-up or how many other partitions/volumes are on the physical drive.
J and P may be dedicated to SQL, but that's no guarantee that the underlying disk in the SAN are dedicated to those LUNs or if they're shared and possibly even shared with other LUNs used by other servers.
Ask your storage engineer (whoever controls the SAN) what the RAID level is and whether the LUNs are shared or dedicated.
And see if you can get those perfmon traces run. You'll have to ask the server admin if you don't have access to the server.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 26, 2009 at 2:58 pm
GilaMonster (5/26/2009)
That's an understatement. With all the data files on one drive, all of the DBs are fighting over what's probably quite limited IO 'bandwidth'. Same with the logs.I need to see the perfmon counters to be sure, but I would guess that you need to spread that IO load out over multiple LUNs of the SAN (preferably dedicated LUNs)
Hopefully they'll feel comfortable sending me that info. I'll be sure and post it (no, I really will...I promise...if I get the info).
So, I'm guessing that increasing the size of TempDB is not going to help? I was hoping it was as easy as that. Although I guess relocating the MDF and LDF files isn't a big deal either....if they have have the resources.
May 26, 2009 at 3:03 pm
Uncle Moki (5/26/2009)
So, I'm guessing that increasing the size of TempDB is not going to help?
Probably not, but make sure that TempDB's initial size is set to something reasonable, based on how large it can get to during normal operation.
eg, if TempDB regularly reaches 2GB, a starting size of 2MB with a growth of 10% is not a particularly good idea. Rather set the starting size to 2GB.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 15 posts - 1 through 15 (of 32 total)
You must be logged in to reply to this topic. Login to reply