January 2, 2009 at 3:05 pm
Hi guys,
We have been experiencing a major production issue, Memory usage on the active node is at 2.6 GB even though there is 8GB installed on each node.
SQL and OS version:
SQL: 2000 ent 32 bit SP4
OS: Windows 2003 enterprise 32bit SP1
Other settings:
1. Awe enabled
2. Max SQL server mem at 6.4 GB
3. Lock Pages granted to SQL service group
whenever this problem occur to fix it i had to failover to node 2........which will show mem usage up to 7.4GB. However after five days it went back down to 2.6 GB bringing SQL to a halt (we then failed back to node 1 and everything seem fine at least for now).
Has anyone experienced this before?
Any solutions to this problem is greatly appreciated
January 2, 2009 at 4:41 pm
possibly this bug?
http://support.microsoft.com/kb/899761
If I remember rightly the original release of SP4 contained this flaw, so it was re-released, so where did you get your SP4 executable from? Presumably you only applied it recently, so is it a copy of SP4 you have had for a while and are you sure its the latest version of it?
You say mem usage starts at 7.4GB even though SQL max memory is set to 6.4GB, so presumably other processes are using some of this. Are you running perfmon to track memory usage of processes on the server to see whats happening?
Are the PAE and 3GB switches set in the boot.ini?
---------------------------------------------------------------------
January 2, 2009 at 5:00 pm
Thanks for ur reply..
i recently inherited this environment 5 months however they had this same prob then.
SP version>> SP4 8.00.2040
The PEA swith is set in the boot.ini file on both servers.
The issue is we are seeing total memory usage at 2.6 GB in task manager, even thought max mem setting is at 6.4 GB.
As for tracking mem usage, i have been doing so by looking at DBCC memorystatus results but i don't see anything that would have indicated memory pressure see below.
(Buffer dist) (Buffers)
Stolen 3616
Free 1369
Procedures14952
Inram 0
Dirty 13809
Kept 0
I/O 0
Latched 184
Other 817053
(Buffer Counts) (VALUE)
Commited 850983
Target 850983
Hashed 831046
InternalReservation731
ExternalReservation356
Min Free 656
Visible 199456
(Dynamic Memory Man) (Buffers)
Stolen 18113
OS Reserved 2488
OS Committed 2466
OS In Use 2452
General 1897
QueryPlan 15037
Optimizer 21
Utilities 94
Connection 2396
ALSO SQL Server: (Total server memory) in perfmon is now showing 6.4 GB
January 3, 2009 at 5:24 am
when using AWE the value shown for sql server memory in task manager is incorrect, use the value from perfmon.
I would also set the /3GB switch in the boot.ini to take maximum advantage of the memory below the 4GB line.
what do you mean when you say sql server comes to a halt, are you sure its memory pressure? Any errors in the log?
track perfmon counters such as buffer cache hit ratio, page faults a second and available memory to show any high level signs of memory problems.
sql server target memory and sql server total memory should also be the same value.
---------------------------------------------------------------------
January 3, 2009 at 6:42 am
The problem occured again this Morning.
I have been using the values from perfmon.........this time it is showing Total Memory at 1.5GB and Target mem at 1.5 GB.
as shown in the results below from DBCC memory Stats, Total SQL mem and Target mem is different from the results shown above.
(Current)
Commited206416 or 1.6GB
Target 206416 or 1.6GB
Hashed 201558
(Previuos)
Buffer Counts) (VALUE)
Commited 850983 or 6.8 GB
Target 850983 or 6.8GB
Buffer cache hit ratio currently: 99.86
Page faults/sec: 1879
Available physical mem: 5.4 GB
As far as the error logs are concerned, i'm seeing a lot of excessive locks\deadlook in the early morning during 3-5 am(occurs everyday).
Also from my understanding the \PEA setting should allow SQL to address memory above 4GB and we have 8 GB installed.
January 3, 2009 at 12:53 pm
How much physical memory do you have on each node of the cluster? I'm assuming they are both configured with the same amount of memory.
Instead of failing the services over, try to simply stop and start them from Cluster administrator the next time you have the issue.
I recently had memory added to a SQL 2000 cluster, and stopping/starting the service after changing the max value was necessary. However, it almost seems as though other processes on the server are getting the memory before SQL Server needs it.
In SQL 2000 with the max memory option set, SQL will only grab that memory when it needs it.
I would try to set the min memory to 2GB and max to 6.4, stop and start SQL SErver, then check the perf mon values when it comes back up.
Do you have other services running on the systems? Any background services?
January 3, 2009 at 12:57 pm
Hmmm, page\sec way too high. This value can be thrown out though by processes such as backing up large files to tape, as these really thrash the memory, so check your average is not being thrown out by periods of intense activity,
This value can also point to a disk bottleneck, so check out those counters too, see
http://www.microsoft.com/technet/prodtechnol/windows2000serv/reskit/core/fnec_evl_qnjo.mspx?mfr=true
Not sure why memory allocated to SQL is decreasing, but some other process must require it. Track process:private bytes and/or process:working set to see if anything is increasing. A slow constant increase could suggest a memory leak. I believe a sql fix between SP4 and rollup 2187 addressed a memory leak issue.
AWE allocation is dynamic in windows 2003 so perhaps attempt to fix the memory allocated to SQL by setting min and max the same. Make sure enough left for OS though, I presume no other apps running on your SQL cluster.
If anyone else out there got suggestions, feel free to chime in.:)
---------------------------------------------------------------------
January 3, 2009 at 1:10 pm
george sibbald (1/3/2009)
Hmmm, page\sec way too high. This value can be thrown out though by processes such as backing up large files to tape, as these really thrash the memory, so check your average is not being thrown out by periods of intense activity,This value can also point to a disk bottleneck, so check out those counters too, see
http://www.microsoft.com/technet/prodtechnol/windows2000serv/reskit/core/fnec_evl_qnjo.mspx?mfr=true
Not sure why memory allocated to SQL is decreasing, but some other process must require it.
I think the memory SQL Server is using is right where it needs to be.
The cache hit ratio is great so SQL is getting what it needs from the cache.
the system looks to be disk bound.
I bet there's high log writes/flushes per second and lots of log waits.
also I/O waits.
Anyway, the trick is to look at waits instead of measures (bad pun, but true).
Craig Outcalt
January 3, 2009 at 1:17 pm
SQLBOT (1/3/2009)
george sibbald (1/3/2009)
Hmmm, page\sec way too high. This value can be thrown out though by processes such as backing up large files to tape, as these really thrash the memory, so check your average is not being thrown out by periods of intense activity,This value can also point to a disk bottleneck, so check out those counters too, see
http://www.microsoft.com/technet/prodtechnol/windows2000serv/reskit/core/fnec_evl_qnjo.mspx?mfr=true
Not sure why memory allocated to SQL is decreasing, but some other process must require it.
I think the memory SQL Server is using is right where it needs to be.
The cache hit ratio is great so SQL is getting what it needs from the cache.
the system looks to be disk bound.
I bet there's high log writes/flushes per second and lots of log waits.
also I/O waits.
Anyway, the trick is to look at waits instead of measures (bad pun, but true).
yea, I guess we need denby to confirm what he means by SQL 'comes to a halt'. Does it just get really slow or is it not possible to log in at all(via anything including EM, QA)
---------------------------------------------------------------------
January 3, 2009 at 2:21 pm
thanks again for the response.
(SQL coming to a Halt)>> Users complain of slow performances when this issue occurs and jobs timing out.
i also noticed this error in the event log:
Error 1229: Process ID %d:%d owns resources that are blocking processes on scheduler %d.
the above error poped up in event log for every 5 seconds for 3 hrs.
i tried something different this time, as SK suggested, i had went ahead and change min mem setting from default of 0 to 4Gb (i did this thinking SQL will throw an exeption error not enough memory available should this problem occur again) and restarted SQL service. Memory usage was back to 6.5GB(all is good, at least for now).
I also setup a trace log on memory counters to monitor memory usage and hopefully to see at what time does memory usage starts to fall.
Other services: OLAP and Report services
January 3, 2009 at 3:18 pm
sounds like a server under stress problem caused by the app. This may help
http://support.microsoft.com/default.aspx?scid=kb;en-us;319892
next time it happens run sp_who2 to get info on processes running, I would expect there to be blocking going on (if a number of users complain about slowness and time outs occur, blocking is a frequent cause). A quick fix would be to kill the spid at the head of the blocking chain, this should correspond to the spid mentioned in the 1229 error message.
To get more info on this process, use dbcc inputbuffer, sp_locks and profiler and various selects from sysprocesses with order by to see whats going on. As SQLBOT suggested ordering by waittypes could help identify any bottlenecks.
---------------------------------------------------------------------
January 3, 2009 at 3:38 pm
Thanks for the info george i'll start looking into this.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply