August 21, 2009 at 11:13 am
This is a tricky one. We have a 32 bit 2008 Enterprise server that is having a consistent problem about once every two weeks. Here's the scenario, the server behaves well and performs beautifully. CPU utilization is around 15%, procs run fast, hardly any recompiles. Then out of the blue, for reasons I do not understand, everything goes to hell. The CPU spikes to 100%, processor queue gets backed up and every proc is recompiling every time it runs! The only solution we have is to stop and restart the SQL Service and then we're in the clear for about 2 weeks.
Here are my observations:
During normal operations we average around 1000 procs in cache. When the issue hits...that drops down to around 20:w00t: I'm using DBCC MEMORYSTATUS and sys.dm_exec_cached_plans to track this.
When I compare values for sys.dm_os_memory_clerks the only major variances between my happy days and my sad days are the multi_pages_kb and single_pages_kb for CACHESTORE_OBJCP and CACHESTORE_SQLCP. These types drop by 90% when we encounter the issue. Everything else looks the same.
So my questions are...what the heck is happening to my proc cache? Why is SQL not allocating enough memory to it? I can't find another type in sys.dm_os_memory_clerks that is stealing the memory away from it, so where is it going and what else can I check?
:ermm:
August 21, 2009 at 11:33 am
Has someone run DBCC FREEPROCCACHE? Restored a database or log? Run an sp_configure changing a setting?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 21, 2009 at 11:50 am
No, this problem happens sometimes at 4 am on a Sunday night when I'm sure there are not admins logged into the server to make any changes like that. And sometimes it happens in the middle of the day. Good question though.
August 21, 2009 at 12:34 pm
Any pattern in when it happens? Any entries in the SQL error log or windows event log that coincide with the time it starts?
What OS, how much memory? Do you have max memory set and, if so, to what? Do you have lock pages set? Awe or /3GB set?
I'd try and run a trace (server-side, not profiler) to see exactly what's happening before this problem starts. May involve some long-term tracing though.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 21, 2009 at 12:39 pm
Are you rebuilding indexes/shrinking stuff/updating statistics?
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
August 21, 2009 at 1:06 pm
I've had a similar problem on 2005sp3 box, and while I'm not sure if your causes are the same, I'll post what I found out. One of our applications kept connections open and those connections were consuming more and more memory pages, forcing sql server to give up procedure cache and start recompiling queries. I looked at sys.sysprocesses, and here they were, sitting on thousands of pages:) After killing old processes(connections), the CPU went back to normal. Here's the SQL I used to determine which connections were more than 1 day old and consuming the memory:
SELECT SPID
,LOGIN_TIME
,HOSTNAME
,PROGRAM_NAME
,LOGINAME
,MEMUSAGE
FROM
SYS.SYSPROCESSES
WHERE NET_LIBRARY = 'TCP/IP'
AND LOGIN_TIME < Getdate()
- 1
ORDER BY MEMUSAGE DESC
p.s. i haven't tried the code on 2008.
August 21, 2009 at 2:02 pm
Matt Miller (8/21/2009)
Are you rebuilding indexes/shrinking stuff/updating statistics?
No rebuilding or stats going on.
August 21, 2009 at 2:02 pm
Thanks binko, I'll be sure to run that query in the future to see if we have any stale connections out there.
August 21, 2009 at 4:20 pm
GilaMonster (8/21/2009)
Any pattern in when it happens? Any entries in the SQL error log or windows event log that coincide with the time it starts?What OS, how much memory? Do you have max memory set and, if so, to what? Do you have lock pages set? Awe or /3GB set?
I'd try and run a trace (server-side, not profiler) to see exactly what's happening before this problem starts. May involve some long-term tracing though.
This happens about every two weeks but no specific time of day.. The only entries of note in the logs are a few errors like this "Error: 18056, Severity: 20, State: 29". I believe those are caused by the CPU utilization and the server not being able to grab a new connection.
The OS is Windows 2003 Server Standard. 4 gigs of memory, AWE is not enabled and the 3gb switch is not on. Max memory is set to 3,072 MB. Lock pages is not set.
August 22, 2009 at 1:45 am
Ok, without AWE or /3GB SQL will never be able to reach that 3GB max memory. On 32 bit, without either AWE or /3GB, max memory per process is 2GB.
Consider enabling /3GB.
If there anything in the windows even log that coincides with the time the problem starts? You said there's a sev 20 error in the error log. What's the message? Anything before that, even if it's not high severity? Anything that's not regular backup messages or login failed.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply