March 22, 2013 at 3:32 am
We've noticed that memory usage on the DB cluster suddenly changed. Normally the PLE is fairly stable, sitting at around 6,000 seconds. In the afternoon the PLE suddenly dropped from 6,000 to under 300 and bounced between 0 and 600 (averaging around 300) for the rest of the day. At 6:11pm the cluster failed over, again at 00:12, and the latest failover was at 6:17am this morning.
After the last failover, the PLE and memory usage went back to normal until approx 8:07am, when the PLE suddenly went to 0 and SQL Server memory usage jumped up by approx 4GB.
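For reference, the PLE figure above comes from the Buffer Manager performance counter; a minimal sketch of how it can be read from inside SQL Server, assuming only the standard performance-counter DMV:
-- Current Page Life Expectancy, read from the performance counter DMV
-- (object_name is prefixed with the instance name, hence the LIKE)
SELECT [object_name], counter_name, cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';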
In the SQL Error log before each cluster failover we get these messages:
03/21/2013 18:00:13,spid27s,Unknown,AppDomain 796 (mssqlsystemresource.sys[runtime].1143) unloaded.
03/21/2013 18:00:13,spid1s,Unknown,AppDomain 796 (mssqlsystemresource.sys[runtime].1143) is marked for unload due to memory pressure.
then
03/21/2013 18:11:53,Logon,Unknown,Error: 18451, Severity: 14, State: 1.
03/21/2013 18:11:52,Logon,Unknown,Login failed for user 'swconsumer'. Only administrators may connect at this time. [CLIENT: 10.0.100.38]
03/21/2013 18:11:52,Logon,Unknown,Error: 18451, Severity: 14, State: 1.
03/21/2013 18:11:52,Logon,Unknown,Login failed for user 'swbroker'. Only administrators may connect at this time. [CLIENT: 10.0.200.31]
03/21/2013 18:11:52,Logon,Unknown,Error: 18451, Severity: 14, State: 1.
03/21/2013 18:11:52,Logon,Unknown,Login failed for user 'swapi'. Only administrators may connect at this time. [CLIENT: 10.0.200.27]
03/21/2013 18:11:52,Logon,Unknown,Error: 18451, Severity: 14, State: 1.
03/21/2013 18:11:52,spid327,Unknown,The client was unable to reuse a session with SPID 327, which had been reset for connection pooling. The failure ID is 23. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.
At which point SQL Server shuts down and causes a cluster failover.
We had a software release yesterday, so I assume a change from that is causing these issues. The question is, where do I start looking to find the cause?
Any help would be much appreciated
Andrew
March 22, 2013 at 4:06 am
I would start by looking into sys.dm_exec_query_stats to find the queries that are not behaving normally compared to your baseline.
Queries that use a lot of memory are often reading lots of data from disk, so I would look for rows with a high total_logical_reads / execution_count.
Also, the error you are getting is likely a symptom that MAX_SERVER_MEMORY was not set correctly and both SQL Server and the OS are starving for memory. Set MAX_SERVER_MEMORY to a reasonable limit. If you have already set it, you are probably getting more multi-page allocations than usual. As you probably know, MAX_SERVER_MEMORY does not limit multi-page allocations, and you should be looking for those in sys.dm_os_memory_clerks.
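A minimal sketch of those checks, assuming SQL Server 2008 R2 DMV column names (single_pages_kb / multi_pages_kb were merged into pages_kb in later versions):
-- Current max server memory setting (value_in_use is what the engine is actually running with)
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'max server memory (MB)';

-- Top statements by average logical reads per execution
SELECT TOP (20)
       qs.execution_count,
       qs.total_logical_reads,
       qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
       st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY avg_logical_reads DESC;

-- Multi-page allocations by memory clerk (these sit outside the buffer pool / max server memory cap)
SELECT TOP (10) [type], name, SUM(multi_pages_kb) AS multi_pages_kb
FROM sys.dm_os_memory_clerks
GROUP BY [type], name
ORDER BY SUM(multi_pages_kb) DESC;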
Hope this helps
Gianluca
-- Gianluca Sartori
March 22, 2013 at 4:38 am
Thanks for the reply
The server has 24GB of memory, with SQL Server's max memory set to 20GB and Windows 2008 R2 left with 4GB. SQL Server normally sits at 16GB, but something is using the extra 4GB.
I'm going through the DMVs you suggested to see if I can find the culprits.
Andrew
March 22, 2013 at 5:09 am
OK, running this query:
SELECT *
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_logical_reads DESC;
I'm assuming that this DMV data is since the last failover at 6:17am this morning.
One proc has been executed 999 times with total_logical_reads = 453,124,440.
Having never looked at these figures before it's hard to analyse, but IIRC that indicates this proc is reading an awful lot of data for only ~1,000 executions. I assume that won't help the memory issues.
Looking in sys.dm_os_memory_clerks, the biggest user of multi_pages_kb is MEMORYCLERK_SQLCLR, with multi_pages_kb = 59240.
My only worry is that we don't use CLR code on our servers, yet CLR was mentioned in the error log.
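A quick hedged check for whether user CLR is explicitly enabled and which appdomains are actually loaded (spatial, hierarchyid and geography use the built-in system CLR types even when user CLR is disabled, which would explain the mssqlsystemresource appdomain in the error log):
-- Is user CLR integration enabled? (0 here does not stop the system CLR types from loading)
EXEC sp_configure 'clr enabled';

-- Appdomains currently loaded by the in-process CLR
SELECT appdomain_id, appdomain_name, creation_time, state
FROM sys.dm_clr_appdomains;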
March 22, 2013 at 5:18 am
mealies (3/22/2013)
OK, running this query:
SELECT *
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_logical_reads DESC;
I'm assuming that this DMV data is since the last failover at 6:17am this morning.
One proc has been executed 999 times with total_logical_reads = 453,124,440.
Having never looked at these figures before it's hard to analyse, but IIRC that indicates this proc is reading an awful lot of data for only ~1,000 executions. I assume that won't help the memory issues.
Take a look at the execution plan of this procedure. If it is doing large table/index scans, it may be missing some required indexes.
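A hedged sketch for pulling that plan straight out of the cache, assuming the heavy proc is still among the top readers:
-- Cached plans and text for the heaviest readers; open query_plan in SSMS to view the graphical plan
SELECT TOP (10)
       qs.execution_count,
       qs.total_logical_reads,
       qp.query_plan,
       st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
ORDER BY qs.total_logical_reads DESC;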
March 22, 2013 at 5:59 am
The execution plan wasn't pretty, but no missing indexes were reported, although the SSMS toolkit plugin warns that the query plan takes 5MB of space.
Looked into the SQL stack dumps and they show that the CLR is causing the memory pressure, and we think we found the proc responsible. It's using spatial indexes, and I believe spatial indexes use the CLR.
Going to find out what calls this proc and see if we can stop it for now, to check whether that solves the issue.
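One rough way to find the callers is to search the plan cache text for the proc name (the name below is a placeholder, not the actual proc from this system):
-- Anything in the plan cache whose text references the suspect proc
-- 'usp_GetNearbyLocations' is a hypothetical name used for illustration
SELECT qs.execution_count, qs.last_execution_time, st.text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.text LIKE '%usp_GetNearbyLocations%';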
March 22, 2013 at 6:33 am
mealies (3/22/2013)
The execution plan wasn't pretty, but no missing indexes were reported, although the SSMS toolkit plugin warns that the query plan takes 5MB of space. Looked into the SQL stack dumps and they show that the CLR is causing the memory pressure, and we think we found the proc responsible. It's using spatial indexes, and I believe spatial indexes use the CLR.
Going to find out what calls this proc and see if we can stop it for now, to check whether that solves the issue.
Spatial is indeed a CLR consumer under the covers - you don't need CLR enabled for it to function. Spatial is something few people have much experience with and there are a LOT of ways to zig when you should zag. You may want to get a professional to remote in and take a look at what is going on. I hate seeing production systems failing while people hunt around on forums for assistance!! :w00t:
I didn't see it mentioned, but is this a 32-bit system by any chance? Those are notoriously BAD with memory issues due to the MemToLeave situation and the 2GB lower memory limitation. There are some things you can do to help out there, though.
Also, run DBCC MEMORYSTATUS and see if anything else there pops up as a problem.
It IS possible that the large-IO-consuming query is responsible, but if it didn't show very large scans and hashes, that may not be it. If it has large numbers of nested-loop iterations, that would explain high IO with no missing indexes. That situation is likely the result of bad estimates (which have MANY causes).
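For what it's worth, a quick sketch for checking both of those points (bitness and the memory breakdown), assuming nothing beyond the standard SERVERPROPERTY values:
-- Edition string includes "(64-bit)" on 64-bit builds; ProductVersion gives the build number
SELECT SERVERPROPERTY('Edition') AS edition,
       SERVERPROPERTY('ProductVersion') AS product_version;

-- Full memory breakdown; look for unusually large clerks or VM reservations
DBCC MEMORYSTATUS;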
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
March 22, 2013 at 7:22 am
Thanks for all the replies, and yes, hunting around on forums whilst the production system crashes & burns is never fun to watch.
We got it in the end: it was the Tivoli backup process which, although it was turned off, was somehow still being called. It kept trying to fire and taking memory each time, until it crashed SQL Server.
We have fully removed it from the system and the memory counters are looking normal again.
Am off for a beer
Andrew
March 22, 2013 at 8:09 am
Ha! I tend to look at things inside the RDBMS first, while the problem could actually lie somewhere else. My bad.
Glad you sorted it out.
Cheers
Gianluca
-- Gianluca Sartori
March 22, 2013 at 8:25 am
Third-party apps have been the cause of TREMENDOUS grief for clients over the years!! Backup, "management" apps, antivirus (LOTS of issues!!), etc...
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
March 22, 2013 at 9:01 am
@arvisam on #sqlhelp has asked me to provide the memorystatus output. Here is the truncated output:
Memory Manager KB
---------------------------------------- -----------
VM Reserved 34954104
VM Committed 21951148
Locked Pages Allocated 0
Reserved Memory 1024
Reserved Memory In Use 0
(5 row(s) affected)
Memory node Id = 0 KB
---------------------------------------- -----------
VM Reserved 11200
VM Committed 10776
Locked Pages Allocated 0
MultiPage Allocator 9080
SinglePage Allocator 1312344
(5 row(s) affected)
Memory node Id = 1 KB
---------------------------------------- -----------
VM Reserved 34939064
VM Committed 21936680
Locked Pages Allocated 0
MultiPage Allocator 156432
SinglePage Allocator 1312344
(5 row(s) affected)
Memory node Id = 64 KB
---------------------------------------- -----------
VM Reserved 2752
VM Committed 2656
Locked Pages Allocated 0
MultiPage Allocator 2576
SinglePage Allocator 1312344
(5 row(s) affected)
MEMORYCLERK_SQLGENERAL (node 0) KB
---------------------------------------- -----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 4040
MultiPage Allocator 11664
(7 row(s) affected)
MEMORYCLERK_SQLGENERAL (node 1) KB
---------------------------------------- -----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 104
MultiPage Allocator 0
(7 row(s) affected)
MEMORYCLERK_SQLGENERAL (Total) KB
---------------------------------------- -----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 4144
MultiPage Allocator 11664
(7 row(s) affected)
MEMORYCLERK_SQLBUFFERPOOL (node 0) KB
---------------------------------------- -----------
VM Reserved 25214976
VM Committed 21289936
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 0
MultiPage Allocator 592
(7 row(s) affected)
MEMORYCLERK_SQLQUERYEXEC (node 0) KB
---------------------------------------- -----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 496
MultiPage Allocator 0
(7 row(s) affected)
MEMORYCLERK_SQLOPTIMIZER (node 0) KB
---------------------------------------- -----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 408
MultiPage Allocator 944
(7 row(s) affected)
MEMORYCLERK_SQLUTILITIES (node 0) KB
---------------------------------------- -----------
VM Reserved 120
VM Committed 120
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 176
MultiPage Allocator 0
(7 row(s) affected)
MEMORYCLERK_SQLSTORENG (node 0) KB
---------------------------------------- -----------
VM Reserved 2304
VM Committed 2304
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 8904
MultiPage Allocator 984
(7 row(s) affected)
MEMORYCLERK_SQLSTORENG (node 1) KB
---------------------------------------- -----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 1488
MultiPage Allocator 3800
(7 row(s) affected)
MEMORYCLERK_SQLSTORENG (Total) KB
---------------------------------------- -----------
VM Reserved 2304
VM Committed 2304
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 10392
MultiPage Allocator 4784
(7 row(s) affected)
MEMORYCLERK_SQLCONNECTIONPOOL (node 0) KB
---------------------------------------- -----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 2152
MultiPage Allocator 0
(7 row(s) affected)
MEMORYCLERK_SQLCONNECTIONPOOL (node 1) KB
---------------------------------------- -----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 2104
MultiPage Allocator 0
(7 row(s) affected)
MEMORYCLERK_SQLCONNECTIONPOOL (Total) KB
---------------------------------------- -----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 4256
MultiPage Allocator 0
(7 row(s) affected)
MEMORYCLERK_SQLCLR (node 0) KB
---------------------------------------- -----------
VM Reserved 9467900
VM Committed 415868
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 1672
MultiPage Allocator 58776
(7 row(s) affected)
MEMORYCLERK_SQLSERVICEBROKER (node 0) KB
---------------------------------------- -----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 152
MultiPage Allocator 544
(7 row(s) affected)
MEMORYCLERK_SQLHTTP (node 0) KB
---------------------------------------- -----------
VM Reserved 0
VM Committed 0
Locked Pages Allocated 0
SM Reserved 0
SM Committed 0
SinglePage Allocator 8
MultiPage Allocator 0
(7 row(s) affected)