The TL;DR on this is simple:
"BECAUSE IT'S SUPPOSED TO!"
A frequent complaint we receive comes from a client with an in-house monitoring tool like Microsoft SCOM/SCCM. They turn the tool on, and it starts red-alerting because the Windows server hosting SQL Server is at 90%+ RAM utilization. The sysadmin (or local DBA) logs on to the server and finds only 5GB free (~4%), with the sqlservr.exe process using 120GB of the 128GB on the server!
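Before deciding whether that alert matters, it helps to see what SQL Server itself reports. Here is a minimal sketch (SQL Server 2008 and later) comparing what Windows has free against the SQL Server process's working set, using the sys.dm_os_sys_memory and sys.dm_os_process_memory DMVs:

-- What does Windows have free, and what is the SQL Server process
-- itself using? Both DMVs return a single row, so CROSS JOIN is safe.
SELECT osm.total_physical_memory_kb / 1024 AS [Total Server RAM (MB)],
       osm.available_physical_memory_kb / 1024 AS [Available RAM (MB)],
       osm.system_memory_state_desc AS [Memory State],
       opm.physical_memory_in_use_kb / 1024 AS [SQL Server Working Set (MB)]
FROM sys.dm_os_sys_memory AS osm
CROSS JOIN sys.dm_os_process_memory AS opm;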
Like almost all things SQL Server, this is a simplification of a situation that requires some serious #ItDepends.
- What is the baseline - how much RAM is usually free on this server? This can be seen by tracking the Perfmon counter Memory\Available MBytes.
- What is the Buffer Manager\Page Life Expectancy number - not right this second, but over time?
- Is there any actual performance issue on the server?
This brings us back to our three questions above. Rather than worrying about some artificial guideline like 10% free, ask: how much memory is usually free on this server? Track Memory\Available MBytes over time and see where it bottoms out - if the regular operation of your server always leaves 2-4GB free, you are probably OK from a Windows point of view.
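If you don't have a monitoring tool collecting this already, a low-tech option is to snapshot the DMV equivalent of Available MBytes on a schedule. A sketch, assuming a hypothetical dbo.MemoryBaseline table and a SQL Server Agent job running the INSERT every few minutes:

-- Hypothetical baseline table - the name is illustrative
CREATE TABLE dbo.MemoryBaseline (
    CollectionTime datetime2(0)  NOT NULL DEFAULT SYSDATETIME(),
    AvailableMB    bigint        NOT NULL,
    MemoryState    nvarchar(256) NOT NULL
);
GO

-- Schedule this INSERT via SQL Server Agent (e.g. every 5 minutes)
INSERT INTO dbo.MemoryBaseline (AvailableMB, MemoryState)
SELECT available_physical_memory_kb / 1024,
       system_memory_state_desc
FROM sys.dm_os_sys_memory;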
What is the Page Life Expectancy of the SQL Server instance over time? As many others have written, PLE is one of the most misused statistics in SQL Server. A DBA checks Perfmon or runs a DMV query (like this one from the awesome set curated by Glenn Berry (blog/@GlennAlanBerry)):
-- Page Life Expectancy (PLE) value for each NUMA node in current instance (Query 36) (PLE by NUMA Node)
SELECT @@SERVERNAME AS [Server Name], [object_name], instance_name, cntr_value AS [Page Life Expectancy]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE [object_name] LIKE N'%Buffer Node%' -- Handles named instances
AND counter_name = N'Page life expectancy' OPTION (RECOMPILE);
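On non-NUMA hardware this returns a single row (node 000); on a NUMA server you get one row per node, and a single starved node can drag its PLE down while the other nodes look healthy.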
They find that the PLE is some relatively small number and freak out, or worse, they find it is a large number and think "my SQL Server doesn't need this much RAM, I can cut back."
Like Available MBytes, PLE is a number that needs to be measured over time. On the servers I touch, I always recommend setting up a "permanent" Perfmon collector of SQL Server-related counters so that you can look at the numbers over time. For example, a typical PLE chart over a week can look like this:
...or it can look like this:
Even over time, some people look at this second chart and say one of two things:
- My PLE goes down to 14, what do I do?
- My PLE maxes out at 50,000, what do I do?
Is anybody or any process complaining (besides SCCM)? Are any numbers out of line - for example, is a virus scan that normally takes 2 hours suddenly taking 4-5 hours? If nothing else is abnormal, is there a problem?
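Answering those questions requires the trend, not the snapshot - and if a dedicated Perfmon collector isn't available, you can build the trend inside SQL Server itself by snapshotting the PLE counter into a table. A sketch, assuming a hypothetical dbo.PleHistory table and an Agent job on a short interval:

-- Hypothetical history table for trending PLE per NUMA node
CREATE TABLE dbo.PleHistory (
    CollectionTime datetime2(0)  NOT NULL DEFAULT SYSDATETIME(),
    NumaNode       nvarchar(128) NOT NULL,
    PleSeconds     bigint        NOT NULL
);
GO

-- Schedule via SQL Server Agent (e.g. every 5 minutes), then chart
-- the results instead of reacting to any single reading
INSERT INTO dbo.PleHistory (NumaNode, PleSeconds)
SELECT instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE N'%Buffer Node%'
  AND counter_name = N'Page life expectancy';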
--
As I mentioned at the start of this post, quite often your SQL Server process(es) are simply *supposed* to be using the lion's share of the resources on the Windows server - especially RAM. As long as you reserve sufficient headroom for the operating system (usually 4GB-16GB, as described here by Jonathan Kehayias (blog/@SQLPoolBoy), adjusted by monitoring Available MBytes), the percentage of RAM used probably doesn't matter.
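As a concrete illustration (the numbers are an example, not a recommendation): on the 128GB server above, capping SQL Server at 120GB leaves roughly 8GB of headroom for Windows, set via sp_configure:

-- Example only: leave ~8GB for the OS on a 128GB server; adjust the
-- number after watching Memory\Available MBytes on your own workload
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'max server memory (MB)', 122880;  -- 120GB * 1024
RECONFIGURE;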
...and if you have good sysadmins they will listen to you and realize that SQL Servers (like Oracle, Exchange, and other database servers) are simply different from web servers or file servers.
Hope this helps!