PLE and BufferPool Contents

Question

PLE and BufferPool Contents

SteveOC

Ten Centuries

Points: 1257
More actions
February 26, 2017 at 9:40 am

#335676

Hello.
I have been looking at Memory utilisation for some time now, and am looking for a definitive way to determine what is using the Bufferpool.
I have access to clusters running SQL 2014 Standard Edition and often see PLE drop from 50,000+ to near zero.
This used to (many months ago) only occur overnight during DBCC CHECKDB and Index Rebuild, which I get, but now occurs multiple times during a day.
Both Servers have 136GB (available to the OS), and they have Bufferpool Max of 116GB or more (one is 128GB) and both Solarwinds DPA and DMV queries confirm these sizes
The bit I don't understand is exactly what is in the Bufferpool when the PLE drops, or indeed at any point in time.
The only way I have found to see what is in the Bufferpool (and relate it to DB Objects) is some code that is available from various sites on the Internet - I don't know if I am allowed to link here but the code must originate from a single source (one site attributes to Jonathan Kehaysias I think).
It doesn't matter when I run this code, the total space occupied by the sum of all the object sizes is way below my bufferpoolsize???
I even tried running the code multiple times in a day and saving the results to a spreadsheet and then comparing the results from before and after PLE drops and it hasn't changed???
When PLE drops, is anything actually removed from the Bufferpol, or are pages just marked as eligible for removal - because what I am seeing querying before and after a PLE drop surely suggests the latter?
However, if that is the case, and pages are marked as eligible for removal from the bufferpool, shouldn't the contents of the bufferpool always be at or close to the Max setting (I.e. 128GB),
which is not what I am seeing.
I am guessing that the issue lies with my understanding of the processing, so really looking for some pointers here as to what I am looking at.
TIA
Steve O.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

SteveOC - Sunday, February 26, 2017 9:40 AM
Hello.
I have been looking at Memory utilisation for some time now, and am looking for a definitive way to determine what is using the Bufferpool.
I have access to clusters running SQL 2014 Standard Edition and often see PLE drop from 50,000+ to near zero.
This used to (many months ago) only occur overnight during DBCC CHECKDB and Index Rebuild, which I get, but now occurs multiple times during a day.
Both Servers have 136GB (available to the OS), and they have Bufferpool Max of 116GB or more (one is 128GB) and both Solarwinds DPA and DMV queries confirm these sizes
The bit I don't understand is exactly what is in the Bufferpool when the PLE drops, or indeed at any point in time.
The only way I have found to see what is in the Bufferpool (and relate it to DB Objects) is some code that is available from various sites on the Internet - I don't know if I am allowed to link here but the code must originate from a single source (one site attributes to Jonathan Kehaysias I think).
It doesn't matter when I run this code, the total space occupied by the sum of all the object sizes is way below my bufferpoolsize???
I even tried running the code multiple times in a day and saving the results to a spreadsheet and then comparing the results from before and after PLE drops and it hasn't changed???
When PLE drops, is anything actually removed from the Bufferpol, or are pages just marked as eligible for removal - because what I am seeing querying before and after a PLE drop surely suggests the latter?
However, if that is the case, and pages are marked as eligible for removal from the bufferpool, shouldn't the contents of the bufferpool always be at or close to the Max setting (I.e. 128GB),
which is not what I am seeing.
I am guessing that the issue lies with my understanding of the processing, so really looking for some pointers here as to what I am looking at.
TIA
Steve O.

1) No idea what code you are running, so cannot advise on what you are seeing. Give dbcc memorystatus a go and see what you can glean from it's outputs (documentation can be found online). Glenn Berry's SQL Server Diagnostic Scripts has some useful stuff too.

2) PLE drop can be associated with MANY things from bugs (often driver/firmware) to hitting some massive archive table with a scan to huge memory grants. It is a pretty good indicator that your buffer pool was flushed and the stuff that was getting hit a lot is now gone and replaced with either nothing (bug/huge memory grant) or unwanted rarely used stuff (archive scenario).

3) Are you patched up on SQL 2014? Is EVERY piece of hardware in your server patched up in both drivers and firmware?

4) There are a variety of things you can do to check for what may be causing buffer flushes such as profiling and shredding query plans looking for large memory grants.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 2

So PLE is just measuring how long a page lasts in memory (it's an average, obviously). So a page of data from a lookup table used in lots of different queries may stay in memory for a very long time indeed. However, along comes a massive data load, and everything in the buffer is flushed to make room, and PLE drops. Or, as Kevin already pointed out, a query plan gets a row estimate that is wildly off, decides it needs to allocate a bunch of memory to satisfy the plan, the cache is flushed and PLE drops, and then the query, when it runs, doesn't use very much memory at all.

In general, PLE is a very rough indicator of the behavior of your system. I've seen PLE measure in months on systems that had problems and PLE measure in seconds on systems that were chugging along happily. Don't get overly focused on PLE as a good or bad thing. It just shows the volatility of your memory. It doesn't show the efficiency of it. That comes from other measures, primarily wait stats.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

SteveOC Ten Centuries Points: 1257 More actions · Answer 3

Thanks Folks - the learning process continues.

"Or, as Kevin already pointed out, a query plan gets a row estimate that is wildly off, decides it needs to allocate a bunch of memory to satisfy the plan, the cache is flushed and PLE drops, and then the query, when it runs, doesn't use very much memory at all".

This is something that I haven't come across previously, and may be the answer to the question as to why PLE drops to nothing and yet the SUM of the sizes of all of the contents of the BP is nowhere near the size of the BP?

I guess that over time I expect the BP to be 'full' as it loads objects, and then when PLE drops to next to nothing it is still full but with other objects - given the above this may not be the case.

Sounds like good statistics and getting a good execution plan with accurate estimates is (as ever) important and I need to pull the execution plans and look for memory grants as you both said.

I am curious - Do you know of any references to the algorithm used to remove objects from the Bufferpool - i.e. does it just remove x number of least used pages to satisfy just the memory grant for instance?

Regards
Steve O.

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 4

Off the top of my head? No. I'd want to go back & reread the relevant sections of Kalen's book to be able to answer your question with even a semblance of intelligence.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

lmarkum SSCertifiable Points: 7989 More actions · Answer 5

SolarWinds DPA shows you what is going on in your server at any given moment. Since you have very large drops in PLE, correlate that to the queries that DPA is surely showing as expensive during that time. Check the wait types for those queries and you will probably see DPA's memory category. That will translate either high physical disk IO and or high logical reads. I'm assuming the ultimate goal is to minimize the sudden plunges in PLE. If so, tune the queries DPA shows you as contributing to the problem.