February 27, 2013 at 5:36 pm
Last weekend we migrated our primary ERP system from a Windows 2003, SQL 2008R2 server to a windows 2008R2, SQL 2012 server.
The new servers are much beefier than the old ones, including an extra 32 gig of ram.
The ERP database is ~500 gig, of which a good amount is audit data. On the old server I generally had page life expectancy values around 1000 or more for normal daily activity.
On the new server I have been capturing perfmon stats for the past week and I see the PLE jumping all over the place. It climbs up into the several hundred but then suiddenly dives to ridiculously low values like 12 or even 7.
The server usage profile has not changed, it's still the same users and applications doing the same things they were doing last week.
The instance is set to have min 70 gig, max 110 gig allocated to SQL.
The service account has lock pages in memory permission.
Can anyone think of some setting I might have missed during migraiton, or some new option to be set in SQL2012 that I've overlooked, that could explain this?
February 27, 2013 at 6:24 pm
Well, if we could see what was running on the server when PLE drops we might be able to help you out, unfortunately we can't see it.
One guess would be to ask if you are running CHECKDB on the database at the time PLE drops. Would also help to know if there are any other maintenance routines that might be running then as well.
February 27, 2013 at 6:36 pm
Yeh, right now I'm running a server side trace and a few more measures to see if I can match up activity to counters. As I said, though, there's no new usage pattern on the server, no new apps deployed, just all the same jobs, users and apps that were there before. A sudden usage pattern change doesn't seem as likely as me possibly missing some new setting or other.
February 28, 2013 at 2:15 am
No server options that could cause this. Probably CheckDB or index rebuilds that move a huge amount of data through the cache.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 28, 2013 at 6:22 am
A few guesses (based on very limited information provided and a few answered questions):
1) You have much faster IO now and it is able to "swap out data" in RAM (i.e. read it off disk) much more quickly. This can lead to lower PLE although query performance would still likely be better.
2) Another possibility is that someone messed with the schema and dropped one or more important indexes from large tables, leading to scans instead of seeks.
3) Oh, what about MAXDOP server setting?? If you had it set to 1 (not uncommon in the ERP world) and now it is back to default of 0, you would be parallelizing potentially many more queries, which would be consuming data more quickly.
4) Yet another is that you simply did monitoring while some very large reports were being run for some muckety-muck VP. 🙂
5) It used to be that upgrading editions meant having to do a full-scan statistics update on all indexes/stats. But I don't think that is required when going from SQL 2008R2 to 2012.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 28, 2013 at 8:07 am
It could be that the old server had a different number of system buses (NUMA nodes) and this is affecting PLE. I assume you're looking at the Buffer Manager: PLE counter? Given the specs you've said above, you must have multiple NUMA nodes and this counter is the average across all of these. Buffer Node: PLE gives the stats for individual pools.
Threads can access foreign memory to their local NUMA node, but there's a preference for locality, which I'm assuming would end up recycling individual (smaller) pools faster than one big one, therefore have a lower average PLE. In some situations, that could be worse for overall system performance, but generally not.
What are you seeing in terms of overall system performance, e.g. wait stats, average transaction times etc.?
March 7, 2013 at 10:05 pm
Thanks for the responses. I've been looking at this for a few days on and off. Today I've been sitting here watching the counters because I am seeing such weird, weird output.
Here are some numbers from right this instant:
Resource Monitor\Memory (Instance allocation is min 70,000 max 100,000)
In use: 105200
Standby: 25000
Free: 70
Standby and free look weird, I would expect 15 gig less in standby and 15 gig more in free.
Perfmon:
Buffer Manager:PLE = 12
% Disk Time (data) = 92,000 (!!!!)
Avg Queue Length (data) = 920
Disk Reads / sec (data) = 6000 (!!!!)
Disk Writes / sec (data) = 55
% Disk Time (log) = --
Avg Queue Length (log) = 0.02
Disk Reads / sec (log) = 0
Disk Write / sec (log) = 2
% Processor Time = 14.5
Instantaneous output from sp_whoisactive at the same time shows no signficant reads from query activity:
reads writes physical_reads
------ ------ --------------
54 0 64
29 0 96
6 0 0
75,567 0 28,832
4 0 0
4 0 0
4 0 0
4 0 0
4 0 0
5 0 0
40 0 14
18 0 0
5 0 0
5 0 0
4 0 0
NULL NULL NULL
13 0 16
5 0 0
5 0 0
NULL NULL NULL
NULL NULL NULL
56,959 5 56,780
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
69,398 36 3,417
1,452 0 1,722
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
120 3 16
NULL NULL NULL
NULL NULL NULL
4 0 0
1,756 8 304
NULL NULL NULL
7 0 7
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
207 2 23
NULL NULL NULL
90 0 0
NULL NULL NULL
8 0 0
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
27 0 0
98 0 0
NULL NULL NULL
8 0 0
NULL NULL NULL
NULL NULL NULL
101 3 92
224 6 16
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
sys.dm_os_wait_stats
CXPACKET is at the top (15%), followed by...
SP_SERVER_DIAGNOSTICS_SLEEP
XE_TIMER_EVENT
HADR_FILESTREAM_IOMGR_IOCOMPLETION
DIRTY_PAGE_POLL
BROKER_EVENTHANDLER
XE_DISPATCHER_WAIT
LOGMGR_QUEUE
PAGEIOLATCH_SH is down in 12th place (4.3%)
NUMA
Unfortunately I know virtually nothing about NUMA yet. I guess this is me learning about it, heh.
I see 5 nodes in sys.dm_os_nodes, 2 "ONLINE", 1 "ONLINE DAC", 2 "OFFLINE".
In sys.dm_os_memory_nodes I see:
memory_node_id virtual_address_space_reserved_kb virtual_address_space_committed_kb locked_page_allocations_kb pages_kb shared_memory_reserved_kb shared_memory_committed_kb cpu_affinity_mask online_scheduler_mask processor_group foreign_committed_kb
-------------- --------------------------------- ---------------------------------- -------------------------- -------------------- ------------------------- -------------------------- -------------------- --------------------- --------------- --------------------
0 203592464 2183796 48977664 10796224 1920 1920 1048575 1048575 0 21068516
1 35712 35732 2784 2784 0 0 1048575 0 1 0
2 35712 35732 2784 2784 0 0 1099510579200 0 1 0
3 32768 32788 51128584 11795384 0 0 1099510579200 1099510579200 0 23373076
64 0 20 0 10797256 0 0 1048575 1048575 0 0
It looks really weird to me. Query activity is minimal but disk activity is thorugh the roof and PLE is collapsing at semi-regular intervals. Any interpretations would be most welcome.
March 7, 2013 at 10:13 pm
Just curious about a couple of BIOS settings.
1. What is the hardware reserve memory set at for the server? This could be impacting your "standby" memory count.
2. Are the energy saving features enabled for this server? I know you are looking at memory and disk. But I am curious if it is being caused by a CPU that is being taxed due to the energy saving features being enabled.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
March 7, 2013 at 10:17 pm
Hardware reserved set at 64
CPU use is way down at 5 - 15%.
March 7, 2013 at 10:27 pm
Hardware reserved, is that in MB or GB that you are showing?
NUMA - did you make any changes to CPU affinity or is the default setting still active (SQL Server managing)?
HowardW asked a question concerning Buffer Node v Buffer Manager in regards to which PLE you are monitoring. Which one is reporting the erratic stats?
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
March 7, 2013 at 10:35 pm
SQLRNNR (3/7/2013)
Hardware reserved, is that in MB or GB that you are showing?NUMA - did you make any changes to CPU affinity or is the default setting still active (SQL Server managing)?
HowardW asked a question concerning Buffer Node v Buffer Manager in regards to which PLE you are monitoring. Which one is reporting the erratic stats?
Sorry, hardware reserved is 64 MB.
NUMA is default.
Pefmon node breakdowns hows PLE on node "000" as 38, node "003" as 43 right now (ie, seems both are equally low)
Gail:
Index rebuilds are set for Sunday nights only (and don't appear in sp_whoisactive when I see PLE plummit). CheckDB is not run against this system, I run it on the DR site.
March 7, 2013 at 10:37 pm
What are the maximum values you have seen for PLE on each node?
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
March 7, 2013 at 10:41 pm
I haven't been tracking PLE by node before now, but the max I've seen for the buffer manager counter is plenty high: 50,000.
If I graph the this counter over the last week I see a dozen-and-a-half or so precipitous drops from very high values (20,000+) down to virtually zero. There's no obvious pattern to the timing, even if I round to the nearest hour. Sometimes it's at around 4 am (start of ETL's), but it also can occur at seemingly any time day or night.
March 7, 2013 at 10:48 pm
Is SQL Server the only thing on this box or do you have other apps and things like AntiVirus installed on this box too?
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
March 7, 2013 at 10:52 pm
All SQL baby!
The next highest memory consumer (right now, per resource monitor) is the DTS service with a commit charge of 353 meg (this is a 2 node clustered instance on a 4 node windows cluster).
Integration services 2008 is (unfortunately) installed in order to run half a dozen legacy packages that weren't migrated prior to the big server move, but the packages are tiny and execute in seconds/minutes, and only run once a day at around 2am.
Cheers for the continued attention BTW. I hope you're as confused as I am 😛
Viewing 15 posts - 1 through 15 (of 55 total)
You must be logged in to reply to this topic. Login to reply