February 16, 2011 at 10:05 am
I have a SQL Server 2005 SP3 instance which is apparently running slow.
Spec: 1 dual core CPU, 4GB RAM
It's a virtual machine, and I notice that there has been some memory ballooning, which I have asked the Server team to investigate.
Other issues:
Page life expectancy can drop very low (<50 at times), and stay low for minutes at a time.
Buffer Cache hit ratio is generally over 99% but does occasionally drop to 94%
Memory/Available MBytes is steady at 250
Target server memory/Total Server memory target is more than total server memory (only by 12MB. I say "only" - is this an issue?)
Average disk queue length is generally well behaved at around 1.5, but spikes to over 90 at times!
Average Disk sec/read can be very high (12sec at one point!), and when at their lowest are around 40ms
Average Disk sec/write is also high, though not as much
Main wait stats are PAGEIOLATCH_SH and CXPACKET (the latter pointing to PAGEIOLATCH_ waits)
Finally, here's my question:
Am I correct in thinking memory is an issue and hence should be the first thing to troubleshoot here? Could the poor I/O activity be caused by the memory pressure/ballooning? Or is it likely that I/O is an issue on top of memory?
Many thanks for your time
February 16, 2011 at 11:58 am
Sounds like you have both problems. I'd attack each one, but try to do it individually so that you can determine if adding memory helps or if getting better, more, different disks helps
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 18, 2011 at 9:28 am
Thanks Grant
Well, taking care of the memory ballooning (and the fact that the VM was limited to 2GB RAM when it should have been 4!) didn't make much difference to the user's experience. The disk that the VM was sitting on was identified as being very slow, so it is now going to be migrated to a faster disk. Hopefully this will sort the issue out.
February 18, 2011 at 9:44 am
What do the wait stats look like? What are the processes on the server waiting for? That'll help you determine the biggest pain points.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 18, 2011 at 9:53 am
Not sure how to format it, if it turns out terrible I will try to fix it
The CXPACKETs as far as I could tell were waiting on PAGEIOLATCHes. The instance was last restarted jsut ove a month ago.
Wait_SResource_S Signal_S WaitCount PercentageAvgWait_S AvgRes_S AvgSig_S
PAGEIOLATCH_SH 811720.88 809065.28 2655.5919674186 41.820.04130.04110.0001
CXPACKET 615717.3 611265.05 4452.2513998978 31.720.0440.04370.0003
PAGEIOLATCH_EX 92123.88 91919.94 203.942145286 4.750.04290.04280.0001
MSQL_XP 89660.39 89660.39 024023 4.623.73233.73230
BACKUPIO 55680.05 55450.34 229.7940057 2.870.05920.0590.0002
OLEDB 44847.09 44847.09 094825460 2.310.00050.00050
BACKUPBUFFER 44721.03 44373.91 347.131619776 2.30.02760.02740.0002
ASYNC_IO_COMPLETION 41390.86 41390.48 0.38456 2.1390.769490.76860.0008
IO_COMPLETION 35012.73 34895.45 117.281958921 1.80.01790.01780.0001
WRITELOG 32058.2 31483.66 574.551446581 1.650.02220.02180.0004
Edit - that's better!
February 18, 2011 at 10:54 am
mazzz (2/18/2011)
Not sure how to format it, if it turns out terrible I will try to fix itThe CXPACKETs as far as I could tell were waiting on PAGEIOLATCHes. The instance was last restarted jsut ove a month ago.
Wait_SResource_S Signal_S WaitCount PercentageAvgWait_S AvgRes_S AvgSig_S
PAGEIOLATCH_SH 811720.88 809065.28 2655.5919674186 41.820.04130.04110.0001
CXPACKET 615717.3 611265.05 4452.2513998978 31.720.0440.04370.0003
PAGEIOLATCH_EX 92123.88 91919.94 203.942145286 4.750.04290.04280.0001
MSQL_XP 89660.39 89660.39 024023 4.623.73233.73230
BACKUPIO 55680.05 55450.34 229.7940057 2.870.05920.0590.0002
OLEDB 44847.09 44847.09 094825460 2.310.00050.00050
BACKUPBUFFER 44721.03 44373.91 347.131619776 2.30.02760.02740.0002
ASYNC_IO_COMPLETION 41390.86 41390.48 0.38456 2.1390.769490.76860.0008
IO_COMPLETION 35012.73 34895.45 117.281958921 1.80.01790.01780.0001
WRITELOG 32058.2 31483.66 574.551446581 1.650.02220.02180.0004
Edit - that's better!
Well, lots of IO waits, for sure, and the CXPACKET is parallelism, but possibly caused by the IO. I'd spend time working on the IO & disk based on what I'm seeing here. That's a little outside my area of expertise, especially when dealing with virtuals.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 18, 2011 at 11:20 am
I would agree that you definitely look IO bound and would make that the priority in this case. I wouldn't be at all surprised if you are having difficulty completing checkpoints either (see checkpoint pages/sec counter).
Unfortunately this is the area that is most neglected when putting up a VM and it will, as we know, bring a database to its knees pretty quickly. We don't just need the space, we need the spindles to get the IOPS.
Also, being that your PLE is dropping low means that you are spending more time on disk which is probably exacerbating your situation as well.
Let us know how things play out.
David
@SQLTentmaker“He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
February 18, 2011 at 12:13 pm
Thank you both for your input
The VM is being migrated to another disk tonight. Strangely, that SQL Server was only sharing a disk with one other machine, a dev box that doesn't appear to have seen any action for a while.
I will run the various checks again on Monday to see if there's any improvement and report back.
Have a good weekend!
February 24, 2011 at 1:03 pm
Well, the VM was migrated, and after four days the customer came back to say performance had improved.
The metrics I pulled out after the migration also supported this. There's still some drops in PLE and I/O isn't great, but unfortunately that seems to be the case with most of our databases - we have no input in terms of SAN set-up, and it's all RAID 5 across the board.
At any rate, the debilitating performance issue appears to have been fixed, the users can do their work, so I'm happy.
Thank you again for your time and input.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply