December 16, 2005 at 10:55 am
We are receiving a lot of complaints about poor performance with an application. The only possible issue I see on SQL Server is memory. I ran perfmon for the following counters over a 30 minute window. While perfmon was running, the users said performance dropped during a 10 minute portion of this window. Below are the results.
Counter             Average
Page Faults/sec    6966.941
Page Reads/sec       59.555
Page Writes/sec       0.857
Pages Input/sec     952.435
Pages Output/sec     13.712
Pages/sec           966.146
We have a 2GB pagefile and only about 50MB of it is in use, which isn't bad. 2-4% pagefile usage appears to be about normal.
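As a quick check, the pagefile figure works out like this (a trivial sketch using the sizes quoted above):

```python
pagefile_mb = 2048   # 2 GB pagefile
in_use_mb = 50       # observed usage

pct = in_use_mb / pagefile_mb
print(f"pagefile usage: {pct:.1%}")  # -> pagefile usage: 2.4%
```

So roughly 2.4% of the pagefile is in use, which is inside the 2-4% range quoted as normal.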
Physical memory is 4GB, with 3GB allocated to SQL Server via the /3GB switch. The OS is Windows Server 2003 SE, SP1 and SQL is SQL Server 2000 EE with SP3a.
I'm not sure how to interpret the counters, specifically Page Faults/sec. The count appears very high, yet the page file usage is very small.
What are your thoughts?
Thanks, Dave
December 16, 2005 at 12:31 pm
I'm not sure what you have tried, but check for proper indexes (both usefulness and fragmentation), updated statistics, a trace of the queries being executed against the database, a bad execution plan on a query, or someone running ad hoc reporting against the db. I usually find a lot more problems with the code being executed than with the hardware.
What size processor setup are you using: 1-way, 2-way, 4-way? Disk RAID 0, 1, or 5? What other apps are running on the server? Virus scanning of the SQL files (bad)? Number of users? The parallel processing (max degree of parallelism) setting?
Tom
December 16, 2005 at 12:41 pm
I definitely agree with what was said above. I would also try running a Profiler trace against the server to catch long-running code. Here is a link to a very good page that describes the counters and even includes a trace template file and some associated procedures for reporting on the results.
December 16, 2005 at 3:28 pm
Thanks. I've seen this article and have actually run his scripts on a few servers. I'm more concerned with understanding how Page Faults/sec can be high when paging is only 2%.
Regarding the RAID configuration or the number of processors on the server, I don't see how either would affect Page Faults/sec. Page faults have to do with disk access vs. physical memory access.
Dave
December 16, 2005 at 3:34 pm
You might also want to check your average disk queue length and the SQL Server buffer cache hit ratio with perfmon. The buffer cache hit ratio will give you an idea of how frequently your system has to go to disk for data pages; the disk queue length will give you an idea of how many I/O requests are waiting to be written to or read from the disk.
December 16, 2005 at 3:36 pm
I have had a very similar situation and found that 8 GB of RAM mapped with AWE was the ticket to quiescence.
December 18, 2005 at 9:19 pm
Further investigation revealed the page faults occur every 15 minutes while a transaction log backup job is running. The backup job copies the T-Log file to a server located at our alternate data center. The job step that copies the file appears to be what causes the page faults. Perhaps a high number of page faults during a file copy is normal.
Any thoughts?
Thanks, Dave
December 23, 2005 at 8:44 am
Page Faults/sec counts both soft faults (the page is still in physical RAM, so no disk I/O is needed) and hard faults (the page must be read from disk).
Page Reads/sec and Page Writes/sec are the number of disk read and write operations issued to resolve hard faults; Pages Input/sec and Pages Output/sec are the number of pages read from and written to disk. Note that these counters cover all paging I/O, including memory-mapped files, not just the pagefile, so a large file copy can drive them up.
The CPU can handle a large number of soft page faults with no problem, and your hard page fault numbers are not bad. The page faults may or may not be caused by SQL Server; they may be caused by other applications or the Windows system itself. Check your max server memory configuration for SQL Server, check the other services on the server, and stop any that are not necessary.
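To make the soft/hard split concrete, here is a small sketch applied to the averages posted at the top of the thread (assuming the standard 4 KiB x86 page size):

```python
PAGE_SIZE = 4096  # standard x86 hardware page size, in bytes

# Averages from the perfmon capture posted above
page_faults_per_sec = 6966.941  # all faults, soft + hard
pages_per_sec = 966.146         # hard faults only (required disk I/O)
pages_input_per_sec = 952.435   # pages read in from disk

soft_faults_per_sec = page_faults_per_sec - pages_per_sec
hard_share = pages_per_sec / page_faults_per_sec
read_mb_per_sec = pages_input_per_sec * PAGE_SIZE / 2**20

print(f"soft faults/sec:  {soft_faults_per_sec:.0f}")   # ~6001
print(f"hard-fault share: {hard_share:.1%}")            # ~13.9%
print(f"fault-in rate:    {read_mb_per_sec:.1f} MB/s")  # ~3.7 MB/s
```

In other words, about 86% of the observed faults are soft faults that never touch the pagefile, which is how Page Faults/sec can be huge while pagefile usage stays tiny.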
December 27, 2005 at 8:35 am
I noticed the NIC's throughput would spike up to 95 Mbps on a NIC configured for 100 Mbps. I don't know the danger point with NICs in terms of a threshold that would indicate a bottleneck, but I assume spikes to 95% during the times I see a high number of page faults indicate the NIC is a possible bottleneck. During the two to three minutes it takes to copy the T-Logs to the remote server, I see the NIC throughput consistently above 75%, i.e. 75 Mbps.
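A quick back-of-the-envelope check on those numbers (a sketch only, assuming a ~1 GB log file and the ~75% sustained utilization observed on the 100 Mbps link):

```python
FILE_BYTES = 1 * 2**30  # assumed ~1 GB transaction log backup
LINK_MBPS = 100         # NIC line rate
UTILIZATION = 0.75      # sustained utilization observed during the copy

throughput_bits_per_sec = LINK_MBPS * 1_000_000 * UTILIZATION
copy_seconds = FILE_BYTES * 8 / throughput_bits_per_sec
print(f"copy time: ~{copy_seconds:.0f} s ({copy_seconds / 60:.1f} min)")
# -> copy time: ~115 s (1.9 min)
```

Roughly two minutes at the wire, before protocol overhead, which lines up with the observed two-to-three-minute copy window and supports the idea that the NIC is saturated for that entire period.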
Here is my overall take of the situation.
The number of hard page faults per second, from what I've read, should be less than 20 on average (some people say less than 5 or even 2). Since I'm seeing over 900, that indicates a possible memory bottleneck. However, looking at the problem a little closer reveals the high Pages/sec occurs during a file copy operation, where the file can be as large as 1GB. Since this operation (T-Log backups) occurs every 15 minutes, odds are most of the data being copied does not exist in physical memory, but rather on disk. For this reason I should expect to see a high reading for Pages/sec.
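One rough cross-check of this theory (a sketch only: the counters are 30-minute averages, so the true per-copy rate is certainly higher, and a 4 KiB page size is assumed):

```python
PAGE_SIZE = 4096         # assumed x86 page size, in bytes
pages_per_sec = 966.146  # average hard faults/sec from the capture
file_bytes = 1 * 2**30   # assumed ~1 GB T-log file

fault_mb_per_sec = pages_per_sec * PAGE_SIZE / 2**20
fault_in_seconds = file_bytes / (pages_per_sec * PAGE_SIZE)
print(f"{fault_mb_per_sec:.1f} MB/s -> ~{fault_in_seconds / 60:.1f} min to fault in 1 GB")
# -> 3.8 MB/s -> ~4.5 min to fault in 1 GB
```

Faulting the file through memory at the averaged rate lands in the same few-minute ballpark as the observed copy duration, which is consistent with the copy itself being the source of the hard faults.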
Looking into the problem further, I decided the copy activity would cause the 1GB file (or a large part of it) to be pulled into physical memory, thereby flushing SQL Server-generated data out of physical memory and forcing other SQL Server activities to read from disk rather than physical memory while the file copy is occurring. I am still a bit uncertain about this part of my theory because of how the physical memory on my server is split: the OS has 1GB and SQL Server the remaining 3GB. When the copy occurs, is the memory being taken from the OS's allocation or from SQL Server's (since SQL Server invoked the copy command from the SQL job scheduler)?
Finally, looking at the NIC throughput during the file copy shows the NIC was being hit heavily, further adding to slow server performance.
At least this is what I think is happening. Does this reasoning make sense, or am I missing something?
Thanks, Dave