July 31, 2007 at 7:39 am
In the output of the sysprocesses system table, the waitresource column may show up as "2:1:1" (PFS page) or "2:1:3" (SGAM page). Depending on the degree of contention, this may also cause SQL Server to appear unresponsive for short periods.
These operations heavily use tempdb:
• Repeated creation and dropping of temporary tables (local or global).
• Table variables, which use tempdb for storage.
• Work tables associated with cursors.
• Work tables associated with ORDER BY clauses.
• Work tables associated with GROUP BY clauses.
• Work files associated with hash operations (hash joins and hash aggregates).
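As a quick check for this kind of contention, you can filter sysprocesses on tempdb (database_id 2) wait resources. The query below is a sketch of that check; the column selection is illustrative:

```sql
-- Sessions currently waiting on tempdb pages (database 2).
-- waitresource 2:1:1 is the PFS page and 2:1:3 the SGAM page of file 1;
-- further PFS pages repeat every 8,088 pages, SGAM pages every 511,232.
SELECT spid, lastwaittype, waittime, waitresource
FROM master..sysprocesses
WHERE waitresource LIKE '2:%';
```

If many sessions show up waiting on 2:1:1 or 2:1:3 at the same time, allocation contention is a likely suspect.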
Heavy use of these operations can lead to contention problems.
As part of allocating a page from a mixed extent, SQL Server must scan the Page Free Space (PFS) page to find a mixed page with free space to allocate. The PFS page tracks the free space available on every page, with each PFS page covering roughly 8,088 pages. Synchronization is required to make changes to the PFS and SGAM pages, and that synchronization can stall other modifiers for short periods.
When SQL Server searches for a mixed page to allocate, it always starts the scan on the same file and SGAM page. This results in intense contention on the SGAM page when several mixed page allocations are underway, which can cause the problems documented in the "Symptoms" section of this article.
Regarding the recommendations from Microsoft on the tempdb database: since I haven't personally tested them, I can't say whether I agree or disagree. I can say the theory sounds reasonable, and it comes down to the way thread scheduling and allocation work within SQL Server. The allocation mechanism in SQL Server will, by design, cause contention as it searches for free space in partially used pages.
The contention is the result of the design, so Microsoft is effectively recommending a workaround. By implementing these recommendations you have SQL Server allocate a full (uniform) extent for every object and spread allocations across multiple data files, so the allocation mechanism avoids searching for free space in partially used pages and avoids the contention that would result from funneling every allocation through a single file.

Based on the documentation, it sounds like the problem is caused by the allocation mechanism's search for free space. If the allocation mechanism has to search a tempdb that's several GB in size, that's going to take some time, and since every allocation must be routed through the same "gate keeper," SQL Server can appear to stall while the allocation is occurring. It's possible, even probable, that with multiple data files in place, multiple allocation "gate keepers" can operate in parallel, which could reduce contention to some extent even when the files sit on the same disk spindles. But that will have a limited effect on the issue, because the disk subsystem is still going to be the bottleneck.
Regardless, in the future I will certainly make sure I've allocated multiple data files for tempdb, whatever the number of disk subsystems on my server, because doing so won't have any significant negative consequences. I will also make sure my servers run with the -T1118 startup option enabled unless space is a serious issue, because the only downside is that tempdb will use more space, which is something I can easily live with.
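As a sketch of what those recommendations look like in practice (file names, paths, and sizes below are invented for illustration; keeping the files equally sized matters because SQL Server's proportional-fill algorithm favours the file with the most free space):

```sql
-- Resize the existing tempdb data file and add an equally sized second file.
ALTER DATABASE tempdb MODIFY FILE
    (NAME = tempdev,  SIZE = 1024MB, FILEGROWTH = 256MB);
ALTER DATABASE tempdb ADD FILE
    (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf',
     SIZE = 1024MB, FILEGROWTH = 256MB);

-- -T1118 is a startup parameter (added via SQL Server Configuration Manager,
-- not a script); it makes the server allocate uniform extents for new objects,
-- skipping the mixed-extent (SGAM) search entirely.
```

Repeat the ADD FILE step for as many files as you want; a common rule of thumb at the time was one data file per CPU core.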
So basically my stance on tempdb has changed. Ideally, though, I still want each tempdb data file on its own disk subsystem, because that will have a far greater impact on reducing contention than multiple files on a single disk subsystem.
Thanks for the enlightenment as this is very good to know.
July 31, 2007 at 8:45 am
Also, we are both talking about tuning SQL Server, but my philosophy is that you have to understand where the bottlenecks are before you can tune SQL Server. All too often someone tells me they believe their database server is slow because they only have a 4-way (4-processor) server. I examine the server and find it has 2 GB of memory and 6 disks: 2 disks mirrored for the OS, and 4 disks in a RAID 5 array.
My approach to tuning is slightly different but we are both shooting for the same end goal which is to get the most out of the hardware SQL Server is running on.
July 31, 2007 at 6:22 pm
Right on, mate. I was pretty sure about this because I personally spoke to the program manager at M$ who worked on the SQL 2005 storage allocation code, and he said SQL Server would be smart enough to distribute work evenly across tempdb data files to alleviate contention. So if this theory is wrong, I'll definitely chase him up. I vaguely remember asking him about the same setting on user databases (it was some time ago), and he said something like "yes, but it depends. You should see a bit of improvement, but I wouldn't go this way as the main design. Not worth it unless you've considered blah blah blah...".
Also, if you hear anyone making a blunt comment like "add more CPUs" or "we've got slow CPUs" when the DB server hits high CPU usage, that's a load of crap. I can get a junior DBA to say the same. No proof, no evidence, no monitoring. As I've said earlier, adding more h/w resources doesn't necessarily solve an "obvious" issue without a proper check (though sometimes experience does the job too). I've seen the SQL error log report an IO issue that was 100% related to CPU. No CPU was added, no memory was added; disabling parallelism saved the day.
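For anyone wanting to try the same fix: capping parallelism is an instance-wide sp_configure setting. A sketch below; note that a value of 1 disables parallel plans entirely, which is a blunt instrument and worth testing before applying in production:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- 1 = no parallel plans at all; a value above 1 merely caps the
-- degree of parallelism rather than disabling it.
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE;
```

Individual queries can override the instance setting with the OPTION (MAXDOP n) query hint, which is often a gentler first step than changing the whole server.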
SQL Server has encountered 2 occurrence(s) of IO requests taking longer than 15 seconds to complete on file [e:\mssql\data\xxxxdata1.mdf] in database [xxxx] (7). The OS file handle is 0x000003C0. The offset of the latest long IO is: 0x00000297780000
A lot of people say RAID 5 is crap for SQL Server, but I've got a 1 TB OLTP data filegroup on RAID 5 on a SAN running all right so far (except one or two days a month). The SAN has this thing called "Engenio" enabled, which optimises RAID 5 parity writing. I'm not sure how much it has helped the system, but it's cheap on disk space, and on top of that there's the large cache, SATA drives, and so on. Size doesn't tell the whole story, but it handles 1,000 transactions/s a lot of the time, so I can't really complain about this setup. But I wouldn't dare say RAID 5 on a non-SAN would work fine, or that all RAID 5 works fine on a SAN. I believe the large cache, optimised parity writing, and so on have helped reduce disk contention, because reads and writes are acknowledged from memory while the disks are still writing. There's always give and take.
Simon
PS: I'm not selling this "Engenio" thingy, but if anyone uses the same storage, feel free to share info on its advantages/disadvantages, or any other SAN you think gives you bang for your buck.
Simon Liew
Microsoft Certified Master: SQL Server 2008
August 12, 2008 at 4:03 am
Hi all....
The methods mentioned above are good for finding CPU utilization. Can anyone advise me on how to monitor the processes consuming CPU continuously (say, via a scheduled job that executes a procedure to determine CPU usage), set a threshold such as 70%, and get an alert when CPU usage crosses that threshold?
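One way the kind of check described here might be sketched, assuming SQL Server 2005 SP2 or later (the scheduler-monitor ring buffer holds recent CPU samples; the 70% threshold and the RAISERROR-based alert are just example choices, and a SQL Agent job could run this every minute with an Agent alert defined on the logged error):

```sql
DECLARE @cpu int;

-- Read the most recent SQL Server process CPU sample (a percentage)
-- from the scheduler monitor ring buffer.
SELECT TOP (1)
       @cpu = x.rec.value(
           '(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]',
           'int')
FROM (SELECT CONVERT(xml, record) AS rec, [timestamp]
      FROM sys.dm_os_ring_buffers
      WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
        AND record LIKE N'%<SystemHealth>%') AS x
ORDER BY x.[timestamp] DESC;

-- Log an error when the threshold is crossed; an Agent alert on the
-- error (or on severity 16) can then send the notification.
IF @cpu > 70
    RAISERROR('SQL Server process CPU is at %d%%.', 16, 1, @cpu) WITH LOG;
```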
Thanks in Advance...
Vinay