General pointers on reading wait statistics

  • Hello,

    I know it's not easy to answer questions about performance tuning given the variations and unique circumstances across SQL systems, but I was wondering if anyone could provide general pointers for how I might focus my investigation given a certain measure.

    I ran the Top Waits query located here on one of our SQL servers:

    http://glennberrysqlperformance.spaces.live.com/blog/cns!45041418ECCAA960!2991.entry

    Many thanks to Glenn Berry for the blog post containing this query. The query is:

    -- Isolate top waits for server instance since last restart or statistics clear

    WITH Waits AS

    (SELECT wait_type, wait_time_ms / 1000. AS wait_time_s,

    100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct,

    ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn

    FROM sys.dm_os_wait_stats

    WHERE wait_type NOT IN ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE',

    'SLEEP_TASK','SLEEP_SYSTEMTASK','SQLTRACE_BUFFER_FLUSH','WAITFOR','LOGMGR_QUEUE',

    'CHECKPOINT_QUEUE','REQUEST_FOR_DEADLOCK_SEARCH','XE_TIMER_EVENT','BROKER_TO_FLUSH',

    'BROKER_TASK_STOP','CLR_MANUAL_EVENT','CLR_AUTO_EVENT','DISPATCHER_QUEUE_SEMAPHORE',

    'FT_IFTS_SCHEDULER_IDLE_WAIT','XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN'))

    SELECT W1.wait_type,

    CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s,

    CAST(W1.pct AS DECIMAL(12, 2)) AS pct,

    CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct

    FROM Waits AS W1

    INNER JOIN Waits AS W2

    ON W2.rn <= W1.rn

    GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct

    HAVING SUM(W2.pct) - W1.pct < 95; -- percentage threshold

    The server in question is running SQL 2005 Standard Edition 64-bit on Windows 2003 Enterprise Edition. 16 GB of RAM with 12 GB (12288 MB) allocated to SQL Server.

    Here is the result from SELECT @@VERSION:

    [font="Courier New"]Microsoft SQL Server 2005 - 9.00.3080.00 (X64) Sep 6 2009 09:15:46 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)[/font]

    And here is the result of the Top Waits query. Glenn Berry's blog post points out that these wait statistics are since the last time SQL Server started. In this case,

    As you can see, CXPACKET is by far the highest percentage for the top waits (51.08%), with LCK_M_S a distant second at 14.59%. Is there any kind of general advice you can give as a starting point for investigating these wait types? Specifically, I'm wondering whether there's a way to find out what queries are contributing to the CXPACKET waits.

    wait_type wait_time_s pct running_pct

    ------------------------------------------------------------ --------------------------------------- --------------------------------------- ---------------------------------------

    CXPACKET 1272007.88 51.08 51.08

    LCK_M_S 363449.94 14.59 65.67

    SOS_SCHEDULER_YIELD 247536.67 9.94 75.61

    PAGEIOLATCH_SH 110914.13 4.45 80.07

    ASYNC_NETWORK_IO 108995.27 4.38 84.45

    OLEDB 64863.73 2.60 87.05

    LCK_M_IX 38279.98 1.54 88.59

    BACKUPIO 34273.66 1.38 89.96

    PAGELATCH_UP 33290.70 1.34 91.30

    ASYNC_IO_COMPLETION 31712.20 1.27 92.57

    BACKUPBUFFER 31179.06 1.25 93.83

    BACKUPTHREAD 30815.53 1.24 95.06

    (12 row(s) affected)

    I know that there is a SQL Server Central best practice for posting performance problems, but that article seems to be based on knowing the specific procedure in question. I'm not even at that point yet; I'm still trying to narrow down what, if any, specific processes or queries are causing the waits in the first place.

    Thanks in advance for any help.

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • webrunner (4/19/2010)


    As you can see, CXPACKET is by far the highest percentage for the top waits (51.08%), with LCK_M_S a distant second at 14.59%. Is there any kind of general advice you can give as a starting point for investigating these wait types? Specifically, I'm wondering whether there's a way to find out what queries are contributing to the CXPACKET waits.

    CXPACKET: These 'waits' are often normal, since it is quite unlikely that all threads will finish executing their allocated work at exactly the same time. On the other hand, excessive CXPACKET waits might indicate a problem with order-preserving exchanges, data skew, poor statistics, or scheduler overloading.

    One sensible way to approach the next stage of analysis is to regularly record data about executing queries - using server-side traces or sampling queries based on DMVs and DMFs. The idea is to highlight queries that are contributing most to the observed waits, and to give you a way to prioritise your tuning efforts.

    There are a number of good books on this subject (Chapter 4 of Inside SQL Server 2008 T-SQL Querying springs to mind) and some good online resources too, like SQL Server Best Pratices: Waits and Queues

  • Thank you, Paul.

    I have read through the waits and queues documents at a high level, but I will go back to it and study it more carefully, as well as set up a way to collect wait statistics to correlate with specific queries. I will post back on how it goes. Also, thanks for the book chapter reference - I didn't know about that.

    Thanks again,

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • 1) the BEST way to learn how to interpret and make use of the wait stats is to have a professional tuner mentor you while giving your system a review using them

    2) without any more information and giving a very quick review of your output I would say the two biggest things are:

    a) based on CXPACKETs you have MAXDOP too high (0 is bad for virtually all systems out there because the IO is insufficient to feed the CPUs), your Cost Threshhold for Parallelism is too low (5 is almost universally too low).

    b) based on CXPACKETs and LCK_M_S you are also probably missing some important indexes, thus doing table scans leading to too much IO and too long lock holding.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Download Tom Davidson's white paper from Microsoft (Google ---> search for sys.dm_os_wait_stats white paper).

    You can find there scenarios for OLTP and warehousing environments.

  • Thank you, Paul.

    I have read through the waits and queues documents at a high level, but I will go back to it and study it more carefully, as well as set up a way to collect wait statistics to correlate with specific queries. I will post back on how it goes. Also, thanks for the book chapter reference - I didn't know about that.

    Thanks again,

    webrunner

    Hi Webrunner,

    Did you resolve the CXPACKET issue? If yes, Could you please share your experience on resolving this issue..

    We are experiencing CXACKET wait for Share point Content database even though the Indexes & Statistics are up to date!

  • Please post new questions in a new thread.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply