Massive Buffer I/O Wait Times

  • coronaride (8/2/2011)


    We're seeing a ton of PAGEIOLATCH waits with large Buffer I/O wait times, upwards of 30000ms. Before starting to analyze the query, does anyone have any immediate suggestions?

    I don't know whether you looked at white paper from microsoft regarding wait_types and queues which helped me a lot before here[/url]

    Just might helpful.

    Before when I see the PageIolatch waits a lot, the usual reason would be high transactions/sec along with bulk operations.

    Excessive waiting on pages allocation, and the three types of pages are GAM,PFS, SGAM

    to find exactly what kind of pages its waiting on

    ----------------------------------------------

    --//FIND WHAT PAGE ALLOCATION CONTENSTION YOU HAVE

    ----------------------------------------------

    Select session_id,

    wait_type,

    wait_duration_ms,

    blocking_session_id,

    resource_description,

    ResourceType = Case

    When Cast(Right(resource_description, Len(resource_description) - Charindex(':', resource_description, 3)) As Int) - 1 % 8088 = 0 Then 'Is PFS Page'

    When Cast(Right(resource_description, Len(resource_description) - Charindex(':', resource_description, 3)) As Int) - 2 % 511232 = 0 Then 'Is GAM Page'

    When Cast(Right(resource_description, Len(resource_description) - Charindex(':', resource_description, 3)) As Int) - 3 % 511232 = 0 Then 'Is SGAM Page'

    Else 'Is Not PFS, GAM, or SGAM page'

    End

    From sys.dm_os_waiting_tasks

    Where wait_type Like 'PAGELATCH_UP'

    And resource_description Like '5:%'

    Also here is teh query to find the wait_types you have currently,task and plan

    -----------------------------------------

    --//FIND WAITING TASK WITH WAIT_TYPE AND PLAN

    -----------------------------------------

    select r.cpu_time

    , r.logical_reads

    , r.session_id

    into #temp

    from sys.dm_exec_sessions as s

    inner join sys.dm_exec_requests as r

    on s.session_id =r.session_id --and s.last_request_start_time=r.start_time

    where is_user_process = 1

    waitfor delay '00:00:01'

    select substring(h.text, (r.statement_start_offset/2)+1 , ((case r.statement_end_offset when -1

    then datalength(h.text) else r.statement_end_offset end - r.statement_start_offset)/2) + 1) as text

    , r.cpu_time-t.cpu_time as CPUDiff

    , r.logical_reads-t.logical_reads as ReadDiff

    , p.query_plan

    , r.wait_type

    , r.wait_time

    , r.last_wait_type

    , r.wait_resource

    , r.command

    , r.database_id

    , r.blocking_session_id

    , r.granted_query_memory

    , r.session_id

    , r.reads

    , r.writes

    , r.row_count

    , s.[host_name]

    , s.program_name

    , s.login_name

    from sys.dm_exec_sessions as s

    inner join sys.dm_exec_requests as r

    on s.session_id =r.session_id and s.last_request_start_time=r.start_time

    full outer join #temp as t on t.session_id=s.session_id

    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) h

    cross apply sys.dm_exec_query_plan(r.plan_handle) p

    order by 3 desc

    drop table #temp

    Also check Trace flag 1118 by sqlskills.com , This traceflag helped me reducing the pageiolatches a lot for only some days and issue is back again.

    So you analyze from the all this rough information to find what solves your issue.

    These might not solve issue directly but helps to find a way to reduce pageiolatches.

    [font="Tahoma"]
    --SQLFRNDZ[/url]
    [/font]

  • What I haven't seen yet is the memory and CPU specs for the server. That will also have an effect on performance here.

    For a database that large, I would even consider drives less than 10K - and would push for 15K. With 24 drives available, how have you carved that up for the LUNs? Not really clear how you configured this - which could definitely be a factor.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Even with a HUGE amount of RAM on the box you have absolutely no hope of good IO from your cheap IO subsystem. I have a 2 year old LAPTOP that will smoke your production server. You are trying to process enterprise-scale data volumes on a penny-ante budget. Just cannot be done. The best you can do on this hardware is to have good indexing and make your code 'batchable' where you process 10000 rows at a time in an index-seek set-based manner.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Yes, yes, understood, the hardware configuration is not great. As previously mentioned, when I made the hardware request, we were still very much in the proof of concept phase. I knew we were going to end up with a fair amount of data, so I had to make a choice between speed and space. I went with the latter, since getting enough 15k drives to provide data redundancy to support up to 8TB was going to triple the price.

    I think that, after we can at least finish this phase and "wow" the big boss, we'll go back and request faster I/O.

  • ... san might not be overkill here. kevin whould have better pointers than that to offer.

  • coronaride (8/2/2011)


    It is a SELECT only, with multiple joins on PK-FK relationships to child tables. There is no inserting, updating, or deleting.

    As far as the number of concurrent processes hitting the database, after much troubleshooting on my own, I had a feeling that this might be the bottleneck. At this point, I'm thinking that, due to all of the waits we're experiencing, we're giving up any gains from multi-threading and should try reducing the number of processes.

    I know I'm coming in on this a bit late, but I have to ask. Do you specifically index the FK on the child table side? The reason I ask this is that I've seen some developers assume that declaring the FK relationship automatically creates an index on the FK side, which is not the case. The FK has to be indexed or you get table scans of the child table on the join from parent to child.

    Other than the above, it does look like your hardware lacks a bit of oomph.

    Todd Fifield

  • Yes, we have the field indexed. Thanks, though. 🙂

  • Ninja's_RGR'us (8/4/2011)


    ... san might not be overkill here. kevin whould have better pointers than that to offer.

    For most small shops (and even lots of medium/big ones) direct-attached storage is often best for SQL Server stuff. WAY simpler to setup, configure, manage and you NEVER have to worry about what the SAN admin(s) will do with your LUNs.

    To the OP: consider getting a tuning professional to give your database/app/process a review. In 15 years of doing said work I have NEVER been to a client that didn't have LOTS of low-hanging performance fruit.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply