Page Split Mini-Series - Part II - Page Latch Contention

Last time, in Part

I of my page-split mini-series, we began speaking about page

splits, defining what they are, how they occur, and which counter we can use to

track them.

This time we

will talk a little about how they show up as a resource wait, and how we can

identify one of the common symptoms of excessive page-splitting.

Although we

recognize a higher number of page splits per second on the system may indicate

an IO bottleneck, it doesn’t give us a good indication of where the splits are

coming from.

The reason why

too many page splits can decrease the performance of the SQL server is due to

the large number of I/O operations. One solution is to deploy a faster I/O

subsystem, where page splits would be less of an issue. Of course, throwing hardware at the problem

can be an expensive and temporary band-aid.

In addition,

pages highly in demand during page-splitting, can cause in-memory contention,

and show up as PageLatches.

During data

modifications where access to the specific data page is requested, SQL Server

will take a short latch on the page to add the data. However, once there is not enough space on a

data page, as a new page is allocated and data is copied from the old page to

the new page, and then new data is added.

Taking into account whether there is a clustered index on the table, SQL

Server must take and hold the latch for a longer period of time. This is where the end-user begins to see a

performance slowdown. All the user is

hoping to do is add new data, not realizing or understanding the internals of

the data engine.

Therefore, one

way to tell if excessive page splits exist on the system is to look at wait

stats. High wait times for latch-wait statistics is

one possible indication of too many page splits. The latches are held in EX

(exclusive) mode, therefore will show up in the waiting queue as PageLatch_EX.

Contrary

to many folks confusion, PageLatch_EX is NOT an I/O request. Rather, it occurs when a task is waiting on a

latch for a buffer, and can cause contention of access to the in-memory pages. In other words, while page split operations

are occurring, the latches will hold them in memory, and not release them until

the operation is complete.

An index is made up of a set of pages (index nodes)

that are organized in a type of structure that is hierarchical in nature. This is called a B-Tree. It is important to know this concept as latches are held on the page that is

being split, the existing next page in the index, and the parent page in the

index tree for the duration of the split. Any attempts to access any of these pages

during the split are blocked, including read access. As the operation does not occur instantly,

performance bottlenecks arise.

Now

that we know excessive page splitting can show up as page latches, you can

further explore counters such as Average Latch Wait Time (ms), Latch Waits per

sec, Total Latch Wait Time and Super latches in this MSDN article on SQL Server Latches

Object.

Using

the SQL DMV that exposes the cumulative wait statistics, sys.dm_os_wait_stats,

we can narrow down our culprit by looking for PAGELATCH_EX waits, which usually

indicate heavy inserts, and with heavy inserts, page splitting can occur. With the help of Glenn Berry’s time

trusted DMV Script

for finding Top Wait Types on the system:

WITH Waits AS
(SELECT wait_type, wait_time_ms / 1000. AS wait_time_s,
100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct,
ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK'
,'SLEEP_SYSTEMTASK','SQLTRACE_BUFFER_FLUSH','WAITFOR', 'LOGMGR_QUEUE','CHECKPOINT_QUEUE'
,'REQUEST_FOR_DEADLOCK_SEARCH','XE_TIMER_EVENT','BROKER_TO_FLUSH','BROKER_TASK_STOP','CLR_MANUAL_EVENT'
,'CLR_AUTO_EVENT','DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT'
,'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN'))
SELECT W1.wait_type,
CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s,
CAST(W1.pct AS DECIMAL(12, 2)) AS pct,
CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct
FROM Waits AS W1
INNER JOIN Waits AS W2
ON W2.rn <= W1.rn
GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct
HAVING SUM(W2.pct) - W1.pct < 95

If your system is not properly configured,

such as setting the appropriate Fill Factor, (which we will talk about later in

this series), page splits become a big headache. Therefore, you will need to continue your

diagnosis.

If the top waits show up for PAGELATCH_* in the sys.dm_os_wait_stats

view, you should further investigate

by querying the sys.dm_os_waiting_tasks DMV. This

view shows us what’s happening now on the system, and shows the queue of

waiting tasks waiting on a resource. To

search for a task waiting for PAGELATCH, you can run the following query:

SELECT session_id, wait_type, resource_description

FROM sys.dm_os_waiting_tasks

WHERE wait_type LIKE 'PAGELATCH%

This will

identify the associated session, and the resource_description column, that

tells you the actual database_id,

file_id, and page_id. Below is an image of sample output of the

above query from an article on Page Latches by the SQLCat

team. You can do a further deep dive of Resolving

PAGELATCH Contention on Highly Concurrent INSERT Workloads, by clicking on their in-depth article. You can use their complex query to actually

look up the page involved in the identified wait.

Furthermore,

you can Cross Apply the sys.dm_os_waiting_tasks with other DMVs, such as dm_exec_requests, dm_exec_sessions, dm_exec_sql_text

and dm_exec_query_plan to get the correlated SQL query and plan info.

select
            db_name(er.database_id) databasename,
            wt.wait_duration_ms,
            wt.wait_type,
            et.text,
            ep.query_plan,
            wt.session_id,
            exec_ssion.cpu_time,
            exec_ssion.memory_usage,
            exec_ssion.logical_reads,
            exec_ssion.total_elapsed_time,
            exec_ssion.program_name,
            exec_ssion.status
    from
            sys.dm_os_waiting_tasks wt
           
            inner join sys.dm_exec_requests er
                        on
                        wt.session_id = er.session_id
           
            inner join sys.dm_exec_sessions exec_ssion
                        on
                        exec_ssion.session_id = er.session_id
           
            cross apply sys.dm_exec_sql_text (er.sql_handle) et
           
            cross apply sys.dm_exec_query_plan (er.plan_handle) ep

Now

that we took a look at how page-splits manifest themselves as a resource wait,

and examined page latch contention, next time we will explore additional ways

to pinpoint page splitting, the number of page splits and steps to reduce them. Stay bookmarked to the Pearl Knows Blog

for the next entry in this Page-Splitting Mini-Series.

Please follow me on Twitter @Pearlknows, and check out our web-site for all our available services at http://www.pearlknows.com.

Take our HealthySQL Challenge! Are you SQL Servers healthy? How do you know for sure? Please contact us about our 15-point Health Check report, which will identify areas for improvement, and allow for best practice recommendations for your SQL Server(s). If we find NOTHING wrong with your SQL Server, the report is FREE! Contact us as pearlknows@yahoo.com

Page Split Mini-Series - Part II - Page Latch Contention

Rate

Share

Share

Rate

Page Split Mini-Series - Part II - Page Latch Contention

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts