Last time, in Part
I of my page-split mini-series, we began speaking about page
splits, defining what they are, how they occur, and which counter we can use to
track them.
This time we
will talk a little about how they show up as a resource wait, and how we can
identify one of the common symptoms of excessive page-splitting.
Although we
recognize a higher number of page splits per second on the system may indicate
an IO bottleneck, it doesn’t give us a good indication of where the splits are
coming from.
The reason why
too many page splits can decrease the performance of the SQL server is due to
the large number of I/O operations. One solution is to deploy a faster I/O
subsystem, where page splits would be less of an issue. Of course, throwing hardware at the problem
can be an expensive and temporary band-aid.
In addition,
pages highly in demand during page-splitting, can cause in-memory contention,
and show up as PageLatches.
During data
modifications where access to the specific data page is requested, SQL Server
will take a short latch on the page to add the data. However, once there is not enough space on a
data page, as a new page is allocated and data is copied from the old page to
the new page, and then new data is added.
Taking into account whether there is a clustered index on the table, SQL
Server must take and hold the latch for a longer period of time. This is where the end-user begins to see a
performance slowdown. All the user is
hoping to do is add new data, not realizing or understanding the internals of
the data engine.
Therefore, one
way to tell if excessive page splits exist on the system is to look at wait
stats. High wait times for latch-wait statistics is
one possible indication of too many page splits. The latches are held in EX
(exclusive) mode, therefore will show up in the waiting queue as PageLatch_EX.
Contrary
to many folks confusion, PageLatch_EX is NOT an I/O request. Rather, it occurs when a task is waiting on a
latch for a buffer, and can cause contention of access to the in-memory pages. In other words, while page split operations
are occurring, the latches will hold them in memory, and not release them until
the operation is complete.
An index is made up of a set of pages (index nodes)
that are organized in a type of structure that is hierarchical in nature. This is called a B-Tree. It is important to know this concept as latches are held on the page that is
being split, the existing next page in the index, and the parent page in the
index tree for the duration of the split. Any attempts to access any of these pages
during the split are blocked, including read access. As the operation does not occur instantly,
performance bottlenecks arise.
Now
that we know excessive page splitting can show up as page latches, you can
further explore counters such as Average Latch Wait Time (ms), Latch Waits per
sec, Total Latch Wait Time and Super latches in this MSDN article on SQL Server Latches
Object.
Using
the SQL DMV that exposes the cumulative wait statistics, sys.dm_os_wait_stats,
we can narrow down our culprit by looking for PAGELATCH_EX waits, which usually
indicate heavy inserts, and with heavy inserts, page splitting can occur. With the help of Glenn Berry’s time
trusted DMV Script
for finding Top Wait Types on the system:
WITH Waits AS (SELECT wait_type, wait_time_ms / 1000. AS wait_time_s, 100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct, ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn FROM sys.dm_os_wait_stats WHERE wait_type NOT IN ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK' ,'SLEEP_SYSTEMTASK','SQLTRACE_BUFFER_FLUSH','WAITFOR', 'LOGMGR_QUEUE','CHECKPOINT_QUEUE' ,'REQUEST_FOR_DEADLOCK_SEARCH','XE_TIMER_EVENT','BROKER_TO_FLUSH','BROKER_TASK_STOP','CLR_MANUAL_EVENT' ,'CLR_AUTO_EVENT','DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT' ,'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN')) SELECT W1.wait_type, CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s, CAST(W1.pct AS DECIMAL(12, 2)) AS pct, CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct FROM Waits AS W1 INNER JOIN Waits AS W2 ON W2.rn <= W1.rn GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct HAVING SUM(W2.pct) - W1.pct < 95
If your system is not properly configured,
such as setting the appropriate Fill Factor, (which we will talk about later in
this series), page splits become a big headache. Therefore, you will need to continue your
diagnosis.
If the top waits show up for PAGELATCH_* in the sys.dm_os_wait_stats
view, you should further investigate
by querying the sys.dm_os_waiting_tasks DMV. This
view shows us what’s happening now on the system, and shows the queue of
waiting tasks waiting on a resource. To
search for a task waiting for PAGELATCH, you can run the following query:
SELECT session_id, wait_type, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%
This will
identify the associated session, and the resource_description column, that
tells you the actual database_id,
file_id, and page_id. Below is an image of sample output of the
above query from an article on Page Latches by the SQLCat
team. You can do a further deep dive of Resolving
PAGELATCH Contention on Highly Concurrent INSERT Workloads, by clicking on their in-depth article. You can use their complex query to actually
look up the page involved in the identified wait.
Furthermore,
you can Cross Apply the sys.dm_os_waiting_tasks with other DMVs, such as dm_exec_requests, dm_exec_sessions, dm_exec_sql_text
and dm_exec_query_plan to get the correlated SQL query and plan info.
select db_name(er.database_id) databasename, wt.wait_duration_ms, wt.wait_type, et.text, ep.query_plan, wt.session_id, exec_ssion.cpu_time, exec_ssion.memory_usage, exec_ssion.logical_reads, exec_ssion.total_elapsed_time, exec_ssion.program_name, exec_ssion.status from sys.dm_os_waiting_tasks wt inner join sys.dm_exec_requests er on wt.session_id = er.session_id inner join sys.dm_exec_sessions exec_ssion on exec_ssion.session_id = er.session_id cross apply sys.dm_exec_sql_text (er.sql_handle) et cross apply sys.dm_exec_query_plan (er.plan_handle) ep
Now
that we took a look at how page-splits manifest themselves as a resource wait,
and examined page latch contention, next time we will explore additional ways
to pinpoint page splitting, the number of page splits and steps to reduce them. Stay bookmarked to the Pearl Knows Blog
for the next entry in this Page-Splitting Mini-Series.
Please follow me on Twitter @Pearlknows, and check out our web-site for all our available services at http://www.pearlknows.com.
Take our HealthySQL Challenge! Are you SQL Servers healthy? How do you know for sure? Please contact us about our 15-point Health Check report, which will identify areas for improvement, and allow for best practice recommendations for your SQL Server(s). If we find NOTHING wrong with your SQL Server, the report is FREE! Contact us as pearlknows@yahoo.com