Blocking in SQL Server can be good – after all, it’s one of the ways consistency is guaranteed – we usually don’t want data written to by two processes at once.
It can also be bad. Really bad – a rogue blocker can bring a system to a complete halt, which we definitely don’t want.
So we need monitoring and alerting on it. Enterprise monitoring tools can do this, and do it well – but if you don’t have one, or don’t have enough licenses for your entire estate, you’ll need to roll your own. (OK, or copy someone else’s if you don’t need the learnin’). This post will demonstrate a basic method for detecting blocking and alerting based on a given threshold.
First, here’s a simple demo table to illustrate the process – containing two CPU products. We know it’s a demo, because ChipStockLevel = 0 everywhere in the world at the moment…
We open an UPDATE transaction in one query window – but leave it uncommitted – then run a SELECT in another, mimicing a situation where a process or user updating a stock level has gone wrong and is holding a lock on the row. The UPDATE shows as completed, but the transaction it’s in is still open, so the SELECT is stuck on ‘Executing query…’;
We now have a blocked session. This can be identified by querying sys.dm_exec_requests and looking for a non-zero blocking_session_id;
This DMV is the foundation of monitoring blocking.
It has many columns – some of which we can join to other DMVs for more information – but for this example I’ve pulled just a few pertinent columns back – enough to see that session 61 is blocked by session 56 (as we expect from the previous snip), its wait_type is LCK_M_S (it’s waiting for a shared lock) and it’s been waiting ~410 seconds so far.
But we can’t just start alerting the minute we find a blocked session – transient blocks are expected behaviour in concurrent systems. We need a meaningful period of sustained blocking. This will be dependent on your server and processes – maybe 5 seconds is totally out of the norm, or maybe minutes of blocking is expected and is not actually a problem. We also need a frequency of evaluation – how often we check for blocking.
As an example we’ll say a period of 30 seconds is alert-worthy, and evaluate it every 10 seconds.
The simplest way to implement this – not touching any databases – is a structure akin to the following code snip. We declare four integer variables, then populate them at ten second intervals with the results of a COUNT(*) from the requests DMV. Finally we check if all of them were non-zero – meaning blocking was detected at every interval – 0, 10, 20 and 30 seconds – and alert if so. In this example I’ve commented out email alert code and left a SELECT for demo purpose.
DECLARE @blockedsessions_1 INT = 0;
DECLARE @blockedsessions_2 INT = 0;
DECLARE @blockedsessions_3 INT = 0;
DECLARE @blockedsessions_4 INT = 0;
SELECT @blockedsessions_1 = COUNT(*)
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0
WAITFOR DELAY '00:00:10'
SELECT @blockedsessions_2 = COUNT(*)
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0
WAITFOR DELAY '00:00:10'
SELECT @blockedsessions_3 = COUNT(*)
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0
WAITFOR DELAY '00:00:10'
SELECT @blockedsessions_4 = COUNT(*)
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0
IF(@blockedsessions_1 <> 0 AND @blockedsessions_2 <> 0 AND @blockedsessions_3 <> 0 AND @blockedsessions_4 <> 0)
BEGIN
SELECT 'Blocking Detected!'
/*
DECLARE @subject VARCHAR(200) = 'Blocking Alert - ' + @@SERVERNAME
DECLARE @email_body VARCHAR(200) = 'Blocking has been detected on ' + @@SERVERNAME + ' - investigate immediately.'
EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'DBA',
@recipients = 'team@company.com',
@subject = @subject,
@body = @email_body
*/END
Running this at the same time as leaving the example blocking session open triggers the alert;
The final step would be to add this script to an Agent job scheduled every 30 seconds.
Now, to reiterate the title, this is a very simple alert. It only checks for instance-wide blocking by any sessions and doesn’t provide any further information. While it can be sufficient for simple systems – one database, not much concurrency – it’s ideally expanded for more functionality. What if we want to…
- Make sure the blocking session is the same each time, to rule out catching multiple transient blocks as a false positive?
- Record a history of blocked sessions?
- Check on a per-database basis?
- Provide more blocker info in the alert – database, query, user?
- Provide blockee info in the alert – who’s getting blocked, is there a chain?
- Split the monitoring and alerting process apart, encapsulated in stored procs?
- What if the blocking session changes during the monitoring period, do we care that it’s a different instance of blocking?
I’ll leave those as an exercise for the reader – to be revisited with a more sophisticated implementation sometime soon.