Introduction
Many applications require the use of a queuing mechanism. With multiple processes adding to the queue and multiple services acknowledging and acting upon the actions or items present in the queue, it's tricky to get this right using a SQL Server table.
The options developers most commonly opt to use are:
- MSMQ - Works well but goes outside the boundaries of administering your database.
- Custom database tables - Works well for many small - mid size applications that don't require fancy call backs.
- SQL Server Service Broker - The fanciest of all and also the "toughest" to setup and maintain.
The simplest in, my opinion, is to use a SQL Server table for the queuing mechanism. However by doing so, we need to solve the concurrency problem that comes with the territory; multiple operators all requesting/polling for an item from the queue. How does the queue serve each operator exactly one, yet to be processed, item from the queue that has not and will not, be picked up by any other operator going against the queue?
The answer lies in the query hints to the query optimizer. Let's start with our sample table. We'll use a simple work_queue table to explore our options:
-- Create table
CREATE TABLE [dbo].[work_queue](
[work_queue_id] [int] IDENTITY(1,1) NOT NULL,
[name] varchar(255) NOT NULL, -- a simple name for each item
[processed_flag] bit NOT NULL, -- 0: Ready for processing, 1: Processed
CONSTRAINT [PK_work_queue] PRIMARY KEY CLUSTERED
(
[work_queue_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) -- Add data to table
INSERT INTO work_queue (name, processed_flag)
SELECT 'A', 0
UNION ALL SELECT 'B', 0
UNION ALL SELECT 'C', 0
UNION ALL SELECT 'D', 0
UNION ALL SELECT 'E', 0
UNION ALL SELECT 'F', 0
UNION ALL SELECT 'G', 0
UNION ALL SELECT 'H', 0
UNION ALL SELECT 'I', 0
UNION ALL SELECT 'J', 0
UNION ALL SELECT 'K', 0
Discovering the Problem
Run the following transaction from 2 query windows in Management Studio. The simulated delay enables duplication of the problem.
DECLARE @work_queue_id int, @name varchar(255)
BEGIN TRANSACTION -- select 1 queue item to process that hasn't been processed yet
SELECT TOP(1)
@work_queue_id = work_queue_id, @name = name
FROM
work_queue
WHERE processed_flag = 0 PRINT 'Processing ' + @name + ', work_queue_id = ' + CONVERT(varchar, @work_queue_id) -- simulate delay
WAITFOR DELAY '00:00:15' -- flag item as processed
UPDATE work_queue SET processed_flag = 1 WHERE work_queue_id = @work_queue_id COMMIT
The result is that the queue flagged the same item twice! While the first transaction was taking place, another transaction found that the same row was available for another process to work with (because it's status hadn't yet been updated to reflect a "processed" state)
The Fix
To fix this behavior, we'll introduce a query hint (UPDLOCK) instructing SQL Server to hold an update lock on the selected item for the duration of the transaction. When SQL Server sees that an update lock is held on a record, it will wait for the lock to be released before it proceeds to serve other select statement.
This fixes the problem but suffers from a performance issue. We don't want SQL Server to just wait until the transaction is committed until it can serve another query call. That has the potential of piling up requests waiting for the queue to grant them an item to work with. Not the most friendliest of environments.
We can tackle the performance issue by introducing yet another query hint (READPAST). READPAST instructs SQL Server to simply skip dirty records (records with locks) when serving a request (query).
The end result looks like this:
Run the following transaction from 2 query windows in Management Studio.
DECLARE @work_queue_id int, @name varchar(255)
BEGIN TRANSACTION -- select 1 queue item to process that hasn't been processed yet
SELECT TOP(1)
@work_queue_id = work_queue_id, @name = name
FROM
work_queue WITH (UPDLOCK, READPAST)
WHERE processed_flag = 0 PRINT 'Processing ' + @name + ', work_queue_id = ' + CONVERT(varchar, @work_queue_id) -- simulate delay
WAITFOR DELAY '00:00:15' -- flag item as processed
UPDATE work_queue SET processed_flag = 1 WHERE work_queue_id = @work_queue_id COMMIT
SQL Server 2008 Alternative
-- SQL Server 2008 alternative --
UPDATE TOP(1)
work_queue WITH (READPAST)
SET
processed_flag = 1
OUTPUT
inserted.work_queue_id
WHERE
processed_flag = 0
The OUTPUT clause returns the results of the items in the current operation. Think of it as an inline equivalent to an after trigger. We used inserted.work_queue_id to get the ID of the item we just updated. The difference between this and the method used for SQL Server 2005 is twofold:
- This is a single statement so it's in a transaction on its own.
- The update already guarantees exclusivity with an UPDLOCK internally on the row.
Conclusion
Working with Queues in SQL Server is fairly simple and easily expandable once you know the pitfalls of doing so. With the introduction of the OUTPUT clause in SQL Server 2008, there's yet an easier/different way to achieve the same results. Check out Working with Queues in SQL Server 2008 using the OUTPUT clause