July 26, 2012 at 4:45 am
We've had a few problems lately with the size of the target queue growing into the multi-millions, and knock-on effects of trying to process all of these messages.
The reason for this isn't currently obvious (I have some ideas, but that's a different issue), but once the queue is huge (10-20 million messages waiting) I need to take action to speed it up. The activation sproc runs a variety of other sprocs and some of them are doing lookups on large tables - in some cases, I can clear some of the data to improve speed (and yes, regular pruning of these tables is also on my to-do list!) but the fact that the queue is processing is locking and blocking my attempts to do this.
The activation procedure, in common with every example and tutorial I have ever seen, uses an infinite loop with a break when the queue is empty - of course, with 20 million records to process, these procedures will not stop unless there's a poisoned message that stops the queue from within.
BEGIN TRY
WHILE (1 = 1)
BEGIN
BEGIN TRANSACTION;
-- Receive all available messages into the table.
-- Wait 5 seconds for messages.
WAITFOR(
RECEIVE
[queuing_order]
,[conversation_handle]
,[message_type_name]
,CAST([message_body] AS NVARCHAR(MAX))
FROM [EventDetailsTargetQueue]
INTO @ReceiveTable
), TIMEOUT 5000;
IF @@ROWCOUNT = 0
BEGIN
COMMIT;
BREAK;
END
ELSE
...etc.
My thought is to change the loop from
WHILE (1 = 1)
to something like
WHILE (DATEDIFF(minute, @StartDate, GETDATE()) < 2)
to cause the activation procedure to terminate on a regular basis, allowing me to stop the queue gracefully in a reasonable time so that I can carry out whatever remedial action is necessary before restarting it.
My question is whether this is a) a good idea and b) will work as I imagine?
I'm also curious as to the operation of the RECEIVE... when there are, say, 10 million messages in the queue, what will my activation procedure actually RECEIVE? Not all 10 million presumably - there are 32 threads processing this queue, I reuse each conversation handle for 1000 messages before retiring it.
Do I need to add a TOP 10000 or some such to the RECEIVE to allow the loop to go around again and check my 2 minute time limit?
Thanks in advance,
Dave.
August 14, 2012 at 6:50 am
What I usually do in this sort of situation is simply to disable the queue (alter queue dbo.[queuename] with status = off). The call to receive will fail on any and all instances already running and any locks are freed to do my maintenance task. Also by disabling the queue I automatically informed Service Broker not to launch any new instances for as long as the queue is disabled. Then, when I'm done or at any time I do want the process to resume I issue another alter queue command to set the queue status back to on and service broker will automatically start launching new instances of the configured procedure.
For this specific purpose I always test for error code 9617 after calling waitfor (receive) in all service broker procedures: all other error codes are reported and logged as errors, only this error code 9617 makes the procedure exit silently.
Actually error 9617 is the same error code that occurs when the queue gets disabled after 5 rollbacks due to a poison message. I don't need this condition reported repeatedly in my logs every time a process is retried, so I ignore this error code and instead set up a monitoring service for disabled queues. This way I am warned exactly once when a poison message does disable any of my queues and my logging system does not get flooded by the error.
August 14, 2012 at 6:57 am
dave.farmer (7/26/2012)
...I'm also curious as to the operation of the RECEIVE... when there are, say, 10 million messages in the queue, what will my activation procedure actually RECEIVE? Not all 10 million presumably - there are 32 threads processing this queue, I reuse each conversation handle for 1000 messages before retiring it.
...
receive returns per call only the messages from a single conversation. So if you start a new conversation at every 1000th message, your receive will never return more than 1000 rows per call.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply