Best way to handle a table-based queue?

  • Jeff Moden - Wednesday, June 7, 2017 5:43 PM

Here's a runnable example (set up 3 or 4 windows with the same code and a whole lot more rows, and run it all at the same time in a loop to see it in real action) that will seriously help you avoid deadlocks on the queue table.  I kept it simple just to demonstrate the principle.  Modify to your heart's content.

    Yes Jeff, it needs modification.

    In real life there is no threshold for execution duration.
    In my real-life case I had tasks which involved parsing 3rd party XML by BizTalk, which could take several hours.
Of course, I could not afford to wait for hours before repeating an attempt on other kinds of tasks, which were executed once a second.
    To identify a connection I used spid + binary representation of login time.
    Before starting a new task I checked if every connection listed in the table had its counterpart in sysprocesses and reset the task where a corresponding connection could not be found.
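A minimal sketch of that orphaned-connection check, assuming hypothetical column names (Spid, LoginTimeBin) on the TaskQueue table:

```sql
-- Reset tasks whose owning connection no longer exists in sysprocesses.
-- Spid and LoginTimeBin are assumed column names; LoginTimeBin holds
-- the binary representation of the connection's login time.
UPDATE q
   SET q.Spid         = NULL,
       q.LoginTimeBin = NULL
  FROM dbo.TaskQueue q
 WHERE q.Spid IS NOT NULL
   AND NOT EXISTS (SELECT 1
                     FROM master.dbo.sysprocesses p
                    WHERE p.spid = q.Spid
                      AND CONVERT(binary(8), p.login_time) = q.LoginTimeBin);
```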
I used StartTime, but it was not just GETDATE(); it indicated when a particular task was to be started. The engine took only items with a StartTime in the past, earliest first.
Top-priority tasks which had to skip the queue were given a StartTime in the past, hours or even days prior to GETDATE().
I found such a use of the column a bit inconvenient, and planned to split it into ScheduleTime and StartTime (actual start time), but never got around to implementing it.
StartTime + QueueID (Identity) formed the clustered primary key of the table.
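Pieced together from the description above, the queue table might have looked something like this (the columns beyond StartTime, QueueID, and Status are guesses):

```sql
CREATE TABLE dbo.TaskQueue
(
    StartTime datetime      NOT NULL,  -- scheduled start; in the past = eligible
    QueueID   int IDENTITY  NOT NULL,  -- tie-breaker within the same StartTime
    TaskType  int           NOT NULL,  -- which process handles this item
    Status    tinyint       NOT NULL,  -- drives TaskMapping routing rules
    Payload   varchar(8000) NULL,      -- task-specific data
    CONSTRAINT PK_TaskQueue PRIMARY KEY CLUSTERED (StartTime, QueueID)
);
```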

And there is no place for an EndTime in this table. EndTime belongs in a TaskLog table, into which records from TaskQueue are moved by a FOR UPDATE, DELETE trigger.

    There was also a Status column.
Depending on the resulting status of a process, an item could be scheduled for another process (or the same one, if a repeat attempt was required). Another table, TaskMapping, reflected the business rules.

There were other bells and whistles in the setup, but the general outline was like this.

    _____________
    Code for TallyGenerator

  • Jeff Moden - Wednesday, June 7, 2017 5:43 PM


    --===== Once done with the task, mark the end date
     UPDATE tgt
        SET EndDt = GETDATE()
       FROM #ItemQueue tgt
      WHERE tgt.ItemQueueId = @ItemQueueId
    ;

    @jeff - I suspect that your final update may need a WHERE clause.

  • I stopped using queue tables eight years ago when 2008R2 came out with filtered indexes (like dBase had 30 years ago).  Let the filtered index be your processing queue.  By having a narrow queue table, it puts hundreds of rows into the same page increasing chances of blocking.  Also, I'm wondering why no one is using update with output since the main advantage of it is to get around this exact problem.  And thirdly, I also despise the use of Service Broker for this simple situation.  Service Broker has some serious architecture to it that is overkill for this.
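One possible reading of the filtered-index suggestion (this is not Bill's actual code, and the table/column names are borrowed from the thread's ItemQueue example): keep the work rows in the main table, and let a narrow filtered index cover only the unprocessed rows, so scans for the next item never touch completed ones:

```sql
-- Hypothetical sketch: a filtered index that contains only unclaimed items.
-- Workers seek into this small index; processed rows fall out of it
-- automatically when SessionKey is set.
CREATE NONCLUSTERED INDEX IX_ItemQueue_Pending
    ON dbo.ItemQueue (ItemQueueId)
    INCLUDE (ItemPayload)
    WHERE SessionKey IS NULL;
```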

  • DesNorton - Thursday, June 8, 2017 4:10 AM

    Jeff Moden - Wednesday, June 7, 2017 5:43 PM


    --===== Once done with the task, mark the end date
     UPDATE tgt
        SET EndDt = GETDATE()
       FROM #ItemQueue tgt
      WHERE tgt.ItemQueueId = @ItemQueueId
    ;

    @jeff - I suspect that your final update may need a WHERE clause.

    :blush::blush: It sure does.  Can't believe I missed it.  Thank you for the catch.  I'll go back and fix the code above.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Bill Talada - Thursday, June 8, 2017 5:01 AM

    I stopped using queue tables eight years ago when 2008R2 came out with filtered indexes (like dBase had 30 years ago).  Let the filtered index be your processing queue.  By having a narrow queue table, it puts hundreds of rows into the same page increasing chances of blocking.  Also, I'm wondering why no one is using update with output since the main advantage of it is to get around this exact problem.  And thirdly, I also despise the use of Service Broker for this simple situation.  Service Broker has some serious architecture to it that is overkill for this.

    Using the sample data that's been provided, can you show us a coded example of the filtered index method you speak of?  I could certainly be wrong but I'm pretty sure that even filtered indexes need to work against a table.

    Also, I didn't use OUTPUT because I didn't see any advantage to having the returned ID data in a table variable or temp table or bothering TempDB for a single item in a necessarily RBAR operation.  Is there an advantage that I missed in this particular case?  Are you thinking of portability?

    --Jeff Moden



@Jeff Moden, thanks for your extensive answer. I like your suggestions on eliminating the transaction and the session key, and on using the CTE to grab the next record and mark it as 'in process' in one operation. Very helpful, thanks again!

@Bill Talada: I don't see how you can use a filtered index to process a queue. Could you elaborate on that?

  • Sergiy - Thursday, June 8, 2017 12:29 AM

Most of the tools MS ships carry the same birthmarks: inadequate design and careless coding.
Therefore they all:
- do not scale;
- take an enormous amount of resources;
- fail to perform in a timely manner.

They are all good for "home office" kinds of environments, but as soon as you get to hundreds of anything (jobs, services, packages, tasks, orchestrations, etc.) those tools tend to commit suicide, and take the whole server down with them.

So, don't push for Service Broker too hard. You might regret it if you succeed.

    I stopped pushing for it long ago, but not for the reasons you state. Just because some MS tools don't scale well doesn't mean you should avoid them all. And when it comes to Service Broker, I would say it's one of the most underrated tools in the MS arsenal. I used it in a previous job and never had a problem, it processed *millions* of messages per day without a hitch. Do you know of any case where Service Broker failed miserably? I'd like to know.

  • 1) I don't see why READPAST would lead to deadlocking.
    2) I would think you do need a transaction to make the READPAST work properly.  You want the first UPDATE lock in place to prevent other concurrent tasks from attempting to process the same row.
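The pattern Scott describes, sketched against the thread's ItemQueue example (a sketch, not Jeff's original code): the UPDLOCK taken inside the transaction is what the other workers' READPAST hints skip over.

```sql
BEGIN TRANSACTION;

DECLARE @ItemQueueId int;

-- Claim the next unclaimed row.  READPAST skips rows locked by other
-- workers; UPDLOCK + ROWLOCK holds just this row until the commit.
SELECT TOP (1) @ItemQueueId = ItemQueueId
  FROM dbo.ItemQueue WITH (UPDLOCK, READPAST, ROWLOCK)
 WHERE SessionKey IS NULL
 ORDER BY ItemQueueId;

UPDATE dbo.ItemQueue
   SET SessionKey = NEWID(),
       StartDt    = GETDATE()
 WHERE ItemQueueId = @ItemQueueId;

COMMIT TRANSACTION;
```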

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Jeff Moden - Wednesday, June 7, 2017 5:43 PM

    If you end up with something that has a start date without an end date after a certain amount of time, then you have a way to go back to failed runs and re-enter them into the queue just by changing the StartDT to a NULL.  Or, keep the old row so that you keep track of the failures and enter a new row for the same thing.
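Jeff's re-queue idea might look like this, assuming the StartDt/EndDt columns from his earlier example and an arbitrary one-hour timeout:

```sql
-- Re-queue items that started over an hour ago but never finished.
-- The timeout value is illustrative; pick one that fits your workload.
UPDATE dbo.ItemQueue
   SET StartDt    = NULL,
       SessionKey = NULL
 WHERE EndDt IS NULL
   AND StartDt < DATEADD(HOUR, -1, GETDATE());
```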

So this is probably out of your control (I believe you said the actual task processes could be anything), but as Jeff points out, his solution avoids blocking with a sort of two-phase commit, at the cost of forcing "at most once" semantics. Of course, since you're in such capable hands, he also offers a way to deal with that: a post-processing monitor to re-queue failed jobs.

    However, I was wondering if there's a way to adopt "at least once" semantics - in other words, not worry so much that the same item could potentially be grabbed by more than one worker. Basically, if you can guarantee that your processes are idempotent, you could probably get away with this. When at least one process successfully completes, the item is flagged as dequeued, but in the meantime a second (or third) worker may have also tried to process it. So you may waste some cpu cycles doing redundant work, but no blocking issues.

    If your processes aren't idempotent, I wonder if this would help avoid the 2nd, 3rd, etc. updates from messing up your state:

    UPDATE ItemQueue SET SessionKey = NEWID() WHERE ItemQueueId = @itemQueueId AND SessionKey IS NULL;

     So you're still allowing redundant processes to fire off, but you limit the state change to the first one to complete.

Not sure if this helps, but it could possibly offer another workaround.

  • ScottPletcher - Thursday, June 8, 2017 2:40 PM

    1) I don't see why READPAST would lead to deadlocking.
    2) I would think you do need a transaction to make the READPAST work properly.  You want the first UPDATE lock in place to prevent other concurrent tasks from attempting to process the same row.

    Yes, and yes, and yes.
    Also, don't forget to switch off lock escalation at the table level.
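On SQL Server 2008 and later, disabling lock escalation for the queue table is a one-liner (table name assumed from the thread's examples):

```sql
-- Prevent row locks on the queue from escalating to a table lock.
ALTER TABLE dbo.ItemQueue SET (LOCK_ESCALATION = DISABLE);
```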

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I'd still like to hear someone say why they don't use UPDATE ... OUTPUT to pick an item from the queue without an explicit transaction.

    UPDATE TOP (1) dbo.ItemQueue WITH(READPAST) 
    SET SessionKey = NEWID()
           -- additional columns such as StartTime as needed
    OUTPUT Inserted.* INTO @QueueItem
    WHERE SessionKey IS NULL;

    I'm not concerned with the distinction between OUTPUTting the whole row from the queue table vs just the key value, I want to see comments on any potential pitfalls of using UPDATE (or DELETE) with an OUTPUT clause to pop an item off a queue table in one atomic operation.
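For completeness, the snippet above needs the table variable declared first; a self-contained version might look like this (the column list is guessed):

```sql
-- Table variable to receive the claimed row from the OUTPUT clause.
DECLARE @QueueItem TABLE
(
    ItemQueueId int,
    SessionKey  uniqueidentifier,
    ItemPayload varchar(8000)
);

-- Claim one pending item atomically, no explicit transaction needed.
UPDATE TOP (1) dbo.ItemQueue WITH (READPAST)
   SET SessionKey = NEWID()
OUTPUT Inserted.ItemQueueId, Inserted.SessionKey, Inserted.ItemPayload
  INTO @QueueItem
 WHERE SessionKey IS NULL;
```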

  • Scott Coleman - Monday, June 12, 2017 2:50 PM

    I'd still like to hear someone say why they don't use UPDATE ... OUTPUT to pick an item from the queue without an explicit transaction.

    UPDATE TOP (1) dbo.ItemQueue WITH(READPAST) 
    SET SessionKey = NEWID()
           -- additional columns such as StartTime as needed
    OUTPUT Inserted.* INTO @QueueItem
    WHERE SessionKey IS NULL;

    I'm not concerned with the distinction between OUTPUTting the whole row from the queue table vs just the key value, I want to see comments on any potential pitfalls of using UPDATE (or DELETE) with an OUTPUT clause to pop an item off a queue table in one atomic operation.

    I might be missing something but it seems to me that, for the original problem stated in this post, that would require a join to the Table Variable when you wanted to do the final update to the table if the table had start and end date columns as I proposed.  Even without such columns, if this were in a stored procedure as stand-alone functionality and you needed to return the value as an output variable, it would no longer be a single operation.

    --Jeff Moden



  • Jeff Moden - Thursday, June 8, 2017 7:55 AM

    Bill Talada - Thursday, June 8, 2017 5:01 AM

    I stopped using queue tables eight years ago when 2008R2 came out with filtered indexes (like dBase had 30 years ago).  Let the filtered index be your processing queue.  By having a narrow queue table, it puts hundreds of rows into the same page increasing chances of blocking.  Also, I'm wondering why no one is using update with output since the main advantage of it is to get around this exact problem.  And thirdly, I also despise the use of Service Broker for this simple situation.  Service Broker has some serious architecture to it that is overkill for this.

    Using the sample data that's been provided, can you show us a coded example of the filtered index method you speak of?  I could certainly be wrong but I'm pretty sure that even filtered indexes need to work against a table.

    Also, I didn't use OUTPUT because I didn't see any advantage to having the returned ID data in a table variable or temp table or bothering TempDB for a single item in a necessarily RBAR operation.  Is there an advantage that I missed in this particular case?  Are you thinking of portability?

    This still sounds interesting to me, Bill.  For the life of me, I can't figure out how you'd use a filtered index as a replacement for a queue.  Any chance of some demo code from you on this?

    --Jeff Moden



  • Sergiy - Thursday, June 8, 2017 1:15 AM

    In real life there is no threshold for execution duration.

True enough but, in real life, there is usually an expectation.  At the very least, if similar tasks all took just a second or two to execute and one of them still wasn't done in an hour, you'd have a method to find what didn't meet your expectations.

    I used StartTime, but it was not just a GETDATE(), it was indicating when any particular task to be started. The engine was taking only items with StartTime in the past, earlier ones come first.

Agreed.  I was being lazy and using the QueueItemID as the indication of temporal order.  For simple standalone tasks with no inter-task dependencies, there should be a separate column identifying when the item entered the queue.  At the very least, that would produce the correct order of processing and allow you to calculate latencies.

For more sophisticated queues that do have inter-task dependencies and non-linear processing requirements, such as the ability to fork and merge parallel task-processing paths and/or respond to the queuing of a high-priority task, the queue table would certainly need to be more robust in both what it contains and how it is used.

As a bit of a sidebar, I find that a lot of people simply make things too complicated, and they can frequently avoid the need for such complications by keeping things simple.  For example, I find that a lot of people will build queue tables when all they really need to do is schedule a couple of jobs.

    And there is no place for End Time in this table. EndTime is for a TaskLog table, where the records from TaskQueue are moved into by a FOR UPDATE, DELETE trigger.

If you have such a need because the queue table was meant to be generic enough to handle all sorts of different tasks, I absolutely agree on all points made, and will also add things like the actual error output and maybe even a "step" entry for items having multiple steps.  I've even added parent/child indications so that the entire execution-path tree could be viewed.

I try not to make things that complicated anymore, though.  I'll also offer a disclaimer that the things I do aren't that complicated.  I usually have no need of things like "file listeners", and dependencies are normally resolved at the stored procedure level with code that asks, "Is there anything to do here?"  Almost all of it is ETL, data validation, and post processing.  Almost all of it is done by stored procedures and simple SQL Agent jobs.  That way, the only thing I really need to worry about is occasionally archiving the run and error logs inherently built into the MSDB database.  The "Task Queue table" for me usually doesn't exist, nor does it need to.  I just load up a Temp Table with the names, paths, and dates of files and process them.  Processing also includes moving the successfully processed files off to the archive directories and sending an email as to which files had errors and what the errors were.

    --Jeff Moden



  • Jeff Moden - Monday, June 12, 2017 5:49 PM

    Scott Coleman - Monday, June 12, 2017 2:50 PM

    I'd still like to hear someone say why they don't use UPDATE ... OUTPUT to pick an item from the queue without an explicit transaction.

    UPDATE TOP (1) dbo.ItemQueue WITH(READPAST) 
    SET SessionKey = NEWID()
           -- additional columns such as StartTime as needed
    OUTPUT Inserted.* INTO @QueueItem
    WHERE SessionKey IS NULL;

    I'm not concerned with the distinction between OUTPUTting the whole row from the queue table vs just the key value, I want to see comments on any potential pitfalls of using UPDATE (or DELETE) with an OUTPUT clause to pop an item off a queue table in one atomic operation.

    I might be missing something but it seems to me that, for the original problem stated in this post, that would require a join to the Table Variable when you wanted to do the final update to the table if the table had start and end date columns as I proposed.  Even without such columns, if this were in a stored procedure as stand-alone functionality and you needed to return the value as an output variable, it would no longer be a single operation.

    This update finds the first queue entry with a NULL SessionKey, updates SessionKey, and returns the selected row in one atomic operation without an explicit transaction.  If you only need QueueItemId, you can use OUTPUT Inserted.QueueItemId INTO @QueueItemId, or you can use SELECT @var = col FROM @QueueItem to unpack the columns from a single-row table variable into as many variables as you need.  None of those actions are going to have locking issues with concurrent processes.  Your suggestion with the SELECT in a CTE and the quirky update to preserve the key value accomplishes the same thing in one statement.

I did something similar recently where I needed a FIFO queue table to control parallel processes, but had no reason to leave the items in the queue.  I used "DELETE TOP (1) OUTPUT Deleted.* ..." to pop items off the queue.  I don't think you can make an UPDATE do the same thing, no matter how quirky.  It worked fine from a C# program via a SqlCommand.ExecuteReader call.  Running it from C# also allows the use of Thread.Sleep to wait for a retry instead of an active session with WAITFOR DELAY.  If you're really obsessed with efficiency and only need the key value, use OUTPUT Deleted.ItemKey with ExecuteScalar and avoid creating the DataReader object.
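The destructive-pop version Scott describes, as a sketch (table and column names assumed from the thread's examples):

```sql
-- Pop one item off the queue in a single atomic statement.
-- Note: TOP without ORDER BY picks an arbitrary eligible row, so strict
-- FIFO order would need a derived-table form with SELECT TOP ... ORDER BY.
DELETE TOP (1)
  FROM dbo.ItemQueue WITH (READPAST)
OUTPUT Deleted.ItemQueueId, Deleted.ItemPayload;
```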

    A lot of intelligent people have addressed the topic of how to get the next item from the queue without race conditions or excessive blocking.  I was curious why no one suggested using an OUTPUT clause, except another post asking why no one used OUTPUT.  That's my basic question, OUTPUT seems like a good solution for this problem and I'm interested in whether anyone knows of any reason not to do it that way.
