Best way to handle a table-based queue?

  • NOTE: This is NOT a Service Broker question. 

    The table defined below is accessed by concurrent jobs that select the next available item and process it. The processing logic may be simple or very complex, depending on the item type, but let's ignore that for the purpose of this question. Item availability is determined by checking for a NULL SessionKey.

    CREATE TABLE ItemQueue(
      ItemQueueId INT IDENTITY PRIMARY KEY,
       ItemId INT NOT NULL,
       ItemType TINYINT NOT NULL,
       SessionKey UNIQUEIDENTIFIER NULL
    );
    CREATE INDEX IX_QueueSessionKey ON ItemQueue(SessionKey);

    Any solution has to avoid deadlocks and ensure that no two jobs ever select the same item. I'm thinking of two methods, but you're welcome to suggest alternatives as long as it's all plain SQL (i.e. no SSB, SSIS, CLR, etc.).

    Method 1: Select the first row you can get a lock on, process the selected item and update the SessionKey, all in one transaction.
     
    DECLARE @itemQueueId INT;

    BEGIN TRANSACTION;

    SELECT TOP 1 @itemQueueId = ItemQueueId
    FROM  ItemQueue WITH ( UPDLOCK, READPAST )
    WHERE SessionKey IS NULL
    ORDER BY ItemQueueId;

    UPDATE ItemQueue SET SessionKey = NEWID() WHERE ItemQueueId = @itemQueueId;

    COMMIT TRANSACTION;
    /*
    .
    process selected item
    on separate transaction 
    .
    */

    Method 2: Update one row with a locally generated SessionKey, get the ItemId for the updated row and process that item. (an OUTPUT clause could be used too, same idea). 
     
    DECLARE @sessionKey UNIQUEIDENTIFIER = NEWID();
    DECLARE @itemQueueId INT;

    BEGIN TRANSACTION;

    UPDATE TOP(1) ItemQueue
    SET SessionKey = @sessionKey
    WHERE SessionKey IS NULL;

    SELECT @itemQueueId = ItemQueueId 
    FROM ItemQueue
    WHERE SessionKey = @sessionKey;

    COMMIT TRANSACTION;
    /*
    .
    process selected item
    on separate transaction 
    .
    */
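    The OUTPUT-clause variant of Method 2 could be sketched like this; it claims a row and returns its key in a single atomic statement, so no explicit transaction is needed (the @claimed table variable is my own name):

    ```sql
    -- Sketch of the OUTPUT-clause variant of Method 2.
    -- The UPDATE claims one available row and emits its key atomically.
    DECLARE @claimed TABLE (ItemQueueId INT);

    UPDATE TOP (1) ItemQueue
       SET SessionKey = NEWID()
    OUTPUT inserted.ItemQueueId INTO @claimed
     WHERE SessionKey IS NULL;

    -- Zero rows here means the queue was empty.
    SELECT ItemQueueId FROM @claimed;
    ```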

    QUESTION: Which method is better? (you decide what "better" means)

  • In either case, unless "process selected item" is for all practical purposes trivial and not at all time-consuming, I'd keep the updates as separate transactions, and I'd prefer the method that updates the new session key value before anything else happens.   That assumes you also commit that transaction before doing any other work, however trivial (or not).   It would be "better" in terms of the level of concurrency you can obtain, which may or may not be important in your situation.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Good point, I updated my post so the processing is done on a separate transaction in both cases. That way I think the complexity of the work to be done becomes irrelevant.

  • Sam-263310 - Tuesday, June 6, 2017 9:53 AM

    QUESTION: Which method is better? (you decide what "better" means)

    Have you considered using the CREATE SEQUENCE statement? It would let you number the positions in your queue, guarantee that they're always unique, and make the code for changing the status of a queue item pretty easy.
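    A minimal sketch of the CREATE SEQUENCE idea (the object name is hypothetical); a sequence hands out guaranteed-unique, ever-increasing numbers that could serve as queue positions:

    ```sql
    -- Hypothetical sequence for numbering queue positions.
    CREATE SEQUENCE dbo.QueuePosition AS INT START WITH 1 INCREMENT BY 1;

    -- Each caller gets a distinct position, even under concurrency:
    DECLARE @pos INT = NEXT VALUE FOR dbo.QueuePosition;
    ```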

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Sam-263310 - Tuesday, June 6, 2017 9:53 AM

    NOTE: This is NOT a Service Broker question. 

    I have a table (ItemQueue) with items to be processed by one or more jobs. The table has a column (SessionKey, uniqueidentifier) that's initially null. I'm thinking of two methods for selecting items for processing so that no two jobs ever select the same item:

    Method 1: Select the first row you can get a lock on, process the selected item and update the SessionKey, all in one transaction.
     
    DECLARE @itemId INT;

    BEGIN TRANSACTION;

    SELECT TOP 1 @itemId = ItemId
    FROM  ItemQueue WITH ( UPDLOCK, READPAST )
    WHERE SessionKey IS NULL
    ORDER BY ItemId;

    UPDATE ItemQueue SET SessionKey = NEWID() WHERE ItemId = @itemId;

    COMMIT TRANSACTION;
    /*
    .
    process selected item
    on separate transaction 
    .
    */

    Method 2: Update one row with a locally generated SessionKey, get the ItemId for the updated row and process that item. (an OUTPUT clause could be used too, same idea). 
     
    DECLARE @sessionKey UNIQUEIDENTIFIER = NEWID();

    BEGIN TRANSACTION;

    UPDATE TOP(1) ItemQueue
    SET SessionKey = @sessionKey
    WHERE SessionKey IS NULL;

    SELECT @itemId = ItemId 
    FROM ItemQueue
    WHERE SessionKey = @sessionKey;

    COMMIT TRANSACTION;
    /*
    .
    process selected item
    on separate transaction 
    .
    */

    QUESTION: Which method is better? (you decide what "better" means)

    The answer is "NEITHER".  They're both a classic recipe for deadlocks and I'm speaking from some pretty serious experience here.

    Can you post the CREATE TABLE along with its indexes and constraints, please?  I need to know several things about that table and that will be a whole lot easier than asking the proverbial "20 questions".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jcelko212 32090 - Tuesday, June 6, 2017 3:21 PM

    Have you considered using the CREATE SEQUENCE statement? It would let you number the positions in your queue, guarantee that they're always unique, and make the code for changing the status of a queue item pretty easy.

    I apologize for the lack of DDL in my original posting, I've edited the whole thing to make it more clear. I don't quite follow the logic in your answer but then again it's all my fault for not being more explicit. The main problem I'm facing is the concurrency of the jobs accessing the table.

  • Jeff Moden - Tuesday, June 6, 2017 9:27 PM

    Can you post the CREATE TABLE along with its indexes and constraints, please?  I need to know several things about that table and that will be a whole lot easier than asking the proverbial "20 questions".

    Sorry about that, I edited my original posting to include the DDL. There's a PK and an index on SessionKey. I followed sgmunson's advice about separating the selection from the actual processing by putting them on different transactions, so I don't have to worry about the complexity of the processing logic (which doesn't access the ItemQueue table at all). I do worry about potential deadlocks due to locks acquired on the table and the index, but I don't know enough about that (hence my cry for help). If needed I could get rid of the index on SessionKey, the queue can be purged periodically to keep it small.

  • Sam-263310 - Tuesday, June 6, 2017 9:53 AM

    NOTE: This is NOT a Service Broker question. 

    The table defined below is accessed by concurrent jobs that select the next available item and process it. The processing logic may be simple or very complex, depending on the item type, but let's ignore that for the purpose of this question. Item availability is determined by checking for a NULL SessionKey.

    CREATE TABLE ItemQueue(
      ItemQueueId INT IDENTITY PRIMARY KEY,
       ItemId INT NOT NULL,
       ItemType TINYINT NOT NULL,
       SessionKey UNIQUEIDENTIFIER NULL
    )

    Any solution has to avoid deadlocks and ensure that no two jobs ever select the same item. I'm thinking of two methods, but you're welcome to suggest alternatives as long as it's all plain SQL (i.e. no SSB, SSIS, CLR, etc.).

    Method 1: Select the first row you can get a lock on, process the selected item and update the SessionKey, all in one transaction.
     
    DECLARE @itemId INT;

    BEGIN TRANSACTION;

    SELECT TOP 1 @itemId = ItemId
    FROM  ItemQueue WITH ( UPDLOCK, READPAST )
    WHERE SessionKey IS NULL
    ORDER BY ItemId;

    UPDATE ItemQueue SET SessionKey = NEWID() WHERE ItemId = @itemId;

    COMMIT TRANSACTION;
    /*
    .
    process selected item
    on separate transaction 
    .
    */

    Method 2: Update one row with a locally generated SessionKey, get the ItemId for the updated row and process that item. (an OUTPUT clause could be used too, same idea). 
     
    DECLARE @sessionKey UNIQUEIDENTIFIER = NEWID();

    BEGIN TRANSACTION;

    UPDATE TOP(1) ItemQueue
    SET SessionKey = @sessionKey
    WHERE SessionKey IS NULL;

    SELECT @itemId = ItemId 
    FROM ItemQueue
    WHERE SessionKey = @sessionKey;

    COMMIT TRANSACTION;
    /*
    .
    process selected item
    on separate transaction 
    .
    */

    QUESTION: Which method is better? (you decide what "better" means)

    You have a PK on the table.
    So, that's what you should be using to identify a row in it.

    DECLARE @ItemQueueId INT, @rc INT = 0;

    WHILE @rc = 0
    BEGIN
        SELECT TOP 1 @ItemQueueId = ItemQueueId
        FROM ItemQueue
        WHERE SessionKey IS NULL;

        UPDATE ItemQueue
        SET SessionKey = NEWID()
        WHERE ItemQueueId = @ItemQueueId
        AND SessionKey IS NULL;

        SELECT @rc = @@ROWCOUNT;
    END

    _____________
    Code for TallyGenerator

  • Sergiy - Wednesday, June 7, 2017 7:36 AM

    You have a PK on the table.
    So, that's what you should be using to identify a row in it.

    DECLARE @ItemQueueId INT, @rc INT = 0;

    WHILE @rc = 0
    BEGIN
        SELECT TOP 1 @ItemQueueId = ItemQueueId
        FROM ItemQueue
        WHERE SessionKey IS NULL;

        UPDATE ItemQueue
        SET SessionKey = NEWID()
        WHERE ItemQueueId = @ItemQueueId
        AND SessionKey IS NULL;

        SELECT @rc = @@ROWCOUNT;
    END

    Good point there, using the PK to select the item is the right thing to do, I updated the original posting. But the logic you suggest has a little problem: if there are no items to process it will stay in the loop until a new item arrives, and that's a waste of CPU time.

  • Yes, it's not a complete solution, of course.

    You still need an endless loop, since you want to pick up a new item in the queue as soon as it arrives, but you may wish to WAITFOR DELAY when there are no items with a NULL SessionKey.
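    A sketch of that polling loop, with a back-off when the queue is empty (the 5-second delay is my own choice):

    ```sql
    -- Endless worker loop: claim an item if one exists, otherwise sleep.
    DECLARE @ItemQueueId INT;

    WHILE 1 = 1
    BEGIN
        SET @ItemQueueId = NULL;

        -- Claim one available row and capture its key in the same statement.
        UPDATE TOP (1) ItemQueue
           SET SessionKey = NEWID(),
               @ItemQueueId = ItemQueueId
         WHERE SessionKey IS NULL;

        IF @ItemQueueId IS NULL
            WAITFOR DELAY '00:00:05';  -- queue empty: back off for 5 seconds
        ELSE
            PRINT @ItemQueueId;        -- process the claimed item here
    END;
    ```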

    Another thing you need to take care of - abandoned sessions.

    If for some reason processing was not finished for an item (an error, a dropped connection, or something else), the item will stay stamped with the SessionKey of a session that is long gone.

    You need to be able to verify if the session which started processing any particular item is still alive and running.

    For that you're gonna need a real SessionKey, not a useless GUID generated by NEWID().
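    For illustration, here's a sketch of that idea using @@SPID as the session key; it assumes a hypothetical SessionId INT column in place of the SessionKey GUID:

    ```sql
    -- Claim an item stamped with this worker's own session id (@@SPID).
    UPDATE TOP (1) ItemQueue
       SET SessionId = @@SPID  -- hypothetical INT column replacing SessionKey
     WHERE SessionId IS NULL;

    -- Requeue items whose claiming session no longer exists.
    UPDATE ItemQueue
       SET SessionId = NULL
     WHERE SessionId IS NOT NULL
       AND SessionId NOT IN (SELECT session_id FROM sys.dm_exec_sessions);
    ```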


  • I feel compelled to ask the question:  why is it NOT a service broker question?

    You're essentially trying to recreate what SB does. Just curious why we are looking to avoid SB in this case.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (4) - Wednesday, June 7, 2017 8:53 AM

    I feel compelled to ask the question:  why is it NOT a service broker question?

    You're essentially trying to recreate what SB does. Just curious why we are looking to avoid SB in this case.

    Service Broker is not allowed where I work. I tried to argue about it a few years ago, you can tell how successful I was. Sigh.....

  • Sam-263310 - Wednesday, June 7, 2017 9:28 AM

    Matt Miller (4) - Wednesday, June 7, 2017 8:53 AM

    I feel compelled to ask the question:  why is it NOT a service broker question?

    You're essentially trying to recreate what SB does. Just curious why we are looking to avoid SB in this case.

    Service Broker is not allowed where I work. I tried to argue about it a few years ago, you can tell how successful I was. Sigh.....

    You're not alone there.  I have the same restriction placed on me.

    --Jeff Moden



  • Ok... first, there is no need for an explicit transaction.  Having an explicit transaction for the two examples is a classic setup on a table for a pot wad of deadlocks.

    Second, there's no need for a session key (GUID, sequence, or whatever) if you simply use the current identity column in the table as the session id.

    Third, you need to be able to detect failed or long running sessions.  The only way to do that is with a start and end date.

    Last but not least, you want this to be as fast as possible so both indexes are ever-increasing and narrow.

    Here's your table modified to accommodate all of the above and populate it with some fictitious example data. 


    --===== If the test table already exists, drop it to make reruns in SSMS easier.
         IF OBJECT_ID('tempdb..#ItemQueue','U') IS NOT NULL
       DROP TABLE #ItemQueue
    ;
    GO
    --===== Create the test table and the given indexes
     CREATE TABLE #ItemQueue
            (
            ItemQueueId INT                 IDENTITY(1,1) PRIMARY KEY,
            ItemId      INT                 NOT NULL,
            ItemType    TINYINT             NOT NULL,
            StartDT     DATETIME            NULL,
            EndDT       DATETIME            NULL
            )
    ;
    CREATE INDEX IX_QueueWIP ON #ItemQueue(StartDT,ItemQueueId)
    ;
    --===== Do the initial population with sufficiently random test data
     INSERT INTO #ItemQueue
            (ItemId,ItemType)
     SELECT TOP 10000
             ItemID   = ABS(CHECKSUM(NEWID())%500)+1000
            ,ItemType = ABS(CHECKSUM(NEWID())%10)+1
       FROM         sys.all_columns ac1
      CROSS APPLY   sys.all_columns ac2
    ;

    Here's a runnable example (set up 3 or 4 windows with the same code and a whole lot more rows, and run them all at the same time in a loop to see it in real action) that will seriously help you avoid deadlocks on the queue table.  I kept it simple just to demonstrate the principle.  Modify to your heart's content.  Just don't ever use an explicit transaction (or a multi-part transaction, if your server is set up that way) on the queue table.  If you end up with something that has a start date without an end date after a certain amount of time, then you have a way to go back to failed runs and re-enter them into the queue just by changing the StartDT to NULL.  Or keep the old row so that you keep track of the failures, and enter a new row for the same thing.


    --===== Local Variables
    DECLARE  @ItemQueueId   INT
            ,@StartDT       DATETIME --not necessary but gives nice warm fuzzies that it worked.
    ;
    --===== Get an assignment from the queue.  This does an update for the StartDT
         -- and simultaneously assigns the ItemQueueId to our working variable.
       WITH cteAssign AS
    (
     SELECT TOP (1)
             StartDT
            ,ItemQueueId
       FROM #ItemQueue WITH (READPAST) --Increases concurrency a bit
      WHERE StartDT IS NULL
      ORDER BY ItemQueueId --Forces FIFO
    )
     UPDATE tgt
        SET  @StartDT       = tgt.StartDT       = GETDATE()
            ,@ItemQueueId   = tgt.ItemQueueId
       FROM cteAssign tgt
    ;
    --===== Do some task here. In this case, we'll just display the assignment variables.
     SELECT  ItemQueueId = @ItemQueueId
            ,StartDT     = @StartDT
    ;
    --===== Once done with the task, mark the end date
     UPDATE tgt
        SET EndDt = GETDATE()
       FROM #ItemQueue tgt
      WHERE tgt.ItemQueueId = @ItemQueueId
    ;
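    The failed-run cleanup described above could be sketched like this (the one-hour threshold is an assumption):

    ```sql
    -- Requeue items that started over an hour ago but never finished:
    -- nulling StartDT puts them back in line for the next worker.
    UPDATE #ItemQueue
       SET StartDT = NULL
     WHERE StartDT < DATEADD(HOUR, -1, GETDATE())
       AND EndDT IS NULL;
    ```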

    --Jeff Moden



  • Sam-263310 - Wednesday, June 7, 2017 9:28 AM

    Matt Miller (4) - Wednesday, June 7, 2017 8:53 AM

    I feel compelled to ask the question:  why is it NOT a service broker question?

    You're essentially trying to recreate what SB does. Just curious why we are looking to avoid SB in this case.

    Service Broker is not allowed where I work. I tried to argue about it a few years ago, you can tell how successful I was. Sigh.....

    Most MS-shipped tools carry the same birthmarks: inadequate design and careless coding.
    Therefore they all:
    - do not scale;
    - take an enormous amount of resources;
    - fail to perform in a timely manner.

    They're all good for "home office" kinds of environments, but as soon as you get to hundreds of anything (jobs, services, packages, tasks, orchestrations, etc.) those tools tend to commit suicide and take the whole server down with them.

    So don't push for Service Broker too hard. You might regret it if you succeed.

