Deadlocking Problems

  • I have a table which contains tasks which is causing me a deadlocking problem.

    A stored procedure determines whether a task (a summarisation of data) is needed to fulfill the request and if so it adds a record to the table. The proc then waits until the task is complete and then continues and returns the data to the client app.

    I have a number of batch jobs which regularly check the table to determine if there are any new tasks which have status of S (for Scheduled).

    If there is a new task, the job would change the status to I (In Progress), execute another procedure to summarise the data (which can take anywhere from sub second to 30 seconds depending on the complexity) and store it in another table and then update the status to C (Complete).

    I have been watching the deadlocks using Profiler and also DBCC TRACEON(1204,3605,-1) and have found that any statement that changes the status of the task can be involved in a deadlock with another statement which either updates the status or a plain select statement.

    Within the batch processing, I handle deadlocks by resetting the status for long running summarisations and letting a batch job try and summarise the data again. However, the client app just reports the deadlock and the user has to resubmit the request.

    How can I avoid deadlocking problems?

    I have tried (without success):

    - SET DEADLOCK_PRIORITY LOW

    - SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

    - using with (rowlock) in the update statements

    - begin transaction .... end transaction around various SQL Statements

    Has anyone got any other ideas?

    Jeremy

  • quote:


    I have been watching the deadlocks using Profiler and also DBCC TRACEON(1204,3605,-1) and have found that any statement that changes the status of the task can be involved in a deadlock with another statement which either updates the status or a plain select statement.


    I may be just shooting arrows in the dark.

    But are the columns that are being used in the where clause used while updating the

    data indexed.

    Pay Respect to People on your way up. For you will meet the same People on your way down.


    He who knows others is learned but the wise one is one who knows himself.

  • Yes there is an index on the status field. There are approx 290,000 rows in the table and I need a quick way of finding if there is a new row in the table with a status of S.

    Jeremy

  • Deadlocks are tough. Can you post the output of the deadlock from the log? One way to avoid is to see if you can avoid all locks by setting your isolation level to read uncommitted or using the "with (nolock)" hint. Use appropriate care of course. Using the updlock hint can also help. Both are bandaids, if possible you want to change your indexes and/or query access order to avoid the issue to start with. Short, quick transactions also help.

    In the case of doing the summary, not holding locks probably not going to hurt you much.

    By number of batch jobs, you mean separate jobs that may run concurrently?

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • I am doubtful if the index is being used while u run u'r update statements.Check out the execution plan.

    Can u by any chance move u'r data with status='C' to another table.This might make u'r table a little light.

    Pay Respect to People on your way up. For you will meet the same People on your way down.


    He who knows others is learned but the wise one is one who knows himself.

  • Andy:

    Yes I have several jobs that run concurrently but I have tried to space them out by using WAITFOR DELAY (nn.nn) so that each second only one job should determine whether there is a new row in the table.

    This is the output from the log for one of the deadlocks:

    Deadlock encountered .... Printing deadlock information

    2003-09-15 09:42:02.40 spid4

    2003-09-15 09:42:02.40 spid4 Wait-for graph

    2003-09-15 09:42:02.40 spid4

    2003-09-15 09:42:02.40 spid4 Node:1

    2003-09-15 09:42:02.40 spid4 KEY: 11:1538416850:4 (1107b80f8a57) CleanCnt:1 Mode: U Flags: 0x0

    2003-09-15 09:42:02.40 spid4 Grant List 1::

    2003-09-15 09:42:02.40 spid4 Owner:0x2aedaf00 Mode: U Flg:0x0 Ref:1 Life:00000001 SPID:61 ECID:0

    2003-09-15 09:42:02.40 spid4 SPID: 61 ECID: 0 Statement Type: UPDATE Line #: 25

    2003-09-15 09:42:02.40 spid4 Input Buf: Language Event: /* Online - Prod 1 */

    .....

    If then prints out the first few lines of the job. This is the second part of the deadlock info:

    2003-09-15 09:42:02.40 spid4 Requested By:

    2003-09-15 09:42:02.40 spid4 ResType:LockOwner Stype:'OR' Mode: X SPID:60 ECID:0 Ec:(0x27E615A0) Value:0x2aef24e0 Cost:(0/140)

    2003-09-15 09:42:02.40 spid4

    2003-09-15 09:42:02.40 spid4 Node:2

    2003-09-15 09:42:02.40 spid4 KEY: 11:1538416850:1 (6406f8177ad1) CleanCnt:1 Mode: X Flags: 0x0

    2003-09-15 09:42:02.40 spid4 Grant List 1::

    2003-09-15 09:42:02.40 spid4 Owner:0x2ae001e0 Mode: X Flg:0x0 Ref:0 Life:02000000 SPID:60 ECID:0

    2003-09-15 09:42:02.40 spid4 SPID: 60 ECID: 0 Statement Type: UPDATE Line #: 84

    2003-09-15 09:42:02.40 spid4 Input Buf: Language Event: /* Online - Prod 2 */

    ....

    and again it prints out the first few lines from the job.

    I have tracked down the procedures with the relevant line numbers which is how I know which bits are involved in the deadlock.

    brokenrulz:

    I hadn't considered different tables before but I'm not sure that it would be that easy to implement. For each summarisation, I have split it into two parts - one which makes sure that the row exists in the table and a second part which checks whether the row has a status of Complete. One reason for this is that I can then do some other processing (like submit other summarisation tasks) before checking whether the first task is complete which means I can multistream any requests.

    I'll bear it in mind incase I cannot find an alternative solution.

    Jeremy

  • brokenrulz,

    I have checked the query plan and the index is being used in the query.

    I accept that having the status column in the index and changing the value the column will create a certain amount of work including 'moving' the index pointer from one part of the index to another part.

    Jeremy

  • quote:


    Andy:

    For each summarisation, I have split it into two parts - one which makes sure that the row exists in the table and a second part which checks whether the row has a status of Complete. One reason for this is that I can then do some other processing (like submit other summarisation tasks) before checking whether the first task is complete which means I can multistream any requests.

    I'll bear it in mind incase I cannot find an alternative solution.

    Jeremy


    Just a curiosity, How do you check if data exists?

    Pay Respect to People on your way up. For you will meet the same People on your way down.


    He who knows others is learned but the wise one is one who knows himself.

  • The client app allows users to make selections from 5 different list boxes. I store these values in the table so that when I ask the question of whether the data exists what I actually do is check whether there is a row in the table with this set of 5 values. If there isn't a row with the 5 values I insert a row but if there is a row already there then the application has already requested that the data be summarised.

    This all came about because of performance problems. We run some very big batch runs every month and we created a table each month with over 1 billion rows. The queries ran quite quickly but it took the best part of a week to run. Of these 1 billion plus rows, less that 10% were every used so there was a lot of processing for very little benefit. We implemented horizontal partioning with a separate table for each year and although this made some improvements there was still a huge processing overhead.

    We came up with the idea of summarising the data 'on demand' which cut the size of the tables by 90% and reduced the processing by about 70% so that the run now takes a couple of days. Some of the queries take longer as they have to wait for the summarisation but most users are happy to get the data 3 days sooner and wait 30 seconds for a report than a couple of seconds.

    I think that an OLAP solution would work well(as it would do all the summarisations) but I have not yet been able to do everything I need to do using MDX so in the meantime I have to persevere with my current solution.

    Jeremy

  • How about serializing the jobs? Either combine into one, or have each check to make sure no other is running before proceeding.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • Jeremy, to me the problem is in your opening statement. The proc starts, sets a flag, the WAITS for another process to run. Since you have two separate processes running in separate transactions, you are asking for a deadlock, aren't you? Help me understand this. If it is crucial to return the information, I would recommend making these stored procedures (sounds like there might be timeout issues). We have some processes that are similar and just use a notification when the background processes are complete.

    Guarddata-

  • Andy:

    I don't serialise the jobs because it would take too long to create the data. The data is split by year and the client app can request data for 8 different years for a single report. I currently run this on a 2 processor box so by running the multiple batch jobs I can complete the summarisation in about half the elapsed time. Early next year I hope to migrate the application to a 4 processor box and so could complete the summarisation in about a quarter of the time compared with single streaming the jobs.

    Guarddata:

    Why am I asking for a deadlock? One process inserts row in the table with a certain status and then waits for another process to change the status. What seems to be happening is that when the second process updates the status and the first process checks the status SQL Server reports a deadlock.

    IMHO this is not a deadlock but a deadheat. Surely SQL Server can serialise these two types of requests? Accoring to BOL:

    If several connections become blocked waiting for conflicting locks on a single resource, the locks are granted on a first-come, first-serve basis as the preceding connections free their locks.

    To me this means that SQL Server should let one process complete before letting the other one continue. This is different to a deadlock where one process is dependant on rows locked by another process and vice versa.

    If I am truely getting deadlocks, then there must be something else going on which either I don't understand (quite possible) or I haven't discovered (also quite possible). If the former then I hope I have come to the right place to learn more. If the latter, how do I discover what is really happening?

    I have reinstated the SET DEADLOCK_PRIORITY LOW option for the batch jobs so that they are more likely to be cancelled than the client application which at least reduces the calls to the help desk but doesn't really solve the problem.

    Jeremy

  • Hmmm... that's rather complicated and I'm not sure I understand everything about the processes correctly. Just an idea - status of the row gets updated after the procedure summarizing the data is finished (to 'C' if successful, to 'S' if there were errors to try again), is that so? Well, did you try to use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED while checking for the status (set it before the check and re-set default as soon as the check is done)? This way the select statement should not get involved in any deadlocks... As long as the row with status gets updated in a separate transaction, not as a part of summarizing or any other task, it should work fine.

    Oh, OK, I realize now that Andy mentioned that already... but since you didn't reply to that specifically, I'll post it anyway.

  • Vladan,

    I always find it difficult to understand everything that is going on with other people's systems when all you get is a few lines in a post. I suspect that if I put the whole scenario in the post, it would be so big that it would put off a lot of people from reading it so I try and just put in the important information.

    I have put in SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED and it seems to be giving some benefits. I need to work out all the places where I need to put it - for someone who has tried to reuse as much code as possible, I seem to have a lot of references to this one table

    Jeremy

  • One other question. if I turn on SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED within a stored procedure and I want to turn it off, what is the best?

    Jeremy

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply