Deadlock Problem on a Heap Table

  • I came across an older article, http://sqlblog.com/blogs/jonathan_kehayias/archive/2008/11/25/anatomy-of-a-deadlock-part-deux.aspx, while trying to find a solution to a deadlock problem I am seeing in my environment. My situation seems very close to what is described in the article. However, I have tried a number of things and am still seeing deadlocks.

    This problem is occurring on a SQL Server 2012 Standard instance that is on SP1. Initially the UPSTracking table was a heap with a quarter million forwarded record fetches at the time I first gathered index stats. There was an identity field but it was not defined as a PK. The locking was happening on dbo.UPSTracking.IX_UPSTracking_1, which was a non-clustered index on just the TrackingNumber field. I tried making a clustered PK as follows:

    ALTER TABLE dbo.UPSTracking ADD CONSTRAINT PK_ID PRIMARY KEY CLUSTERED (ID, TrackingNumber)

    I still saw the same blocking. I then dropped that constraint, reapplied the PK constraint just to the ID column and made it non-clustered. I then created a clustered key on TrackingNumber. I still saw the blocking.

    I then reversed the initial clustered PK, placing TrackingNumber first. I also dropped the non-clustered index IX_UPSTracking_1 on just TrackingNumber, since it seemed redundant to the new clustered PK and to another existing index, IX_UPSTracking, on (TrackingNumber, PackageActivityDateTime). I'm still seeing the blocking. Now the deadlocks are on the IX_UPSTracking index on (TrackingNumber, PackageActivityDateTime).

    Please see attached data on the deadlock itself, the indexes and the DDL for the table.

    Any suggestions would be appreciated.

  • 1) BLOCKING is a normal part of transactional processing in SQL Server. It is not something that can be avoided. LONG-TERM blocking is another issue, and is often the result of poor programming, poor schema, poor indexing, etc.

    2) Your current choice of Primary Key will allow DUPLICATE TrackingNumber values. Given that you were able to create a PK on just the TrackingNumber field I bet you do not want duplicates in that table.

    3) Can you post up the code that is involved in the deadlocks you are seeing? Often there are multiple sets of code that access the data in flip-flop order (read/update in one, update/read in another).

    4) If TrackingNumber IS unique (or desired to be so), and the ID value from this table isn't needed in subordinate tables, and you don't have much need for other nonclustered indexes, I would consider making it the clustered PK and fixing the code that is messing up the access and causing the deadlock.

    5) That RequiresProcessing field smells to me like a field where the vast majority of rows have a "complete" value and a few actually need processing. If so, and you don't have an index on it, there is your problem: you are table scanning for rows to process, and if the only index you have on the table is a clustered PK then no matter what it is, you are screwed. You MUST have seeks on nonclustered indexes to be efficient with OLTP processing and to lock the most granular things you can.
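    If RequiresProcessing really is mostly "complete", a filtered index along these lines would turn the polling scan into a seek. This is only a sketch: the index name and the INCLUDE column are assumptions, and only RequiresProcessing and TrackingNumber come from the thread itself.

```sql
-- Hedged sketch: index only the rows still pending, so the "what do I
-- process next" query can seek instead of scanning the whole table.
CREATE NONCLUSTERED INDEX IX_UPSTracking_RequiresProcessing
    ON dbo.UPSTracking (RequiresProcessing)
    INCLUDE (TrackingNumber)         -- cover the column the poller reads
    WHERE RequiresProcessing = 1;    -- filtered: pending rows only
```

    Note the trade-off: a filtered index like this only pays off if the filtered set is small relative to the table.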

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (12/23/2016)



    #2 - I have created a PK on just the ID column, on (ID, TrackingNumber), and (TrackingNumber, ID). I also made TrackingNumber a non-unique clustered index. I have not attempted to create a PK clustered index on just TrackingNumber.

    #3 - I thought I added the SQL that was being executed during the deadlock. Sorry. Here it is now.

    #4. I can look into that shortly.

    #5. The RequiresProcessing field has 291,081 rows out of 425,852 that have a 1 for the bit value. So, most of the values in that column do need to be processed. At least, that is true at the moment I checked.

  • Here is the estimated SQL plan for the INSERTUPDATE proc that is deadlocking. This is the plan I get when I pass in a TrackingNumber that does not exist in the table and a PackageActivityDateTime value that is just a few minutes before the current GETDATE() value. There is a seek on the current clustered PK and on the non-clustered index IX_UPSTracking.

  • Sorry to tell you this, but that code is just horrible.

    1) Why is it passing in datetime values as varchars?? You simply MUST make ALL of those actual datetime datatypes. Period.

    2) You have no transaction and no HOLDLOCK in this sproc, so you are exposed to a race condition: in the very small window between one call doing the IF EXISTS and its INSERT, another call can do the same check and INSERT first. This will cause a PK violation.

    3) Your indexing still allows duplicate TrackingNumbers. You MUST make that the clustered PK and remove the other index with TrackingNumber and PackageActivityDatetime. With that, all datetime parameters actually datetime parameters instead of varchars, AND a WITH (ROWLOCK, HOLDLOCK) hint on the IF EXISTS, you should be OK. I say should because you could still have other code that messes up the order of access, and any other indexes could get hosed with all those COALESCEs on the UPDATE.
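    The pattern Kevin describes can be sketched as a single transaction that holds a key-range lock across the existence check. The parameter and column names are taken from the thread; everything else, including the UPDLOCK added on top of his suggested ROWLOCK/HOLDLOCK so the lock taken during the test is already an update lock, is an assumption.

```sql
-- Hedged sketch of a serialized upsert: the HOLDLOCK (serializable)
-- range lock taken by the IF EXISTS blocks a second caller from passing
-- the same check before this call inserts.
BEGIN TRANSACTION;

IF EXISTS (SELECT 1
           FROM dbo.UPSTracking WITH (UPDLOCK, ROWLOCK, HOLDLOCK)
           WHERE TrackingNumber = @TrackingNumber)
    UPDATE dbo.UPSTracking
    SET    PackageActivityDateTime = @PackageActivityDateTime
    WHERE  TrackingNumber = @TrackingNumber
    AND    PackageActivityDateTime < @PackageActivityDateTime;
ELSE
    INSERT dbo.UPSTracking (TrackingNumber, PackageActivityDateTime)
    VALUES (@TrackingNumber, @PackageActivityDateTime);

COMMIT TRANSACTION;
```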

    Best,
    Kevin G. Boles

  • TheSQLGuru (12/28/2016)



    Well, this is inherited code. I can go back to the developer(s) and make suggestions.

    1. I don't know why it is passing the datetime values as varchars.

    2. Your concern about PK violations makes sense. I will see about implementing your suggestions.

    3. I did look at the TrackingNumber values in the table and they are all unique. I will go to the developers and confirm, but I don't see why I couldn't make it the PK and have it be clustered. Based on what I saw, the ID column in UPSTracking is not used anywhere else, as you asked about previously. I'm assuming then it was just in this table as a PK value.

    Since I do have an identity column, that's why I created this different clustered PK of (TrackingNumber, ID). Before it was just on ID. However, after the change to the new clustered PK I was still getting deadlocks on the NCI (TrackingNumber, PackageActivityDatetime). Is that why you're telling me to get rid of that NCI so it will use the clustered PK for its searches?

    If I implement your other suggestions but keep the clustered PK of (TrackingNumber, ID) so that is the only index involving TrackingNumber, will that achieve the goal? I'm just hoping to understand why TrackingNumber has to be the clustered PK by itself. Is it necessary to have TrackingNumber as the only column in the clustered PK because that is the only way SQL Server will know it is searching through unique values, which would change the way the locks are taken? Not doubting your expertise, just trying to understand so the next time I see something like this I not only know what to do but why.

  • lmarkum (12/28/2016)



    The separate and completely unnecessary nonclustered index is just another thing that needs locks/latches/etc.

    Since you have another field in addition to TrackingNumber as part of the PK, the optimizer no longer has an iron-clad GUARANTEE that there can only be ONE ROW with a given TrackingNumber. That can absolutely have an effect on the optimization process and on how things are done for read access and DML. I didn't dig deep enough to know if it is part of your issue here because there are much bigger fish to fry. Then we have the extra 4 bytes of storage, which is a waste. Plus it is simply wrong to have both fields as the PK. I could put a billion rows with the same TrackingNumber into that table and not a thing would prevent that ... and your system would be totally hosed.

    While you are talking with the developers please ask them why they get updates that are older than the current record. Stop that from happening and this code gets even simpler because that silly date predicate goes away.

    Best,
    Kevin G. Boles

  • lmarkum (12/23/2016)


    This problem is occurring on a SQL Server 2012 Standard instance that is on SP1.

    There were some really nasty problems in SQL Server 2012 prior to SP3. Things like rebuilding an index online could corrupt the table, and other problems affected all editions. My recommendation would be to stop whatever you're working on and update to the latest SP/CU.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You need to fix this issue:

    CONVERT(datetime,PackageActivityDateTime, 120) < CONVERT(datetime,@PackageActivityDateTime, 120)

    The column should already be a datetime data type, so it should not need to be converted. And why are you only updating rows whose activity date is earlier than the incoming value?

    If you change this to a MERGE it may process better and avoid the issue with your deadlocks altogether. At least that is something I would test to see if it works better. It should not have any performance issues since it is a one row insert/update process.
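    A MERGE rewrite along the lines Jeffrey suggests might look like this. It is a sketch only: the parameter names are assumptions, and the HOLDLOCK hint is kept because MERGE by itself does not serialize the key-range check against concurrent callers.

```sql
-- Hedged sketch of a single-row upsert as a MERGE. HOLDLOCK keeps two
-- concurrent calls for the same TrackingNumber from both taking the
-- NOT MATCHED branch.
MERGE dbo.UPSTracking WITH (HOLDLOCK) AS t
USING (SELECT @TrackingNumber, @PackageActivityDateTime)
      AS s (TrackingNumber, PackageActivityDateTime)
   ON t.TrackingNumber = s.TrackingNumber
WHEN MATCHED AND t.PackageActivityDateTime < s.PackageActivityDateTime THEN
    UPDATE SET PackageActivityDateTime = s.PackageActivityDateTime
WHEN NOT MATCHED THEN
    INSERT (TrackingNumber, PackageActivityDateTime)
    VALUES (s.TrackingNumber, s.PackageActivityDateTime);
```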

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • When you're talking to the developer, on a side note, ask them - have they ever heard about data normalization?

    That boring concept really helps in preventing deadlocks.

    _____________
    Code for TallyGenerator

  • @lmarkum,

    There are a couple of issues that are just wasting a lot of CPU time.

    The first is the problem of the stored procedure variables that involve dates being set up as VARCHAR(20). They should be set up as DATETIME so the proc doesn't have to do any conversions within the proc. Once you've changed the input variables for the date/time columns to DATETIME, remove all the CONVERTs for those columns everywhere in the code.

    A potentially much more serious problem is that of the TrackingNumber itself. Even if it is unique (and I believe it probably is), if it's also not "ever increasing", then you end up doing "mid-index" inserts on the clustered index, and that causes page splits during the inserts. You need to identify whether newly inserted rows will have an ever-increasing tracking number and get back to us.

    Another potentially very serious problem with all of this is the idea that a lot of the rows will suffer a lot of updates. Whether or not the TrackingNumber is ever-increasing, ALL of the columns are VARCHAR() columns. ANY new information added to a previously NULL entry has the huge potential of causing a very expensive page split (and who in their right mind would make a VARCHAR(1) or even a VARCHAR(3) column???).

    Since the rows in this table suffer a whole lot of updates because of the nature of package tracking, my recommendation would be to trade off some disk space for performance by changing all the columns to a fixed width, which would totally eliminate the possibility of page splits from UPDATEs. That means ensuring that all date/time-based columns, both in the proc's input variables and in the table itself, are DATETIME. All columns that will contain only whole numbers should be changed to an integer datatype (right-sized, of course). All columns that contain character-based data should be changed to CHAR() rather than VARCHAR(), and that includes the wider columns such as the VARCHAR(120) columns. Yep... it's going to take more disk space, but it will prevent any and all page splits caused by UPDATEs.

    The INSERTs will still be a problem if the TrackingNumber isn't "ever-increasing", so you really need to get back to us on that. THAT will determine whether the clustered index needs to be on the TrackingNumber (which would be extremely beneficial in this case) or on the IDENTITY column to prevent page splits if the TrackingNumber isn't "ever-increasing".
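    The shape of the datatype change Jeff is describing, fixed-width CHAR in place of update-prone nullable VARCHAR and DATETIME in place of VARCHAR(20) dates, would be something like the following. The column names and sizes other than PackageActivityDateTime are made up for illustration; the real ones would come from the table DDL.

```sql
-- Hypothetical column names/sizes; purely illustrative of the technique.
-- Fixed-width columns mean an UPDATE cannot widen the row on disk.
ALTER TABLE dbo.UPSTracking ALTER COLUMN PackageActivityDateTime DATETIME NULL;
ALTER TABLE dbo.UPSTracking ALTER COLUMN StatusCode CHAR(3)  NULL;  -- was VARCHAR(3)
ALTER TABLE dbo.UPSTracking ALTER COLUMN City       CHAR(30) NULL;  -- was VARCHAR(30)
```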

    Last and certainly not the least, we need to know if there are any Foreign Keys on this table.

    --Jeff Moden

  • Jeff Moden (12/30/2016)



    I think the person I need to work with on these changes is out until 1/3, so it could be a few more days before I can address some of the questions in this thread, including the ones you specifically are raising, like whether the TrackingNumber values are always going to be ever-increasing. In the meantime, I will see if I can replicate the current setup in a test environment, implement some of the changes that have been suggested, and see what the results are.

    The issue of the deadlocks is the same as what is in this article: http://sqlblog.com/blogs/jonathan_kehayias/archive/2008/11/25/anatomy-of-a-deadlock-part-deux.aspx

    I have RangeS-S and RangeI-N locks occurring, just as described in that article. I can see that from SQL Diagnostic Manager.

    One of the things I was already going to do based on comments from TheSQLGuru is find out if I can make the TrackingNumber the Clustered PK. I have actually already tried three different clustered PK possibilities as well as one non-unique clustered index option and they all failed to stop the deadlocks. I have created a PK on just the ID column, on (ID, TrackingNumber), and (TrackingNumber, ID). I also made TrackingNumber a non-unique clustered index. I have not attempted to create a PK clustered index on just TrackingNumber. I think I'm still having the deadlock issue because of the locks involved for TrackingNumber when SQL Server can't know ahead of time that the values are unique. So I agree that I need to find out if TrackingNumber is guaranteed to be unique, which I would think it would be, but better safe than sorry.

    I totally agree that there are some less than optimal choices for data types. I will certainly press the point of choosing better data types and why that matters for page splits and the work involved in converting data back and forth.

    There are no Foreign Keys on the table.

  • Jeff: "Another potentially very serious problem with all of this is the idea that a lot of the rows will suffer a lot of updates. "

    That actually isn't a problem on this system. The OP already verified that there are in fact no duplicates on TrackingNumber.

    That statement is why I said the solution will be to make JUST THE TrackingNumber THE PRIMARY KEY. That will allow the optimizer to NEVER do a range scan from that silly date predicate, because the TrackingNumber will be GUARANTEED to never affect more than one row. The engine will simply do a clustered index seek to get the one row, then apply a simple predicate on the date. Even better, as I stated, would be to check whether you would ever get a "back-dated" transaction and, if not, remove that part of the filter altogether.

    As for page splits, there is a huge probability that only relatively recent rows would ever be updated so I think that issue is a pretty small one but I would still almost certainly rather see the identity removed completely and just use the TN as the PK.
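    The change Kevin is arguing for, sketched below. The constraint name PK_ID comes from the earlier post in this thread; the new constraint name is an assumption, and removing the IDENTITY column itself would be a separate step.

```sql
-- Make TrackingNumber alone the clustered primary key, so the optimizer
-- has an iron-clad guarantee that a TrackingNumber seek touches at most
-- one row (and duplicates become impossible).
ALTER TABLE dbo.UPSTracking DROP CONSTRAINT PK_ID;
ALTER TABLE dbo.UPSTracking ADD CONSTRAINT PK_UPSTracking
    PRIMARY KEY CLUSTERED (TrackingNumber);
```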

    Best,
    Kevin G. Boles

  • lmarkum (12/30/2016)


    like are the TrackingNumber values always going to be ever-increasing

    You should be able to easily figure that out on your own because you have an IDENTITY column. Sort by the TrackingNumber and see if the values in the IDENTITY column are always increasing in value or if it has "out of place" values, which would indicate that the TrackingNumber is not inserted in an ever-increasing manner.

    If the latter is the case, the clustered index needs to be on the IDENTITY column to prevent page splits, and you need a non-clustered index on the TrackingNumber. Of course, that won't fix the deadlock problem all by itself, because you also need to address row expansion causing page splits during UPDATEs by changing the datatypes and making the other changes I suggested.
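    One way to run Jeff's check, sketched as a query (LAG is available on SQL Server 2012; the ID/TrackingNumber column names come from the thread):

```sql
-- Order the rows by TrackingNumber and flag any place where the IDENTITY
-- value goes backwards. Any rows returned mean TrackingNumber is NOT
-- being inserted in ever-increasing order.
WITH ordered AS (
    SELECT ID,
           LAG(ID) OVER (ORDER BY TrackingNumber) AS PrevID
    FROM   dbo.UPSTracking
)
SELECT ID, PrevID
FROM   ordered
WHERE  ID < PrevID;
```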

    --Jeff Moden

  • TheSQLGuru (12/30/2016)


    Jeff: "Another potentially very serious problem with all of this is the idea that a lot of the rows will suffer a lot of updates. "

    That actually isn't a problem on this system. The OP already verified that there are in fact no duplicates on TransactionNumber.

    The datatypes used are the problem with the UPDATEs. Even updating one of those bloody VARCHAR(1) columns from NULL to some value will cause the given row to expand, and expansion of rows will take a heavy toll in the form of page splits no matter what the CI is. Whether the TrackingNumber has duplicates, is unique, or whatever is a comparatively minor part of the problem.

    --Jeff Moden
