A Foreign key (FK) is used to establish referential constraint between two tables sometimes referred to as parent-child table or master/detail tables. It indicates that a set of records in the child/detail table should belong to a record in the parent/master table. Normally it's a good idea to create an index on foreign key columns. This will help improve the performance of joins (based on the FK column) between parent-child tables. However, today we will discuss a scenario in which an un-indexed foreign key column can lead to a deadlock.
Before going into details, let's have a brief look at some basic locks used in SQL Server. For the purpose of this discussion we will concentrate on 3 main lock modes.
- Shared (S) – This is used by SQL Server when reading any data (e.g. SELECT). This ensures that other sessions/transactions can read same data at same time but they can't modify it.
- Exclusive (X) – This is used by SQL Server at the time of modifying any data (i.e. any DML operation like UPDATE/DELETE/INSERT). This ensures that no other transaction can read/write that data at same time.
- Update (U) - This is something that can be thought of as being in between 'Shared' and 'Exclusive'. SQL Server takes this lock to indicate that it is going to update that data. This lock just indicates desire to update. 'Exclusive' lock is obtained at time of actual update.
For further details of locks, various other lock modes and their compatibility please refer to this MSDN link: http://msdn.microsoft.com/en-us/library/ms175519(SQL.90).aspx .
Parent – child table with foreign key
Now let's write some code and see the things in action. Use the following script to create two tables with foreign key relationship.
CREATE TABLE tbParent (ParentId INT NOT NULLCONSTRAINT PK_ParentId PRIMARY KEY, Val1 INT NOT NULL, Val2 VARCHAR(10)NOT NULL) CREATE TABLE tbChild (ChildId INT NOT NULLCONSTRAINT PK_ChildId PRIMARY KEY, ParentId INT NOT NULLCONSTRAINT FK_ParentId REFERENCES tbParent(ParentId), CVal1 INT NOT NULL, CVal2 VARCHAR(10)NOT NULL) GO INSERT INTO tbParent(ParentId,Val1,Val2) SELECT 10,111,'abcd' UNION SELECT 20,222,'efgh' UNION SELECT 30,333,'ijkl' GO INSERT INTO tbChild(ChildId,ParentId,CVal1,CVal2) SELECT 1,10,123,'aaaa' UNION SELECT 2,10,456,'bbbb' UNION SELECT 3,10,789,'cccc' UNION SELECT 4,20,234,'dddd' UNION SELECT 5,20,345,'eeee' UNION SELECT 6,20,456,'ffff' UNION SELECT 7,30,567,'gggg' UNION SELECT 8,30,678,'hhhh' UNION SELECT 9,30,789,'iiii' GO
Please note that both the tables have clustered indexes, but the child table does not have a non-clustered index on the FK column (ParentId).
We will use following query to display various locks obtained by SQL Server when we try to delete records from these tables. This query makes use of a dynamic management view sys.dm_tran_locks. This is used to display currently active locks that have been granted or are waiting to be granted. A left join with sys.partitions is used to display the owner object of the resource to which this lock belongs. You can get further details of this DMV at http://msdn.microsoft.com/en-us/library/ms190345.aspx .
SELECT CASE WHEN resource_type = 'OBJECT' THEN OBJECT_NAME(l.Resource_associated_entity_id) WHEN resource_type IN('PAGE','KEY','RID')THEN OBJECT_NAME(p.object_id) ELSE '<not applicable>' END AS OwnerObjectName ,request_session_id,resource_type,resource_description,request_mode ,request_type,request_status,request_reference_count,resource_associated_entity_id FROM sys.dm_tran_locks l LEFT OUTER JOIN sys.partitions p ON l.resource_associated_entity_id = p.hobt_id WHERE request_session_id IN(53,54) –- replace 53/54 with actual session Ids ORDER BY request_session_id,resource_type
Now open two query windows and copy following code in one of them
SELECT @@SPID SET TRANSACTION ISOLATIONLEVEL READ COMMITTED BEGIN TRANSACTION DELETE FROM dbo.tbChild WHERE ParentId = 20 DELETE FROM dbo.tbParent WHERE ParentId = 20 ROLLBACK TRANSACTION
The above code first gets session id that we will have to supply to query used to get locks from sys.dm_tran_locks DMV. The remainder of the code is used to delete a record from both the child and parent tables. Please note that this transaction works under default isolation level of READ COMMITTED. Transaction isolation levels are a separate topic in itself and if you want to read more about them then please refer to this link http://msdn.microsoft.com/en-us/library/ms173763(SQL.90).aspx .
Now copy this code in 2nd query window
SELECT @@SPID SET TRANSACTION ISOLATIONLEVEL READ COMMITTED BEGIN TRANSACTION DELETE FROM dbo.tbChild WHERE ParentId = 10 DELETE FROM dbo.tbParent WHERE ParentId = 10 ROLLBACK TRANSACTION
As we are trying to highlight a deadlock scenario, we need to run the above queries in a fixed order. To do that you can highlight just part of the query and run them one by one (as shown below). To begin with, just highlight 'SELECT @@SPID' portion and run the highlighted query to get session ids for both the query windows. Now open a 3rd window and copy the query to display locks in this new window. Replace 53 and 54 (which are session ids for delete queries on my setup) with those you obtained above.
Now switch to 1st query window that contains code to delete record for ParentId = 20. Highlight up to the statement used for deleting from tbChild. Then run the highlighted code as shown below
Now switch to 2nd query window, highlight similar statements there and run them. You will find that query is blocked. Then switch to 3rd query window and run the query to display locks, currently held by SQL Server. The result should be something like this
If you analyze the results you would find that there are 3 'KEY' locks for tbChild table. 'KEY' refers to rows in an index. Because there is a clustered index on our test tables the 'resource_type' for all row locks will be 'KEY' i.e. a resource locked by SQL Server are rows in a clustered index. If this would have been a heap table then the 'resource _type' would be 'RID' indicating a row in a heap. Notice 'request_mode', which is 'X' and 'request_status' is 'GRANT' for 3 rows. This indicates that 3 rows have been granted exclusive locks. This now looks obvious as we know there are 3 rows pertaining to ParentId = 20 in tbChild table.
SQL Server uses value of 'OBJECT' for 'resource_type' to represent database objects like 'table', 'view' etc. 'IX' lock here indicates that SQL Server has obtained an intent exclusive lock on tbChild table. This means it has an intention of obtaining exclusive lock on that table. Such locks are called 'intent' locks. They help SQL Server to optimize resources used to hold locks by doing something called 'lock escalation'. For e.g. instead of acquiring locks on a very large number of rows, SQL Server lock manager may decide that it's better to acquire one lock at table level. This can be achieved by first acquiring an 'intent' lock and then converting it to full lock (if required). Please refer to http://msdn.microsoft.com/en-us/library/ms184286(SQL.90).aspx for more details.
'PAGE' lock indicates that the resource being locked is a 'data page'. This also has IX i.e. 'intent exclusive lock' in the above result set i.e. there is an intention to acquire 'PAGE' lock on 'data page'.
The main thing to note here is that we have 3 'X' locks in session 53 for 3 rows in tbChild. Now looking at session 54 we find that there are again 3 'X' locks for 'KEY' i.e. 3 rows and similar 'IX' locks for 'OBJECT' and 'PAGE' resource types. Note the 'resource_description' column. This is some kind of resource identifier. When the resource being locked is a 'row' then it contains some kind of row identifier.
Note that all 3 rows have different values from those in session 53, indicating these 3 rows are different rows in tbChild table. This makes sense as query in one session is deleting rows related to ParentId = 10 while the query in other session is deleting rows related to ParentId=20. However you would notice that there is another row with 'U' lock on 'KEY' and with 'request_status' of 'WAIT'. This means one row in tbChild table is waiting for 'U' lock.
What does this represent and where does this lock came from? The answer to this lies in the fact that there is no index on the 'ParentId' column in tbChild table. It means that in absence of this index, the query optimizer is forced to do a table scan to find all the rows that have ParentId=10. This can be further confirmed by looking at query plan for DELETE statement on tbChild table. The estimated query plan for that DELETE statement is like this
As there is a 'Clustered Index Scan', SQL Server has to scan all the rows in tbChild. It scans first 3 rows. During scan it needs a 'U' lock. Then it finds first 3 rows have ParentId=10 and puts 'X' locks on these rows. When it comes to acquire a 'U' lock on the 4th row (which has ParentId=20), it can't do that as the transaction running with SessionId = 53 has already got an 'X' lock. So it has to wait and hence this query is blocked. This can be further confirmed by looking at 'resource_description' column of query displaying lock status results. Note that the row having 'U' lock for session 54 has same 'resource_description' as row having 'X' lock in session 53. I have marked both these rows in above result set.
Now switch to the 1st query window (ParentId=20) and run the statement for deleting from tbParent by highlighting just that statement. You need to be very quick. Just after running that delete statement switch to the 3rd query window and run the query to see current lock status for both the sessions. If you were quick enough you would see the following lock status results.
You will see that there are 4 more locks in session 53. Three of them are obvious and they are IX locks on OBJECT and PAGE for tbParent table and one 'X' on a row in tbParent (i.e. 'KEY' resource_type). However there is another 'KEY' lock on a row of tbChild. This has the 'S' 'request_mode' and the 'request_status' is 'WAIT'. So the transaction in session 53 is waiting to acquire a shared lock on a row in tbChild. If you look at the 'resource_description' for this row you will find that same row has 'X' lock in session 54.
What happened when we ran DELETE statement on tbParent and why is there an 'S' lock needed for a row in tbChild? The answer lies in the FK constraint. SQL needs to validate that we can't delete a row from tbParent that is still being referenced in tbChild. Hence it needs to read (similar to SELECT) from tbChild. But in absence of an index on ParentId column in tbChild table, optimizer chooses to do a scan. This can be further confirmed by looking at query plan for DELETE statement on tbParent.
A 'Clustered Index Scan' on tbChild means SQL Server needs to scan all rows and while doing so it tries to get shared lock for rows having ParentId=10 (and not just ParentId=20). As they are already locked by transaction in session 54, we get avrequest for an 'S' lock being blocked. You can check the 'resource_description' for two related rows i.e. one waiting for an 'S' lock and one having an 'X' lock in another session. The values are the same indicating they are the same rows.
Now if you compare two lock status results, you will observe that session 53 is waiting for a shared lock on a row which is locked by session 54 (with an exclusive lock). On the other hand session 54 is waiting for a 'U' lock on a row that is locked by session 53 (an exclusive lock). Hence now we have a deadlock. As a result SQL Server terminates one of the transactions so that other one can proceed. This is shown as error in one of the query windows.
Now abort any pending transactions in both the query windows. If we go ahead and create a non-clustered index on ParentId column in tbChild then we can avoid scan of tbChild table when querying using ParentId column. So let's try that
CREATE NONCLUSTERED INDEX [IX_Parent] ON [dbo].[tbChild]([ParentId] ASC)
Now check the query plan for DELETE on child and patent tables.
As expected now we are seeing SEEKS instead of SCANS. What this means is, two delete sessions acting on different ParentId values should not block each other and hence avoid a deadlock. Let's try this.
Repeat the DELETE queries in the same order. In the 1st query window (ParentId=20) start a transaction and just run the DELETE statement for tbChild. Then switch to the 2nd query window and start a transaction and run the DELETE for tbChild. You will notice that 2nd query window is not blocked. If you now run the query to display lock status you should see results like this
As you can see now there are 6 'X' locks for 'KEY' resource_type i.e. 6 rows. These are 6 rows because we have got 3 rows in clustered index and 3 in non-clustered index. This is same for both the sessions. Also none of the locks are in a wait state. The SEEK has avoided the sequential scan and hence blocking on rows that are not actually affected by the query.
Now go-ahead and run the DELETE on tbParent (as we did earlier to get deadlock) in the 1st query window. Then switch to the 3rd query window and run a query to display current lock status. The lock status results should be like this
You would find that there is an exclusive lock on only one row in tbParent and no wait for any other rows in tbChild. This is because a SEEK was done and hence no conflicting/common rows were looked at (as both queries use different ParentId). Now complete the transactions in both the query windows and you will find that all works fine without any deadlock.
Summary
Creating an index for foreign key column is a good idea not just from performance point of view but also to avoid potential deadlock when deleting/updating records from parent-child table.
Bibliography/Source Consulted
- SQL Server Books Online 2005