August 4, 2011 at 9:56 am
Hello all
I have a question related to locking and would appreciate if someone could help me with it.
The background:
In the SQL Server project I am currently working on with have a [t_data_entities] table with the following structure
[Id] int PK Identity
[Name] varchar(50) unique
On the daily basis the database loads multiple CSV files from multiple sources where entities are listed by their name. While processing the file the database inserts new entity names into the t_data_entities table. The T-SQL looks something like this
INSERT INTO [dbo].[t_data_entities]
SELECT DISTINCT [t_import_file1].[Entity_Name]
FROM [dbo].[t_import_file1]
LEFT JOIN [dbo].[t_data_entities]
ON [t_import_file1].[Entity_Name] = [t_data_entities].[Name]
WHERE [t_data_entities].[Name] IS NULL
Please note that the import table in the above statement could be [t_import_file2], [t_import_file3] etc as those files come from multiple sources, but they all contain [Entity_Name] column and all new entities go into the [t_data_entities] table
The issue:
As stated before, multiple processes update this table from multiple sources. What has happened recently is that one of those processes actually encountered the error "Cannot insert duplicate key row in object 'dbo.t_data_entities' with unique index 'IX_t_data_entities'." Re-running this process a minute later with exactly the same data didn't cause an error.
I am reasonably sure that this happened because two processes tried to update the table at exactly the same time with the same new entity names, so while both processes were running the "SELECT" portion of the above statement the new entities didn't exist in the table yet, but then by the time the process B got to the "INSERT" portion of the statement, the process A has already inserted those new entities, hence the violation of the unique constraint.
It seems to me that I need to update the above statement(s) with a locking hint on [t_data_entities], but I am not entirely sure what type of lock to use and whether to use it in INSERT or SELECT portion of the statement.
Could someone knowledgeable confirm if my above assertion is correct, and if so suggest the minimal lock that need to be added to the query? An explanation on why the suggested solution is the best would be a nice bonus.
Thanks
August 4, 2011 at 10:48 am
If it's a locking issue (and it could be), you might want to look into TABLOCKX. (Details on table hints are here: http://msdn.microsoft.com/en-us/library/ms187373.aspx)
That will take an exclusive lock on the whole table for the duration of the transaction. That way, processes will have to take turns on it.
However, it's very unusual for SQL locking to allow one insert to complete while another is running in such a way as to cause a data problem like your describing. The default settings for SQL Server are aimed at full ACID compliance, and two inserts shouldn't interfere with each other under normal circumstances. So, before you look into table hints and overriding normal locking, you might want to take a look at the rest of the process and see if something else isn't causing that problem.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 5, 2011 at 2:01 am
tablockx should do the trick but i'd look into the reason why there is 2 seperate processes trying to insert the same record.....
_________________________________________________________________________________SQLGeordieWeb:- Jarrin ConsultancyBlog:- www.chrisjarrintaylor.co.ukTwitter:- @SQLGeordie
August 5, 2011 at 7:54 am
Thanks for the replies
The issue is easily reproducible by opening 2 queries in SSMS and running in them something like the following:
Query 1
WAITFOR TIME '14:27'
INSERT INTO [dbo].[t_data_entities]
SELECT DISTINCT [t_import_file1].[Entity_Name]
FROM [dbo].[t_import_file1]
LEFT JOIN [dbo].[t_data_entities]
ON [t_import_file1].[Entity_Name] = [t_data_entities].[Name]
WHERE [t_data_entities].[Name] IS NULL
Query 2
WAITFOR TIME '14:27'
INSERT INTO [dbo].[t_data_entities]
SELECT DISTINCT [t_import_file2].[Entity_Name]
FROM [dbo].[t_import_file2]
LEFT JOIN [dbo].[t_data_entities]
ON [t_import_file2].[Entity_Name] = [t_data_entities].[Name]
WHERE [t_data_entities].[Name] IS NULL
This will cause the above queries to fire off at exactly the same time, and if there are same entities in table [t_import_file1] and [t_import_file2] one of the queries will error out with the "Cannot insert duplicate key row.." message.
I've played around with locks and found that one of the following two solutions work:
Solution 1: Locking t_data_entities table exclusively at insert level
INSERT INTO [dbo].[t_data_entities] WITH (TABLOCKX)
SELECT DISTINCT [t_import_file1].[Entity_Name]
FROM [dbo].[t_import_file1]
LEFT JOIN [dbo].[t_data_entities]
ON [t_import_file1].[Entity_Name] = [t_data_entities].[Name]
WHERE [t_data_entities].[Name] IS NULL
Solution 2: Locking t_data_entities table at the select level
INSERT INTO [dbo].[t_data_entities]
SELECT DISTINCT [t_import_file1].[Entity_Name]
FROM [dbo].[t_import_file1]
LEFT JOIN [dbo].[t_data_entities] WITH (TABLOCK)
ON [t_import_file1].[Entity_Name] = [t_data_entities].[Name]
WHERE [t_data_entities].[Name] IS NULL
Any reasons to pick one over another?
Thanks
August 5, 2011 at 8:14 am
Same effect, I prefer first one as it shows more clearly that you are locking table which you are inserting to.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply