June 25, 2011 at 3:06 am
We have a job that runs this code:
InsertMore:
WAITFOR DELAY '00:00:05'
insert top(2500) into TableA
select * from TableB B
where B.ID not in (select ID from TableA) -- don't insert same records again
if @@rowcount > 0 goto InsertMore
It takes a long time to run, but is written this way to prevent much locking during business hours.
If a separate session runs the same code, in the hope of completing the load faster, is there any chance that duplicate ID rows could get inserted, since TableA only has a non-unique index on ID?
June 25, 2011 at 3:18 am
Absolutely there's a chance. A good one.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 25, 2011 at 12:22 pm
My initial thoughts went towards explicit transactions and transaction isolation levels. What is the real-world scenario behind the question? And are you equally worried about locking on both tables, or is one more important than the other?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 25, 2011 at 4:56 pm
TableA, being inserted into, is a live production table, while TableB, being selected from, is a temp holding table used during a load process. Initially, the process did one big insert, but it could take a while and caused blocking, so we modified it to do batches of small inserts.
The problem is that the overall load time now takes longer, although it doesn't cause blocking. We were trying to find a better way, and the other guy suggested running the same process twice at the same time since it checks to see that the records are not already inserted. Seemed to me that you might get duplicates though because the same batch could be getting inserted at the same time.
June 26, 2011 at 2:58 am
homebrew01 (6/25/2011)
Seemed to me that you might get duplicates though because the same batch could be getting inserted at the same time.
Absolutely. You'd need some nasty isolation level to prevent that (likely serialisable)
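Roughly, that would mean something like the sketch below around the original loop (table names from the first post, untested here). The explicit transaction is what makes the duplicate check and the insert share the same serializable range locks, which is also why two sessions running it would mostly just block each other:
-- Sketch only: serializable + an explicit transaction so the duplicate check
-- and the insert are covered by the same range locks on TableA.
DECLARE @rows int ;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE ;

InsertMore:
WAITFOR DELAY '00:00:05' ;

BEGIN TRANSACTION ;

    INSERT TOP (2500) INTO TableA
    SELECT *
    FROM   TableB B
    WHERE  B.ID NOT IN (SELECT ID FROM TableA) ;   -- range locks held until COMMIT

    SET @rows = @@ROWCOUNT ;   -- capture before COMMIT resets @@ROWCOUNT

COMMIT TRANSACTION ;

IF @rows > 0 GOTO InsertMore ;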
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 26, 2011 at 8:18 am
Serializable would keep you from getting dups. It is super max lockdown though so using it would defeat the purpose of running two instances of your code as it is written.
Snapshot iso is the other level you could consider. It is the only other read-consistent iso level besides serializable. I say consider because I have never looked at it in this light and am having a hard time wrapping my head around what it would do for you in this scenario.
That's all I had on iso levels.
Here is a code change option for you:
What if we introduced a temp table to hold the results of this...
select * from TableB B
where B.ID not in (select ID from TableA)
...to get our working set. Then we could delete 2500 rows from the temp table at a time and use the OUTPUT clause to insert those rows directly into TableA. I think that would solve your concurrency issue, because the DELETE would take exclusive locks on the rows it was deleting and prevent a second process from reading those same rows.

If you had two delete processes, it would be worth coding them to work on different sections of the temp table so they have less chance of blocking each other. Not sure what that would do to TableA in terms of fragmentation, though. Just thinking out loud here; it would need to be run down, but you get the idea. A quick demo of the mechanism:
-- Minimal repro of DELETE ... OUTPUT INTO: the deleted rows are routed straight
-- into the destination table in the same statement.
CREATE TABLE dbo.src  (id INT) ;
CREATE TABLE dbo.dest (id INT) ;

INSERT INTO dbo.src ( id ) VALUES ( 0 ) ;

DELETE
FROM dbo.src
OUTPUT DELETED.id
INTO dbo.dest ;

SELECT * FROM dbo.dest ;   -- the row now lives in dest
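Scaled up to the thread's tables, the batched drain might look roughly like this. Only a sketch: #work is a name I'm making up, the SELECT ... INTO assumes TableB's columns line up with TableA, and if I remember right the OUTPUT ... INTO target can't have enabled triggers or participate in a foreign key.
-- Sketch: build the working set once, then drain it in batches.
SELECT B.*
INTO   #work
FROM   TableB B
WHERE  B.ID NOT IN (SELECT ID FROM TableA) ;

WHILE 1 = 1
BEGIN
    -- Delete a batch from #work and insert those same rows into TableA.
    DELETE TOP (2500)
    FROM   #work
    OUTPUT DELETED.*
    INTO   TableA ;

    IF @@ROWCOUNT = 0 BREAK ;   -- nothing left to move

    WAITFOR DELAY '00:00:05' ;  -- keep the original breathing room between batches
END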
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 26, 2011 at 11:11 am
opc.three (6/26/2011)
Snapshot iso is the other level you could consider. It is the only other read-consistent iso level besides serializable. I say consider because I have never looked at it in this light and am having a hard time wrapping my head around what it would do for you in this scenario.
It'll almost guarantee duplicates.
Snapshot gets you a consistent view of the data at the time of the transaction (snapshot) or statement (read committed snapshot) starting. If one insert starts and then the other starts before the first finishes, the second will see (for the NOT IN duplicate check) a consistent view of the data from before the first insert started and will hence happily insert duplicate rows.
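To make that concrete, a rough two-connection walk-through (a sketch only: it assumes ALLOW_SNAPSHOT_ISOLATION is ON for the database, reuses the poster's table names, and each block is run in a separate session):
-- Connection 1: start a snapshot transaction and insert, but don't commit yet.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT ;
BEGIN TRANSACTION ;
    INSERT TOP (2500) INTO TableA
    SELECT * FROM TableB B
    WHERE B.ID NOT IN (SELECT ID FROM TableA) ;  -- reads TableA as of the snapshot
-- (pause here, transaction still open)

-- Connection 2: starts while connection 1 is still open.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT ;
BEGIN TRANSACTION ;
    INSERT TOP (2500) INTO TableA
    SELECT * FROM TableB B
    WHERE B.ID NOT IN (SELECT ID FROM TableA) ;  -- connection 1's rows are not yet
                                                 -- committed/visible, so the same
                                                 -- batch qualifies again
COMMIT TRANSACTION ;

-- Connection 1:
COMMIT TRANSACTION ;  -- both batches are now in TableA: duplicate IDs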
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 26, 2011 at 12:48 pm
GilaMonster (6/26/2011)
opc.three (6/26/2011)
Snapshot iso is the other level you could consider. It is the only other read-consistent iso level besides serializable. I say consider because I have never looked at it in this light and am having a hard time wrapping my head around what it would do for you in this scenario.

It'll almost guarantee duplicates.

Snapshot gets you a consistent view of the data at the time of the transaction (snapshot) or statement (read committed snapshot) starting. If one insert starts and then the other starts before the first finishes, the second will see (for the NOT IN duplicate check) a consistent view of the data from before the first insert started and will hence happily insert duplicate rows.
Good to know. Thank you Gail.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 26, 2011 at 4:13 pm
Thanks for the feedback. I'm heading on vacation for a bit, so I'm going to leave it alone for now, and it only runs once a month, so no rush.