March 26, 2003 at 8:42 am
I have a process which summarises data and then inserts it into a table as follows:
1. Create summarised data in temp table
2. Begin transaction
3. Delete existing rows in the permanent table (just in case, and this allows me to recreate the summarised data if I need to)
4. Insert rows from temp table
5. Commit transaction
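Roughly, steps 2 to 5 look like this (the table, column and temp table names here are just placeholders for illustration):
BEGIN TRANSACTION
-- Step 3: clear out any existing rows for this request
DELETE FROM SummaryTable WHERE RequestID = @RequestID
-- Step 4: copy the freshly summarised rows across from the temp table
INSERT INTO SummaryTable (RequestID, ItemCode, Amount)
SELECT RequestID, ItemCode, Amount FROM #SummaryTemp
COMMIT TRANSACTION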
This has been working fine as each request to create the summarised data would be unique.
However, I now have a situation where I can sometimes get a duplicated request, and I have ended up with duplicate rows in the table and reports showing double the values.
What I suspect is happening is that the two requests come in at the same time and that the second request starts steps 2-5 whilst the first request is still working through them.
My understanding of transactions is that the second request should wait until the first one has committed before it starts steps 2-5.
Some of the requests insert only a few hundred rows and execute quickly, but some insert 100,000 rows, which takes about 7-8 seconds. I could take an exclusive lock on the table, but this would prevent other requests from reading the table, which I want to avoid.
Any ideas on how to prevent request 2 from inserting rows into the table until request 1 has completed?
Jeremy
March 26, 2003 at 2:44 pm
There could be a better way to do this without transactions or locking tables.
1) Make sure there is a datetime column in the destination table.
2) When inserting data, declare a temporary datetime variable and assign it the value of GETDATE().
3) Insert the new "set" of rows with the date field set to this temp date. (this way the "set" of data will have the same date in each row.)
- now if this is the second run, you have 2 sets of data in the destination table, but they can be identified uniquely by the datetime field. (up to the limits of the datetime precision)
4) To clean up the older data, use the temp date variable from step 2 and delete all rows that are older than this "newest" insert date:
DELETE FROM desttable WHERE dt_inserted < @tmpDate
So now you have far fewer race conditions, because each instance just inserts its own version of the rows and then deletes anything older than its current "run".
Likewise, your presentation of this data should always filter on MAX(dt_inserted) so it only pulls the most recent set of rows. That way, if one instance is "finishing up" and there are two sets of data, the presentation still only captures the most recent set.
You might want to have a step in between 3 and 4 that verifies that your temp table contains a complete set of data before deleting the previous data, but that's up to you. (And probably part of why you were using a transaction before.)
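Putting it together, a rough sketch (desttable and dt_inserted are from the DELETE above; the other column names and the temp table are placeholders for whatever you actually have):
DECLARE @tmpDate datetime
SET @tmpDate = GETDATE()
-- Step 3: insert the new "set" of rows, all stamped with the same date
INSERT INTO desttable (dt_inserted, ItemCode, Amount)
SELECT @tmpDate, ItemCode, Amount FROM #summarytemp
-- Step 4: remove any older sets once the new one is in place
DELETE FROM desttable WHERE dt_inserted < @tmpDate
-- Presentation: only read the most recent set
SELECT ItemCode, Amount
FROM desttable
WHERE dt_inserted = (SELECT MAX(dt_inserted) FROM desttable)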
-Thanks,
David W. Lovell
March 27, 2003 at 1:53 am
David,
Thanks for the idea.
I was doing some more checking after I made the post and I'm not sure what is going on now.
I simulated what I thought was going on using QA with two different windows, and it worked as I expected, with the second transaction waiting until the first one completed. There must be something else going wrong.
I can put a unique index on the table, which will force the second transaction to abort, but I guess I'll have to have another look.
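For the unique index I was thinking of something along these lines (the index, table and column names are just placeholders for whatever combination should be unique per request):
CREATE UNIQUE INDEX IX_Summary_Unique ON SummaryTable (RequestID, ItemCode)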
Jeremy