July 12, 2010 at 3:14 pm
I have two C# processes, which are intended to handle transactions explicitly (not using TransactionScope), using BeginTransaction() and either Commit() or Rollback(). These transactions run at the default isolation level, which the documentation tells me is ReadCommitted.
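For reference, both processes follow the same basic explicit-transaction pattern. Here's a simplified sketch (the connection string and the proc name "dbo.InsertEntities" are placeholders, not my real names):

```csharp
using System.Data;
using System.Data.SqlClient;

// Simplified sketch of the explicit-transaction pattern both processes use.
// Connection string and proc name ("dbo.InsertEntities") are placeholders.
static void ProcessQueueItem(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        // No IsolationLevel argument, so this runs at the default: ReadCommitted.
        SqlTransaction tran = conn.BeginTransaction();
        try
        {
            using (var cmd = new SqlCommand("dbo.InsertEntities", conn, tran))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.ExecuteNonQuery();
            }
            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }
    }
}
```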
Process A (C#) reads an "outbound" queue and inserts into the database: it reads an item off the queue, begins a transaction, runs a stored proc to insert some entities into the database (actually the main stored proc calls a nest of other stored procs that do all of the inserts) - these entities span, say, 8 or 9 tables - commits, and reads the next item off the queue.
Process B (C#) is an archive process - its job is to periodically look for data that qualifies to be archived. Decisions about what qualifies are made on the top-level parent table for these entities, and then each of the other tables is evaluated for rows that either are orphaned, or will be orphaned by the pending sweep of the parent table. This is all handled as one big transaction: the C# starts the transaction, goes into the "collection" phase of the process (calls a stored proc to build a list of row tokens that qualify to archive from the top-level parent table, and then walks down through the child tables, calling stored procs to build lists of row tokens to archive from them), then goes into the actual "archival" phase of the process (calls stored procedures to select a specific row by token, insert that row into the archive table, then delete the row from the active table by token), and finally does one big commit.
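In outline, Process B looks something like this - one connection, one transaction spanning both phases. The proc names here are invented for illustration; the real procs take token lists and per-table parameters I've omitted:

```csharp
using System.Data;
using System.Data.SqlClient;

// Outline of Process B: collection phase, then archival phase, all inside
// one transaction on one connection. Proc names invented for illustration.
static void RunArchive(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        SqlTransaction tran = conn.BeginTransaction(); // default ReadCommitted
        try
        {
            // Collection phase: top-level parent table first, then each
            // child table, building lists of row tokens to archive.
            Exec(conn, tran, "dbo.CollectParentTokens");
            Exec(conn, tran, "dbo.CollectChildTokens");

            // Archival phase: per token, copy the row into the archive
            // table, then delete it from the active table.
            Exec(conn, tran, "dbo.ArchiveRowsByToken");

            tran.Commit(); // one big commit at the very end
        }
        catch
        {
            tran.Rollback();
            throw;
        }
    }
}

static void Exec(SqlConnection conn, SqlTransaction tran, string procName)
{
    using (var cmd = new SqlCommand(procName, conn, tran))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.ExecuteNonQuery();
    }
}
```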
The problem is that sometimes Process B picks up rows (always child tables, at least two generations down from the top-level parent) that have just been inserted by Process A, and they don't really qualify to be archived (i.e., the parent row still exists in the parent table). This does not make sense to me - if Process A is committing the parent and all generations of children atomically, and Process B is set to ReadCommitted, it should behave as though those rows don't exist.
Reproducing the problem in test is doable, but takes a lot of work, so I need to minimize the number of attempts it takes to figure this out. I'm looking for suggestions:
- I know I could be explicit about the Isolation Level
- Do I need to consider using the Serializable transaction isolation level? If so, I'm confused about which process should use it
- Is there something fishy about nested stored proc calls that insert data at various levels of the call stack? Some sort of quasi-commit that maybe doesn't flush the row to disk, but releases a read lock?
- Has my DBA or Server admin changed some default that I can't see that is giving me heartache? (I know I can ask them, but you can't imagine what a pain that will be. Not to mention my inability to trust that the answer will be correct.)
- I have considered breaking the archival process into a transaction per table, but it just seems to me that this should be a performance/blocking concern, not a data concurrency concern...
Thanks in advance for any suggestions you might have!
- Brandon
July 13, 2010 at 6:48 am
One small detail to add that might or might not be relevant... the stored procs used in the "collection" phase of Process B use a "SELECT TOP 5000", which is something else I have considered getting rid of. It seems like I've run across some articles or posts related to issues with this sort of thing in previous investigations, but of course I can't find them now that I am interested in that topic...
July 14, 2010 at 7:08 am
Maybe I overwhelmed you with detail, so I'll try to restate my question more simply:
Process B is running a big select. Process A is inserting data in a transaction. The data involved in the transaction from process A could qualify for the select running in process B, but only while the transaction is open - when the transaction is complete, it should not qualify. What is the best way to make sure that Process B can't see data inserted by Process A unless it is really, truly committed?
I believe that the answer is that Process B's select statement needs to be running in a transaction with isolation level Serializable. Can anyone confirm that I am understanding the documentation correctly - that I am really dealing with what is referred to as "phantom reads", and that this is the proper way to avoid them?
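If I'm reading it right, the change would be on Process B's side only - passing the isolation level explicitly when the transaction starts. A sketch (whether Serializable is actually the correct level here is exactly what I'm asking; `conn` is an open SqlConnection as in the code above):

```csharp
using System.Data;
using System.Data.SqlClient;

// On Process B (the reader), start the transaction at an explicit isolation
// level instead of relying on the default ReadCommitted:
SqlTransaction tran = conn.BeginTransaction(IsolationLevel.Serializable);
// Per the documentation, Serializable holds key-range locks for the life of
// the transaction, which is what prevents phantom reads - rows inserted by
// another transaction appearing partway through a multi-statement read.
```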