Data loss from hanging transaction?

  • I have a rather annoying problem.

    I am periodically lossing data in SQL 2000. After a 6 hour call with Microsoft they were able to recover some of it from a backup so that's good. But the problem itself was never resolved.

    They were able to tell me that it appears that a transaction is hanging. Durring the hang, my backups get larger and larger and data still 'appears' to be commited correctly. However once the transaction clears up, my data reverts back to the state it was in at the begining of the hang. This happens for all tables in one database on a server with 18 databases.

    The application using that database is a ColdFusion based content management system for our websites.

    Any thoughts?

  • I hit something like this once. It's complex, and may not be what you're dealing with, but sounds very similar. I do not use entirely technical terms below; this is mostly to get the concept across.

    It's (Windows or IIS) connection pooling. A service (an ASP-called DLL, in our case) would issue a query, and (when so configured) Windows would create a connection to the database and add it to the "pool". When the query was done the application would release the connection, BUT the pool would maintain it for the next database connection request. (There's lots of details behind this, but basically if the same login wants to talk to the same database, the pooled connection can be reused.) There's a "drop if unused for X time" setting in the pool. Don't knock it--connection pooling is a tremendous boost for a system.

    Next: there came a time (based on growing database size and complexity) where some submitted queries would exceed the application coonection timeout (defaults to 30 seconds?). The application (not SQL, not the pool) generates the timeout, and proceeds accordingly.

    The problem was compounded on a number of things. First, the wonders of engineering that were these applications did not do error checking; if they got a timeout, they'd just go on, inevitably generate errors in whatever called them, and not clean up the mess they left behind. SQL would eventually finish the query and return the results, which I can only assume disappeared into the inky black void.

    But: some of our applications were running as DTC transactions--meaning that SQL was instructed to "begin transaction" for the query. Lo and behold, with the timeout and lack of error control, the "commit" never gets submitted. Worse, connection pooling does not know about the transaction, so keeps the connection open, and passes it on to the next customer--who unwittinlgy joins into the the transaction. And, since it was a DTC transaction, the isolation level was serializable. (Oh, the angst I suffered when I first fully understood what was killing our servers...)

    The transaction only goes away when the connection closes (via the pool timeout), at which point I assume everything gets rolled back.  (I sure hope you didn't have any critical data in that pipeline!)

    There's  bit more, but that's the gist of it: A transaction gets opened and, improperly managed, does not get closed in a timely fashion. I'd guess what you're seeing is similar to what I had.

       Philip

     

  • So what was the solution in your case?

    Did you rewrite your DTC transactions? Or was their more to it.

    I don't have any DTC transactions that I know of.

  • The solution was to install proper error handling. Detect the timeout condition, issue rollbacks, close the DTC transaction properly. (This, needless to say, was not the fix used; no, the fix used was to have the DBAs go in whenever the problem occured and kill the offending connection. Insufficient development resources to go in and fix old bad code, doncha know.)

    I only found out it was DTC through the use of SQL Profiler. When trying to spot perpetual transactions, on a "what the heck" basis I tossed in the DTC counter, and hit paydirt. Wasn't easy to find, amidst everything else going on on the box, but it was there.

    I spelled out all the above as possible means by which you might be losing data. (If it's transaction backups that are getting larger and larger, it sounds like an uncommitted and still increasing transaction to me--and if the data reverts, that says rollback.) Unending transactions should be fairly easy to detect, if this is the source of your problem; finding what's causing them can be pretty hard.

       Philip

     

  • I was able to discover yesterday that their was a transaction haning with an ABORT?? at the end. I don't understand how this hung because every statement I was able to find in the transaction log has a COMMIT at it's end.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply