Data Loss

  • Potato Powered Server

    I was listening to an Always On lecture and I heard that synchronized mirroring means you haven't lost data. It got me thinking and I wanted to comment on a perspective you might not have thought of.

    Let me give you another example first to set the stage. We all work hard to make sure our servers run efficiently and are available when our clients need them. So suppose you get a call from the help desk that the application is giving database errors to clients. You check the server over a remote KVM and see that it's running fine, but there's no activity.

    After a little investigative work, you realize there's a network problem that is preventing clients from getting to the server. You get it fixed and suddenly clients can connect after being down for 45 minutes. So here's the question: How long were you down?

    From your perspective, the server was never down. You're still counting metrics that show you have 100% uptime for the year. However, your users think you've blown 5 9s and are working on going below 4 9s (five nines allows only about five minutes of downtime a year, so 45 minutes blows it many times over). To them, it was an unproductive hour and they'll have to figure out how to make up the work.

    Who's right?

    Both of you, and it brings up an interesting perspective. Suppose I update a bunch of data from my client, knowing I have a mirrored database and I'm protected. What happens if the database fails over to the mirror while I have an open transaction?

    The answer is that my partial transaction is rolled back, and my application would need to reconnect to the mirror database and restart the transaction (a sketch of that retry logic follows this post). Since most applications I've seen would have trouble with this, have I lost data?

    The way it works makes sense, and it's something you should be careful to explain to your managers. The transaction never made it to the mirror because it wasn't committed. So how do I feel?

    I feel like I lost data. Or at least work.
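
    A minimal sketch of that reconnect-and-retry logic, assuming Python with pyodbc against a hypothetical mirrored pair; the driver, server, database, and table names are placeholders, not anything from this thread.

    ```python
    import time

    import pyodbc

    # Placeholder connection string; the driver and names are assumptions.
    CONN_STR = "DRIVER={SQL Server};SERVER=principal;DATABASE=orders;Trusted_Connection=yes;"

    def save_order_line(order_id, item, qty, attempts=3):
        """Retry the whole transaction if a failover kills the connection."""
        for attempt in range(attempts):
            try:
                conn = pyodbc.connect(CONN_STR, autocommit=False)
                try:
                    cur = conn.cursor()
                    cur.execute(
                        "INSERT INTO order_lines (order_id, item, qty) VALUES (?, ?, ?)",
                        (order_id, item, qty),
                    )
                    conn.commit()  # only now is the work durable
                    return
                finally:
                    conn.close()
            except pyodbc.Error:
                # The failover rolled back our open transaction; the mirror
                # never saw it because it was never committed. Reconnect, redo.
                time.sleep(2 ** attempt)
        raise RuntimeError("not saved after retries; the user must re-enter the work")
    ```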

  • I always look at it from the user's view. If the user has to rekey data or wait until the mirror reconnects, then the user has lost time and work. I can say that technically they didn't lose data because it was on the mirrored drive, but rekeying or waiting for the data on the mirrored drive cost the company time, and time is money. And if the users lose time, my boss will hear it and I will hear about it.

  • If you take your car in for repair and the transmission's bad, you don't ask "Who made that (#*$)@ transmission?" when, in all probability, your car's manufacturer didn't make the transmission but sourced it from someone else and just bolted it in. You blame the manufacturer, or the dealer, and if you were told it was someone's fault other than those two, you wouldn't accept it.

    In I.T., particularly in large, poorly managed I.T. shops, the reverse is true. The database guys think the application guys are idiots. The application guys think the database guys are idiots. The network guys can't keep the network running 100% of the time. The NOC operator can't type the right command out of the book. I'm doing MY job, HE's not doing HIS job, blah blah blah. I've come to call this the "Wheel of Blame", and in some places I've worked it occupies more of your time than anything else.

    The poor user, who sees all of I.T. as one entity, doesn't care what part of the system broke or how everything works.

    The question really is:

    If the database is mirrored, and the applications don't have the proper code to take advantage of it, is the database mirrored?

  • If a network blip caused the outage, ask the network guy to call the router manufacturer to find out what happened. This happened where I worked once, so instead of starting the "Wheel of Blame" we all got a good laugh out of Cisco's response... cosmic rays. If the network guys can come up with outlandish responses to their outages, the users will be happy that you found the issue, and happy to pay to start wrapping the routers and servers in aluminum foil to prevent the issue next time.

    If it happens again, perhaps we need a redundant network connection powered by potatoes?

  • Ian,

    Nice. Reminds me of when we once told a manager we had to repair a cable because the tokens kept leaking out of the ring.

  • I have so many examples of where it was not our fault but we still took the heat. I would cite some, but it would give away too much confidential information.

    I strive to design applications so that I only have one unsaved item in memory at a time (see the sketch after this post). Our order-taking application was originally designed to take the order into memory and save it all at once. During a test run the boss entered a 100-line order. He was happy that the capacity was working. He handed me the device, which I promptly warm-booted. I went back into the app and handed him back the device. "What hundred-line order? I don't see an order in here." He was furious. "Better that it happened here in testing than to a customer in the field."

    ATB, Charles Kincaid
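
    A hypothetical sketch of that "one unsaved item at a time" rule, assuming a generic Python DB-API connection with qmark parameters (pyodbc works this way): commit each order line as it is entered, so a warm boot loses at most the line currently on screen. The table and helper names are made up for illustration.

    ```python
    def take_order(conn, order_id, read_next_line):
        """read_next_line() returns an (item, qty) tuple, or None when the order is done."""
        cur = conn.cursor()
        while True:
            line = read_next_line()
            if line is None:
                break
            item, qty = line
            cur.execute(
                "INSERT INTO order_lines (order_id, item, qty) VALUES (?, ?, ?)",
                (order_id, item, qty),
            )
            conn.commit()  # durable before we prompt for the next line
    ```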

  • I believe production systems should have network links from multiple vendors serving their connections, so that in case of a failure a manual or automatic failover can be initiated. This would, though, result in a momentary loss of connection and require the end user to resubmit the transaction. There is data loss, but I believe 5 9s are preserved.

    In the case of application failovers (deadlocks, clusters, mirrors) or other events that involve data loss due to improper handling or catching of exceptions in application code, the transaction has to be reinitiated by the end user.

    We should call it data loss in a DBMS only when data that was saved and committed, as confirmed to the user, is later not available. A submit from the user should return a success response only after the required data is durably written to the database. I believe proper error handling can tell the end user that the data was not saved and prompt a repeated request (a sketch follows this post).

    Swapnil
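
    A small sketch of that rule, again assuming a generic Python DB-API connection with qmark parameters; the table and field names are illustrative. Success is reported only after the commit returns, and any failure becomes an explicit prompt to resubmit.

    ```python
    def handle_submit(conn, body):
        """Acknowledge success only after the data is durably committed."""
        try:
            cur = conn.cursor()
            cur.execute("INSERT INTO requests (body) VALUES (?)", (body,))
            conn.commit()
        except Exception:
            conn.rollback()
            return {"saved": False, "message": "Not saved; please resubmit."}
        return {"saved": True, "message": "Saved."}
    ```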

  • The "application would need to reconnect to the mirror database" is not necessarily true because the new version of MDAC in Visual Studio 2005 contains a mirror-related feature with-in the connection object called "Transparent Client Redirect". When a connection is made to the principal the connection object caches the principal and the mirror.

    "The transaction would have never made it to the mirror because it wasn't committed" is not true either. In High Availability Mode all transactions are sent to the mirror before they are committed, when the application issues a commit, the transaction is committed first on the mirror which sends an acknowledgement to the principal which then issues a commit which then sends an acknowledgement to the application.

    There is no need for the application user to even know there was a failure, never mind experience data loss.

  • From what I understand, if the principal is lost, all clients must reconnect. While all committed transactions are carried over, any transaction that is halfway done must be restarted (and resubmitted) by the client when it reconnects. The client in the new ADO.NET 2.0 will automatically redirect to the mirror, but it must reconnect. A sketch of giving the client both partner names follows.
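
    For completeness, here is what handing the client both partner names up front can look like. A sketch assuming pyodbc with SQL Server Native Client, whose connection-string syntax documents a Failover_Partner keyword; other drivers spell it differently, so verify against your driver's documentation before relying on it.

    ```python
    import pyodbc

    # Both partner names up front; the driver caches the partner name on the
    # first connect, so a reconnect after failover can land on the mirror.
    CONN_STR = (
        "DRIVER={SQL Server Native Client 10.0};"
        "SERVER=principal;Failover_Partner=mirror;"
        "DATABASE=orders;Trusted_Connection=yes;"
    )

    def reconnect_after_failover():
        # Existing connections die at failover, and any open transaction on
        # them is rolled back; a *new* connection is still required.
        return pyodbc.connect(CONN_STR)
    ```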

