Three Mile Island And Your Databases

Yeah, I know. Bear with me.

Some of the younger amongst you may have never heard of Three Mile Island. It used to be a nuclear power plant in Pennsylvania. One night back in 1979, two separate but related mechanical problems led to a leak of coolant (treated water) within the containment building. The story of the event is the point of this editorial, not so much the event itself.

Here’s the part you’re unlikely to see in most accounts of the story. While there were mechanical issues, some of them caused by poor maintenance, and a whole, enormous, truly frightening series of human errors occurred, the actual accident, meltdown, and partial leak to the environment didn’t even raise the background radiation in the surrounding area. Why? Because the fundamental reactor design was solid. It survived bad maintenance. It survived really poor training, horrific judgment, and, frankly, a large degree of stupidity. It was engineered well.

Yet that’s not the story anyone recalls. Why? Communication. See, not only were the people running the reactor bad at their jobs, but the people responsible for them were equally bad at theirs. What was happening in and around the reactor was extremely poorly communicated. So, instead of a story about, “wow, good thing we over-engineer our reactors, we really need better training and maintenance to go with it,” we got “OMG!!!ELEVENTY!!! Nuclear power BAAAADDD!” For an excellent account along these lines, watch this really well-done video.

“OK, Grant, and databases?”

Something to think about in your environment is how you’re going to communicate during an outage. How do you let management know? What should you tell them? Who is going to tell them? You’re in an emergency. You’re going to be highly distracted. Yet you’re going to have a bunch of non-technical people relying on you to give them a good message that they can then take to others. This requires two things.
First, you have to think the whole communications thing through ahead of time. Take Bob. You know Bob: he’s had all those run-ins with HR because of his abrasive communication style. Maybe Bob isn’t your spokesperson. So who is?

Second, you need to practice your recoveries, failovers, what have you, so that when the emergency hits, you know what to do and aren’t looking at the equivalent of an alarm papered over because it’s always going off.

So, yeah, get your backups in line and tested, sure. However, also practice restores. Then, make darned sure you know how to communicate to everyone else when the emergency is truly on.

Grant Fritchey