DR Planning

  • An untested disaster recovery plan is as likely to work as an untested computer program, and for the same reason. If anything it is less likely to work, because of the unpredictable nature and variety of the possible "inputs".
    MarkD

  • I worked at a large institution a few years back that was very good about testing their DR systems. Every 6 months all of the application owners would have a disaster drill in which all of their data would be recovered into a backup site. Unfortunately, those who planned the tests were a little more forward-thinking than those who had originally planned the backup sites.

    After one drill, it was noted in the lessons learned that the backup site was only 1 mile from the primary data center.  A rogue backhoe cutting fiber would take them both out. 

    Enough application owners raised that concern that the backup center was then moved to an already-owned site in a major coastal city in Florida. Again, everything went well in the drill until it was pointed out that the disaster recovery site was only 3 feet above the water table and the primary site, while not on the coast, was in a hurricane-prone state.

    The response this time was to build an entirely new recovery data center with all of the geographic concerns addressed. The new data center was built away from coasts, well above water tables and even hardened for tornadoes. All new equipment was ordered. State-of-the-art generators were installed. Satellite facilities were brought online. A database tape rotation was put into place so that one set went to the DR site and another to Iron Mountain. We could come up in a day (that was considered acceptable at the time) if needed.

    The 6-month testing window came just as the new facility was brought online. All of the application owners were lined up for a full week of recovery testing, and everything looked ready until the very first tape was inserted. With all of the new equipment, no one had noticed that all of the newly installed tape machines were incompatible with the tapes that the primary site was using. The new facility and all of the shiny new boxes suddenly became useless for DR due to one missed spec.

    In the past, I have told this story as a humorous example of poor DR planning. But I also see in it some extraordinary planning. Having owners buy into and dedicate time to test every 6 months is something I have not seen in other places. Usually the owners have been OK with "If the DBA says we are good then we are good." Those tests uncovered many interdependencies that no single person knew about. In addition, actually doing it live was the only way to discover whether the order of recovery you THINK you need is really the order that works. And finally, I'm not sure the geographic concerns would have been addressed had the application owners not been invested in the DR planning and had the company heads not been serious enough to relocate and build facilities when those concerns were raised.

  • Great story.

    I had a financial company that contracted for DR, and once a year we had to drag tapes over and restore our systems. It was always hard and always with issues. What was worse over time was that the DR hardware spec wasn't changing with the hardware that was in our office. If we had failed over, it would have been ssssssllllllllloooooooooooooooooooooooooooooooooooooooowwwwwwwwww.

    I suspect this was more of a marketing point for our customers than any serious attempt at DR. The issues we wrote up after the test each year were documented, but we never ran another test until the next year.
