choosing a Disaster Recovery plan

  • Good day,

    Long time developer...newbie DBA...I'm looking for some thoughts on Disaster Recovery.

    Here's our setup: We have a facility in NJ which we would like to be the primary and one in GA which we would like to be the backup. My understanding is our network bandwidth is solid. We've got say a dozen servers, mixed between 2000, 2005, 2008 (though the 2000 will soon be 2008), each with anywhere from a couple databases to a hundred databases, ranging anywhere from a few MB to 150GB.

    Automatic failover is not critical but ideally we can't have data loss.

    I've got a textbook understanding of Log Shipping, Database Mirroring and Replication - I'm looking for some real world practical advice. Am I even heading down the right path looking at SQL Server's native solutions or do folks use some 3rd party solution?

    Here's my thoughts:

    Log shipping: I understand the secondary will always be one log behind the primary, that's ok. I also understand if the primary goes down and you can't get the tail of the log, then we'll have some data loss. But how robust is this? Is the log chain easily / often broken? If the chain is broken, say someone switches the recovery to Simple, now I've got to start over with a full backup of that database, that's tough with the larger databases.

    Database Mirroring: We're turned off by the latency the synchronous mode could introduce. If we choose asynchronous then what's the point? The fact that you can't switch to Bulk Logged recovery when loading data to me is a deal killer.

    Replication: Appears to be a more granular implementation of Database Mirroring.

    Any thoughts would be appreciated...

    Thanks!

  • I personaly like the "business continuity" wording more rather than "disaster recovery".

    So you have anything in between two hundred to a dozen hundred databases?

    First question would be... do all these databases have the same BC requirements?

    The idea is to identify:

    1- Databases that cannot afford to loose any data.

    2- Databases that can afford to loose one hour worth of data.

    3- Databases that can afford to loose one day worth of data.

    A second classification would be:

    1- Databases that cannot be down for more than 1 minute.

    2- Databases business can afford to be down for one entire day.

    3- Databases business can afford to be down for one entire week.

    Critical databases are the ones ranked #1 in both classifications.

    Second question would be... how much tlog gets generated by the critical ones during peak hours?

    Third question would be... what kind of connectivity do you have in place in between primary and secondary site? is it point-to-point? is it internet? what's your providers SLA for that connection? is it redundant?

    Hope this helps to get you started.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Third party or not would also depend on

    1. Whether you're looking for one solution for all your databases or not.

    2. budgetary constraints, if any.

    3. In-house expertise for managing all the different natively available options - if you're a one DBA shop then you might be better off with a 3rd party solution which doesn't require a whole lot of database experience

  • Thanks for the reply PaulB.

    I'd say we have around 500 - 600 databases.

    Good questions...I'd say no they won't all have the same BC requirements.

    There will be som that can not lose any data and others that can lose a days' worth.

    Similarly, there will be some that can not be down for more than a minute and others that can be down for a day.

    I'm not familiar with the log sizes b/c the databases I currently maintain are set to Simple, but I'll soon be responsible for others that are in Full. As for network connectivity, I don't believe it is point to point, but I don't know for sure - I will find out. I assume these questions are to determine how feasible shipping the logs are? As for redundancy, my understanding is we're covered - I believe we have a couple pipes out of each facility - each on a different carrier.

  • Thanks SA.

    My 3rd party or native question was really to get at the robustness of the native solutions given our situation - many databases, some very large, distant geographic locations, etc...

  • Henry Lee (7/24/2009)


    There will be som that can not lose any data and others that can lose a days' worth.

    Similarly, there will be some that can not be down for more than a minute and others that can be down for a day.

    Ok... it looks like your project would include critical and non-critical databases.

    Next task would be to find out how may are in each category.

    Lets address the critical ones after we have an answer to the question implied in the paragraph above.

    In regards to the non-critical ones -can loose one day worth of data and can be down for a day - I would probably consider going with the old-n-good full restore solution. Take a full backup in a daily basis, move it to tape, ship the tapes to the secondary location. If you do not have budget to ship the tapes in a daily basis, send them to an off site location and in case they are needed somebody picks them up and takes the next flyigth down to Georgia. Just in case I would do the exercise three or four times a year.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Thanks again PaulB.

    I don't have any better feel at the moment for how many databases fit in each category - but I get the point of the exercise - the solution will depend on the classification. My worry here is that's a task in itself - to sit down and categorize 500 databases - and then one day you find someone added a mission critical table to a database previously categorized as "can be down for an hour"...Almost makes you want to have just one solution for all databases - but that's neither here nor there - we'll have to work through that.

    For the point of our conversation, let's say we've got 200 of the 500 databases are deemed critical. And some of these are the 50-100GB ones and some of them are highly transactional. If we can keep the size of the logs reasonable through frequent backups (I also see you can compress the backups in 2008 Enterprise - which we don't have but maybe we can make a case for - so that may ease the shipping) is log shipping robust enough? Or are there too many stories out there of the log chain being broken?

  • I also understand if the primary goes down and you can't get the tail of the log,

    That depends. If your mdf is crashed and u're able to connect somehow to the database , you can still get the tail of the log.

    Log shipping is the simplest one to implement. If you cannot afford to lose any record, i suggest u go for clustering or mirroring with synchronous type, Performance will be little hampered in this type of mirroring.

    Or are there too many stories out there of the log chain being broken?

    Log chain do not break automatically unless someone truncated the transaction logs OR someone took a tran log backup manually which was not shipped.



    Pradeep Singh

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply