Replication as a partial DRS

  • Often when we discuss a disaster recovery solution (DRS) in the realm of SQL and databases, we are discussing corruptions of data, failed HDDs, or other "data" related issues. Of course we all know that a complete DRS must include site outages, server malfunctions, and any number of other aspects. I recently read that replication should never be used as part of a DRS, which I assumed was in the sense that it does not replace a backup. The opinion I was given was that in my scenario it was also a poor choice. I wanted to start a separate thread to find out why and to gain a better understanding for myself and others out there.

    Scenario: Application communicates with ServerA to read and write data. ServerA replicates to ServerB with transactional replication. If ServerA becomes unreachable (internet issues, power outage, server problem, etc.) the application can be configured to now point to ServerB. Assuming that the business understands the risks and possibility of some data being lost and not replicated between this switch, but that the priority is to get the application up and running again... Why is this a bad thing?

    Optimally, I know this is not the best set-up. However, for what we have I can see no fundamental problems with it. This is only 1 small part of a large DRS covering several scenarios.

    Now I suck up information from the more experienced 🙂

    Jared
    CE - Microsoft

  • How would you revert to the original server once it's back up and running?

    You would have data in the backup server that has no obvious way of getting back to the original server.

  • Ian Scarlett (1/26/2012)


    How would you revert to the original server once it's back up and running?

    You would have data in the backup server that has no obvious way of getting back to the original server.

    Great question. A manual sync would prove to be sufficient before re-establishing replication on the publications that are transactional only. I did not mention that we also have some publications that are transactional with updateable subscriptions which will sync when brought back online. RedGate SQL Data Compare is also a tool for more complex databases.

    Jared
    CE - Microsoft

  • That does seem an awful lot of work for something that looks tailor made for Database Mirroring.

  • Ian Scarlett (1/26/2012)


    That does seem an awful lot of work for something that looks tailor made for Database Mirroring.

    I'm sorry that this was not in my original description, but it seems the more that I think about this I come up with other information 🙂

    Mirroring would be fine, except that this failover is application specific... Not data failover. For example, in the applications that only read data we need them to be able to access the data at the backup site in the case that the application can not connect to the primary site (for any number of reasons that would not trigger a failover in mirroring). In a mirroring set-up we would not be able to read data off of the mirrored site.

    So, because of this requirement we have to do more work in the case of a larger problem. However a lot less work is needed in smaller issues.

    Jared
    CE - Microsoft

  • Paul Randal has an article for Replication as HA:

    http://www.sqlskills.com/BLOGS/PAUL/category/Replication.aspx

  • Wildcat (1/26/2012)


    Paul Randal has an article for Replication as HA:

    http://www.sqlskills.com/BLOGS/PAUL/category/Replication.aspx

    Ahh... Good to know Paul Randal makes the same arguments I do 🙂

    Jared
    CE - Microsoft

  • That is a nice article from Paul.

    From your requirements, there is one thing that would still bother me about using transactional replication...

    Argument 8: There's no easy way to fail back again

    Correct - for transactional replication - that can be a PITA. With peer-to-peer replication that problem disappears because the changes that happened while one peer node was down are automatically pushed to it when it comes back online.

    BTW. I'm not disagreeing with your thinking on this, just thinking out loud about potential issues.

  • Jared,

    Reading Pauls article what immediately jumps out to me is arguments 5 and 6 and perhaps 8 are pretty good reasons for not going with replication as a first choice DR solution. In most cases 5 and 6 would be clinchers not to use replication. I think also the conclusion I would come to from that article is that it is never the best solution UNLESS you needed the copy database to be available to users, which appears to be your situation.

    This leads us onto is HA (which Paul is talking about) the same as DR? Your question is based on DR, but really I would argue your primary purpose is scaling out your database, using it for DR is a by product of having a copy of your database available.

    DR (to me) means I can get the system back in the event of any event up to total loss of a data center, HA means I have a copy of the database I can fail over to in a reasonably quick period of time. These descriptions obviously overlap.

    This means database mirroring and logshipping fill the DR criteria more, replication and clustering lean more to the HA role.

    My main issue with replication is it is the least likely method of guaranteeing ALL my data will be available on the failover and the whole application will be usable

    ---------------------------------------------------------------------

  • Ian Scarlett (1/27/2012)


    That is a nice article from Paul.

    From your requirements, there is one thing that would still bother me about using transactional replication...

    Argument 8: There's no easy way to fail back again

    Correct - for transactional replication - that can be a PITA. With peer-to-peer replication that problem disappears because the changes that happened while one peer node was down are automatically pushed to it when it comes back online.

    BTW. I'm not disagreeing with your thinking on this, just thinking out loud about potential issues.

    I completely understand. Most of our data is transactional with updateable subscriptions, which I suppose is what he is referring to as peer to peer. This is because we will write to 2 different sites and have to eventually have the data on both servers. So, although there are cases where it would be a PITA, most of that type of stuff is peer-to-peer and will sync when it comes back up.

    Jared
    CE - Microsoft

  • george sibbald (1/27/2012)


    Jared,

    Reading Pauls article what immediately jumps out to me is arguments 5 and 6 and perhaps 8 are pretty good reasons for not going with replication as a first choice DR solution. In most cases 5 and 6 would be clinchers not to use replication. I think also the conclusion I would come to from that article is that it is never the best solution UNLESS you needed the copy database to be available to users, which appears to be your situation.

    This leads us onto is HA (which Paul is talking about) the same as DR? Your question is based on DR, but really I would argue your primary purpose is scaling out your database, using it for DR is a by product of having a copy of your database available.

    DR (to me) means I can get the system back in the event of any event up to total loss of a data center, HA means I have a copy of the database I can fail over to in a reasonably quick period of time. These descriptions obviously overlap.

    This means database mirroring and logshipping fill the DR criteria more, replication and clustering lean more to the HA role.

    My main issue with replication is it is the least likely method of guaranteeing ALL my data will be available on the failover and the whole application will be usable

    I see what you mean about high availability. I suppose there is certainly overlap in my case, but I typically think of high availability as something that allows us to do maintenance operations without system downtime, as opposed to something out of our control. However, in this case it may be both. There have certainly been times where we do our maintenance and restart a server while the application writes to the secondary. When it comes back up, the updateable subscription syncs the data. I suppose this can be considered a very "lightweight" DR, but moreso a high availability option.

    Jared
    CE - Microsoft

  • SQLKnowItAll (1/27/2012)


    Ian Scarlett (1/27/2012)


    That is a nice article from Paul.

    From your requirements, there is one thing that would still bother me about using transactional replication...

    Argument 8: There's no easy way to fail back again

    Correct - for transactional replication - that can be a PITA. With peer-to-peer replication that problem disappears because the changes that happened while one peer node was down are automatically pushed to it when it comes back online.

    BTW. I'm not disagreeing with your thinking on this, just thinking out loud about potential issues.

    I completely understand. Most of our data is transactional with updateable subscriptions, which I suppose is what he is referring to as peer to peer. This is because we will write to 2 different sites and have to eventually have the data on both servers. So, although there are cases where it would be a PITA, most of that type of stuff is peer-to-peer and will sync when it comes back up.

    peer to peer replication was introduced with SQL2005 (IIRC), its not the same as updateable subscriptions. I'm no expert in replication so thats about all I know, peer to peer may even offer better DR capabilities, though I think it is still prone to the same weaknesses as other forms of replication with regard to automatically replicating all data.

    ---------------------------------------------------------------------

  • Well, since I am reading the 70-432 training kit... I will look it up! 🙂 I suppose I was taught the ways in which things are done at the company and got stuck into those patterns. Time to see if there are better solutions for the situation.

    Jared
    CE - Microsoft

  • Peer to peer could be used as a DR solution. It was originally implemented as a multi-site database setup where shared resources weren't possible but two way communication (near realtime) was a requirement.

    You could quite easily run half your users on each server and in the event of failure switch a dns alias to look at the other server. Data loss should be minimal since it will get queued in the distribution database ready for when the server came back online. This would need to be scaled for this occurance as it could be many hours for a hardware or network failure.

  • I just did a quick "click" on replication to create a new publisher and went through a bit of the process to see that peer-to-peer is not an option. I know I'm being lazy, but is this something else not in the wizard or something?

    Jared
    CE - Microsoft

Viewing 15 posts - 1 through 15 (of 29 total)

You must be logged in to reply to this topic. Login to reply