Evaluating SQL HA options for Disaster Recovery

Question

Evaluating SQL HA options for Disaster Recovery

jsqldba

SSChampion

Points: 11257
More actions
June 24, 2011 at 8:59 am

#243269

I'm trying to create a comparison of the various HA options for SQL Disaster Recovery, but a little confused about some specifics on handling the failover of each option. We need to mirror approx 50 databases (perhaps more in the future ) of mixed applications, including SharePoint.
The HA options that I am evaluating are:
Log Shipping
Database Mirroring
SQL Replication
IPStor (SAN hardware) Replication
So it seems straight way that SQL replication is not an option due to requirements that all tables have primary keys.
Log Shipping
Log Shipping sound like a feasible option but from what I have read this will have the largest performance hit on the server and also on network traffic. Is that correct?
Failover is a manual process and sounds like it will take the longest time to perform a failover compared to the other methods.
FailBACK to the primary server (once the failure is corrected) is not really clear. I would presume that you must backup the standby server and restore back to the original primary, then the log shipping must be re-established between the original primary and the standby. But how much downtime for the users would this involve? At what point would you have to take them offline and at what point can you bring them online again?
Database Mirroring
Database Mirroring sounds like a good option but I don't know if it's scalable for our needs. I have read that MS recommends no more than 10 dbs running mirroring on a server for performance reasons. Can anyone confirm that this is true?
Also, not all applications are mirroring-aware (requires ADO.NET 2.0 or higher) so it may not work for certain applications. For SharePoint, the databases must be detached in SharePoint Admin and reattached with the new server name. I'm not clear on whether you could simply rename (or use DNS alias) on the standby server in order to redirect client applications. Can anyone clarify this?
Failover seems to be a fast although manual process (regardless of whether you are using "automatic" failover). The databases failover automatically but security still needs to be updated on the primary and then the applications redirected to the standby server.
Failback seems to be an advantage with database mirroring, as it is more automated. The databases resync themselves automatically rather than requiring a rebuild like log shipping.
IPStor Replication
IPStor replication seems to be the best although the most expensive solution (naturally). With IPStor replication, the data is replicated with the SAN hardware so that there is no impact on server performance.
Failover is manual, but can be simplified by doing a server rename or DNS alias on the standby server. To failover, you mount the standby SAN disks to the standby server, attach the databases, transfer sql security, then rename the server.
Failback seems to be the fastest and is basically a reversal of the failover routine. The SAN hardware resyncs the data and then you do reversal of the server renaming.
Of the options available, ranking is as follows:
Cost: (cheapest to most expensive)
Log shipping
Database mirroring
IPStor replication
Performance: (slowest to fastest)
Log Shipping
Database Mirroring
IPStor replication
Failover/Failback times: (slowest to fastest)
Log Shipping
Database Mirroring
IPStor Replication
Administrative complexity: (most to least)
Log Shipping
Database Mirroring
IPStor Replication
Administrative complexity: (most to least)
Log Shipping
Database Mirroring
IPStor Replication
Data loss exposure: (most to least)
Log Shipping
IPStore Replication
Database Mirroring (running in high safety)
Any clarifications and comment would be GREATLY APPRECIATED.
THANKS

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

calvo SSChampion Points: 12930 More actions · Answer 1

I think the most important question to answer when trying to decide HA options is, what is the business requirement?

Is there an acceptable amount of downtime? Does the secondary server have to be in a remote location? What is your budget? What are you trying to protect against?

These answers will help you make your final decision.

One important HA option you've left out is a clustered instance. Clustering is relatively easy to implement, low maintenance, instant and automatic failover/failback, no application connection string changes all with no additional license requirements (something that is not included in your comparison).

If the requirement is to protect against database corruption, then clustering isn't quite the answer you're looking for. (still worth a look)

______________________________________________________________________________________________
Forum posting etiquette.[/url] Get your answers faster.

Steve Jones - SSC Editor SSC Guru Points: 736280 More actions · Answer 2

No idea on IPStor.

However both database mirroring and log shipping do the same thing: ship transactions from the log to the other server. Log shipping works with log backups, so a granularity of one minute, and no automated failover. On the plus side, you can send to multiple secondaries, and since you are most likely already making log backups, this can be fairly simple. Mirroring sends transactions more one at at time in real time.

Mirroring has automatic failover with updated clients, but it doesn't matter. You can potentially use a DNS entry or local host entries to move clients the same as with log shipping. In both cases, you will have disconnects from any active clients when failover occurs. The automatic ADO.NET 2.0 clients just retry the connection, but if something was in progress, it's lost.

Failback is harder on log shipping. Mirroring can be reversed, so the transactions flow from the second server (previously the mirror) back to the first (previously the primary). Log shipping is as you described, since you must get a database(s) into restoring mode, so a full backup is needed. PIA for any number of databases.

I thought that mirroring was more like 25 dbs/instance. The 10 limit seems to be more for x86. I would hope you are moving to x64. http://sqlcat.com/technicalnotes/archive/2010/02/10/mirroring-a-large-number-of-databases-in-a-single-sql-server-instance.aspx

Ultimately you have to find some balances. You don't necessarily need a one size fits all solution. Your RPO/RTO/SLA for some dbs might need mirroring, others might include log shipping. I wouldn't hesitate to mix them up, and it's not a much more complicated environment to have both. Log shipping is dirt simple in terms of how it works. Backup, copy, restore.

Note that DR is different than HA. You can try to encompass HA in DR, but it gets expensive quickly.

I wouldn't necessarily mix in replication, which has a much higher admin effort, unless you need to spread out load for something like reporting.

LutzM SSC Guru Points: 107049 More actions · Answer 3

Steve mentioned the disconnects from any active clients when failover occurs for mirroring.

That's a major difference to a cluster configuration where the IP (or the system name) doesn't change. Since you mentioned mixed applications, you would need to include the effect of a failover to the client in terms of disconnects/reconnects/connection strings.

As far as I'm aware there's no easy way to implement mirroring without touching the connection strings of existing application(s). Or am I missing something?

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

george sibbald SSC Guru Points: 104210 More actions · Answer 4

Log Shipping
Log Shipping sound like a feasible option but from what I have read this will have the largest performance hit on the server and also on network traffic. Is that correct?

I would say this has the least hit on the server. No asynchronous IO or two phase commit you can get with database or SAN mirroring. If the logs are large can be a problem over the network.

Failover is a manual process and sounds like it will take the longest time to perform a failover compared to the other methods.

can be as simple as restore database with recovery - seconds

FailBACK to the primary server (once the failure is corrected) is not really clear. I would presume that you must backup the standby server and restore back to the original primary, then the log shipping must be re-established between the original primary and the standby. But how much downtime for the users would this involve? At what point would you have to take them offline and at what point can you bring them online again?

no downtime, no outage. In a planned failover can be a quick failback if you take a tail log backup. Unplanned you will have to reestablish from a full backup.

Database Mirroring
Database Mirroring sounds like a good option but I don't know if it's scalable for our needs. I have read that MS recommends no more than 10 dbs running mirroring on a server for performance reasons. Can anyone confirm that this is true?

X86 I found somewhere in the low 20s. 10 is not a hard and fast number. Depends on a lot of factors. But there is a limit.

Also, not all applications are mirroring-aware (requires ADO.NET 2.0 or higher) so it may not work for certain applications. For SharePoint, the databases must be detached in SharePoint Admin and reattached with the new server name. I'm not clear on whether you could simply rename (or use DNS alias) on the standby server in order to redirect client applications. Can anyone clarify this?

We use DNS aliasing

Failback seems to be an advantage with database mirroring, as it is more automated. The databases resync themselves automatically rather than requiring a rebuild like log shipping.

again it depends on the circumstances of the failover but automatic failback is more likely and if you are using high safety mode very simple.

IPStor Replication
Failover is manual, but can be simplified by doing a server rename or DNS alias on the standby server. To failover, you mount the standby SAN disks to the standby server, attach the databases, transfer sql security, then rename the server.

A full server rename at the OS level is time consuming and involves reboots. Replicating system database will give you problems with encryption, orphaned SQL system users and permissions. Consider only replicating user databases and treating it like glorified database mirroring.

SAN replication is simpler for you but is moving complexity to the storage and intel teams.

Failback seems to be the fastest and is basically a reversal of the failover routine. The SAN hardware resyncs the data and then you do reversal of the server renaming.

this is the MAIN advantage of SAN replication, along with not having to do inital full database backups and restores to get the process going. Its HA and DR as well.

Our SAN replication has a 20% IO performance hit.

NOTE though, if the primary disk gets corrupted, or a database deleted, what stops that getting replicated to the secondary disks? 🙂

---------------------------------------------------------------------

muthukkumaran Kaliyamoorthy SSCoach Points: 19026 More actions · Answer 5

@OLDCHAPPY

Read this link & white paper too. This will give an basic idea.

Choose the good SQL server Disaster Recovery plan[/url]

Muthukkumaran Kaliyamoorthy
https://www.sqlserverblogforum.com/

jsqldba SSChampion Points: 11257 More actions · Answer 6

Thanks to everyone for the replies and useful information. Here are some replies to your questions and comments...

I think the most important question to answer when trying to decide HA options is, what is the business requirement?
Is there an acceptable amount of downtime? Does the secondary server have to be in a remote location? What is your budget? What are you trying to protect against?

RPO = 1 hour, RTO = 24 hours.

The standby server will be located in a remote office about 30 miles away.

Budget is not defined at the moment and not really a constraint.

This DR is to protect some business critical applications in case of total loss of primary data center or catastrophic failure of our primary SAN.

One important HA option you've left out is a clustered instance.

We're running the primary server on a 2 node clustered server for HA (both nodes in same physical location). However, our SAN has no HA protection.

No idea on IPStor.

This is a pretty good write-up for those interested: http://www.sanstor.info/7IPStorDRSolutionDatasheet.pdf

We are currently using IPStor to replicate databases for SAP and provide failover to the remote site. I am not an expert on this as this is managed and implemented by a different team. I'm looking at the option to piggyback on their implementation for our needs.

I thought that mirroring was more like 25 dbs/instance. The 10 limit seems to be more for x86. I would hope you are moving to x64. http://sqlcat.com/technicalnotes/archive/2010/02/10/mirroring-a-large-number-of-databases-in-a-single-sql-server-instance.aspx

We are on 64bit servers. I would be very interested to have some more harder evidence on the recommended limit of mirrored database on one 64bit server. We have about 50 to start off with, and possibly another dozen or two in the future (depends on what applications are deemed business critical in the future).

Ultimately you have to find some balances. You don't necessarily need a one size fits all solution. Your RPO/RTO/SLA for some dbs might need mirroring, others might include log shipping. I wouldn't hesitate to mix them up, and it's not a much more complicated environment to have both. Log shipping is dirt simple in terms of how it works. Backup, copy, restore.

Very true but ideally, it would be preferable to have one solution to implement. Mixing different DR solutions will be a huge headache during a live failover event.

Note that DR is different than HA. You can try to encompass HA in DR, but it gets expensive quickly.

Yep. Not really seeking HA in the classic definition but rather evaluating the HA options as potential DR mechanisms.

As far as I'm aware there's no easy way to implement mirroring without touching the connection strings of existing application(s). Or am I missing something?

Redirect by DNS alias is what I have in mind for any solution. We have quite a mix of applications and they use a variety of connection methods.

(re: Log Shipping)
I would say this has the least hit on the server. No asynchronous IO or two phase commit you can get with database or SAN mirroring. If the logs are large can be a problem over the network.

But to get a low RPO you would have to log ship very frequently, say every minute. Doesn't this impact server performance with nearly constant log backups?

(re: Log Shipping)
no downtime, no outage. In a planned failover can be a quick failback if you take a tail log backup. Unplanned you will have to reestablish from a full backup.

What's the difference between a planned failover and an unplanned one? I'm not sure what you mean.

Our SAN replication has a 20% IO performance hit.
NOTE though, if the primary disk gets corrupted, or a database deleted, what stops that getting replicated to the secondary disks?

Zero performance impact with IPStor (check out the doc link above).

For corruption problems, with IPstor you can roll back to timemarks until you get to the point before the corruption.

george sibbald SSC Guru Points: 104210 More actions · Answer 7

planned would be a DR test, so you can prepare for it and therefore fail over in a controlled manner (you don't want to risk data in a test)

Unplanned is an actual DR event. where your primary is unavailable.

---------------------------------------------------------------------

Steve Jones - SSC Editor SSC Guru Points: 736280 More actions · Answer 8

It sounds like you have thought things through well and I'd go with your gut.

re: mirroring: it's mostly a load of transactions issue. You can mirror 100 databases on x86 if they are all relatively low transaction databases, meaning few changes at a time. It's also a reason why you might stagger maintenance stuff because it all gets sent across the wire, and loading all the reindexes or stats changes at one time can end up being a bottleneck.

re: log shipping: log backups aren't necessarily a huge performance it on the server. It ends up being the same amount of records being backed up, just spread over few or more files. The copy (more often) isn't much overhead, either.

It's more an administrative issue, having to deal with more logs, and potentially "catching" them up after an event.

I know you want one solution, and I agree it's easier, but mirroring limits you since it's 1:1 (for now). Log shipping, while more manual, gives you options, like sending logs to two places, which can be very attractive for DR.