SQLServer HA RTO 100%

Question

Post reply

SQLServer HA RTO 100%

DBA_Learner

SSCarpal Tunnel

Points: 4228
More actions
January 7, 2015 at 6:17 am

#299972

Can someone please clarify me about how to design HA environment with RTO 100%.
Is there any possibility to design a environment, where applications connecting to node 1 and when failover occurs on to node 2, can still applications continue processing w/o any interruption ?
In other words, when any application is inserting data of 1000 records onto Node 1 during maintenance window, in the middle of transaction processing if i failover to node 2, can the application still continue to insert data without any interruption ?

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 1

DBA_Learner (1/7/2015)
Can someone please clarify me about how to design HA environment with RTO 100%.

Um, good luck with that.

That means no maintenance windows, no downtime at all, for any reason, not even 1ms. 5 nines is considered extremely hard requiring massive hardware investments

Is there any possibility to design a environment, where applications connecting to node 1 and when failover occurs on to node 2, can still applications continue processing w/o any interruption ?
In other words, when any application is inserting data of 1000 records onto Node 1 during maintenance window, in the middle of transaction processing if i failover to node 2, can the application still continue to insert data without any interruption ?

No. Any form of failover (at least of SQL HA/DR technologies) drops connections and requires that they be re-established. The only thing I know which doesn't is vMotion, but that has to be a controlled process with both VM hosts available, not something that can be done in response to a failure.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 2

The only way to achieve anything like this is a combination of code and database design, regardless of your data platform. You're going to have to have an application server that is able to respond to loss in connection with an appropriate retry on uncommitted transactions on Server 2 when server 1 goes offline without reporting that retry to the client software. Plus, since you don't want a single point of failure, you need to have your client software able to communicate to multiple app servers and have the app servers communicate to each other. Further, your databases may have to be distributed across multiple servers through sharding in order to ensure additional redundancy.

As Gail says, this requires massive outlays for hardware and software as well as tons of time and effort. There is no single magic bullet. I've worked through the Hadoop infrastructure, with it's scale out paradigm, but even it has to have controllers and those controllers have to be redundant or you once again have a single point of failure.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 3

Grant Fritchey (1/7/2015)
Further, your databases may have to be distributed across multiple servers through sharding in order to ensure additional redundancy.

In multiple data centers, with redundant network connections (not two connections from the same provider), UPS and generator backup, preferably in different parts of the world so that large localised disasters (eg earthquake) can't take everything down.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 4

GilaMonster (1/7/2015)
Grant Fritchey (1/7/2015)
Further, your databases may have to be distributed across multiple servers through sharding in order to ensure additional redundancy.
In multiple data centers, with redundant network connections (not two connections from the same provider), UPS and generator backup, preferably in different parts of the world so that large localised disasters (eg earthquake) can't take everything down.

Exactly. And even then, additional programming to take into account changes in latency, the possibility of extended multi-stage transactions and a transaction manager, or I should say, managers, because again, no single point of failure.

I've only read about setups like this.

Zero downtime, zero loss, and zero recovery time are frequent requests from the business when there is no understanding of exactly what that will entail. This is made worse by sales and marketing which suggests that it's possible without supplying the details of what is actually necessary.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

DBA_Learner SSCarpal Tunnel Points: 4228 More actions · Answer 5

Thanks Grant and Gila for valuable information. I completely agree on your points.

The problem with my organization is lack of knowledge about RTO. More over we have is virtual infrastructure where we can't even easily design clusters atleast to have active active clustering and to have DB Synchronization.

As Grant said, we need to have applications simultaneously connects to different servers at same time, we can think in that direction. But, still its expensive design.

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 6

DBA_Learner (1/7/2015)
The problem with my organization is lack of knowledge about RTO.

Then you need to educate them, they're not going to gain knowledge otherwise. Someone who know about the realities needs to have a chat with the relevant business people about what is feasible, practical and affordable.

Grant's scenario (redundant connections, redundant app servers, redundant data centres, etc) could easily cost tens of millions of dollars in hardware, software and development time. It's easy to say 'we need X', but when the financial and temporal realities come into play, it can often turn into 'We want X, but we can make do with Y because we can't afford X'

Start with a serious and realistic discussion with the business owners about what they actually need for RPO and RTO, not what they would want in a perfect world.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Perry Whittle SSC Guru Points: 234065 More actions · Answer 7

sounds like what they're looking for is Oracle RAC

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

DBA_Learner SSCarpal Tunnel Points: 4228 More actions · Answer 8

DBA_Learner

SSCarpal Tunnel

Points: 4228

January 8, 2015 at 1:07 pm

#1768885

Thanks Gila...