November 13, 2014 at 3:06 pm
We currently have an issue that i'm trying to work out an acceptable solution to:
Our production environment has 3 datacenters (4 instances in dc1, 4 instances in dc2, and 2 instances in dc3)
Our DR environment has 3 datacenters (same instance distribution as production)
In production we have SQL transactional replication set up for a few specific databases on each instance which works as expected.
We currently utilize a block level SAN replication tool (recover point in this case) to replicate our backup, data, and log LUNS over to our DR environment, excluding system db's.
While i've addressed the issue with mounting databases upon failover i'm running into the issue with databases coming over as marked for replication which doesn't allow us to re-establish our transactional setup in DR without dropping the impacted database (offlining the server, changing the file name, deleting the "suspect" DB per MS instructions).
Anyone have any ideas as to how to best approach this issue?
November 18, 2014 at 3:53 am
Hi, a solution that I have implemented and tested very successfully is as follows. Might not be exactly your scenario but it may give you some ideas.
My Scenario
One Production Server with 10 Publications and upto 15 Subscribers
One DR server in a different DC
EMC Clarion SAN attached to both servers (with block Replication).
The Challenge
Perform a failover and continue with Replication still running to all Subscribers
The Method
Very simple. Setup each server independently on each site with exactly the same version of SQL. However on the production server you must also place the system databases and ErrorLogs\Default Trace folder on the replicated LUNS. TempDB can be on a locally attached drive or SSD as long as the same drive letter exists on the DR server. So for example you have the following C(OS - Local), D(TempDB - Local), E(SQL SystemDBs - SAN Replicated), F(SQL TLogs - SAN Replicated), G(SQL DataFiles - SAN Replicated)
The DR server should will be entirely installed on Local drives but the same drive letters for SystemsDBs and TempDB as Prod. i.e C(OS - Local), D(TempDB - Local), E(SQL SystemDBs - Local).
On failover you shut down the Prod server (or it dies if unplanned). Then on the DR server you will need to rename it to be the same as the same name as the Prod server (by dropping out of domain, renaming and adding back into domain). On the DR server, switch the local E: SystemDB drive to something like Q: so its out of the way. Then fail over your SAN drives (once they are synchronized) and they will appear on the DR server. Make them match the Prod Server drive letters.
Start the server up and you should have a fully functioning replicated server. You've essentially tricked it into thinking it was simply down for a small duration. This approach reduced my Publisher Server failover time from 48 hours to 45 mins as I no longer needed to resynch all publications (some of which were almost 1TB in size) 🙂
Just a few points to note.
Make sure both Prod and DR server SQL services are running under the same service accounts (maybe not necessary but just in case).
After failover validate your Publications using sp_publication_validation or tablediff etc
Always keep you DR server uptodate with the same SQL Server patches as your Prod (as its a fully functioning SQL Server on the local disks).
hope this helps
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply