Azure SQL Database comes with a host of disaster recovery options. The most robust of these options involves replicating the database across different geographic regions using Geo-Replication. In this article, I'll discuss how you can monitor the state of Geo-Replication using Azure SQL Database Dynamic Management Views (DMV's).
Azure SQL Database Disaster Recovery in Brief
Disaster recovery in Azure SQL Database comes in four flavours. The simplest disaster recovery option is Point in Time Restore, which I covered in depth in a previous blog post. It comes built in to Azure SQL Database, and automatically backs up your database daily and allows you to restore the database from between 7 and 35 days depending on the Service Tier of your database.
Geo-Restore is similar to Point in Time Restore, except that the database is backed up daily and stored in a different geographical location to the original database. Storing the database in this manner mitigates data loss that would happen in an entire data centre were to go offline. Like Point in Time Restore, Geo-Restore comes built-in to Azure SQL Database.
The two most resilient Azure SQL Database disaster recovery options are Standard Geo-Replication and Active Geo-Replication. Both of these options replicate changes from a primary to a secondary database asynchronously. Transactions are first applied to the primary database, then buffered and applied to the secondary database. Because of this architecture, committing the changes to the secondary database does not block writing to the primary database. The major differences between Standard Geo-Replication and Active Geo-Replication is the service tier availability, and the fact that the secondary database for Standard Geo-Replication cannot be read, while the secondary database for Active Geo-Replication is read-only (making it a great option for a reporting database, if you have already shelled out for the Premium Service Tier).
The table below shows the different disaster recovery options and their associated Service Tiers.
Disaster Recovery Type | Basic Tier | Standard Tier | Premium Tier |
Point-in-time restore | Any restore point within the past 7 days | Any restore point within the past 14 days | Any restore point within the past 35 days |
Geo-restore | RTO* < 24 hours RPO† < 24 hours | RTO* < 24 hours RPO†< 24 hours | RTO* < 24 hours RPO† < 24 hours |
Standard geo-replication | Not included | RTO* < 2 hours RPO† < 30 minutes | RTO* < 2 hours RPO† < 30 minutes |
Active geo-replication | Not included | Not included | RTO* < 1 hour RPO† < 5 minutes |
Figure 1 - Disaster Recovery Options with Service Tiers (Taken from Azure SQL Database Business Continuity)
* Recovery Time Objective (RTO) - Maximum downtime before the application is fully functional after a failure.
† Recovery Point Objective (RPO) - Maximum amount of most recent data changes (time interval) the application could lose before it is fully functional after a failure.
Geo-Replication DMV's
If you were going to go down the path of using one of the Geo-Replication options, you would probably want to be able to check in on the Continuous Copy relationship between your primary and secondary databases. Fortunately, Azure SQL Database comes with couple of DMV's that allow you to monitor the status of your Geo-Replication.
In order to demonstrate these DMV's and the information that they provide, I have added set up my PDH_OrderData database to use Standard Geo-Replication. I'm not going to go through the steps necessary to set up Standard Geo-Replication because they can all be found here - http://azure.microsoft.com/blog/2014/09/03/azure-sql-database-standard-g..., but it is really just as simple as clicking the Geo-Replication link and following the prompts.
Figure 1 - Geo-Replication Setup
The first DMV we'll have a look as is sys.dm_operation_status. This DMV shows the status of all databases on the server and the operations that have happened to those databases, including creating a Continuous Copy relationship (which is another name for Geo-Replication). A sample query with this view is shown below. The results show that a "CREATE CONTINUOUS DATABASE COPY" operation has been performed against this database. This DMV exists in the Master database.
Figure 2 - sys.dm_operation_status
The next DMV that we'll look at is sys.dm_database_copies, which also exists in the master database. This DMV shows the partner server and the partner database for the Geo-Replicated database. Notice that is_offline_secondary is true, which indicates that this is Standard Geo-Replication because the secondary database is not active. If this was Active Geo-Replication, is_offline_secondary would be false because the secondary database is read-only.
Figure 3 - sys.dm_database_copies
The final DMV that we are going to look at is sys.dm_continuous_copy_status, which exists in the database that is the Geo-Replication source. This DMV shows one row for every Geo-Replication target. With Active Geo-Replication, there can be up to 4 Geo-Replication targets so this DMV would be a good place to look to make sure that all of your Geo-Replication Targets are online.
Figure 4 - sys.dm_continuous_copy_status
And that's it. With Disaster Recovery in SQL Azure, Standard and Active Geo-Replication are the most robust options. There are 3 DMV's in place: dm_operation_status, dm_database_copies and dm_continuous_copy_status that you could use to make sure that your Geo-Replication sources and targets are running as they should be. Stay tuned for more posts on Azure SQL Database.
References
http://azure.microsoft.com/blog/2014/07/12/spotlight-on-sql-database-active-geo-replication/
http://azure.microsoft.com/blog/2014/09/03/azure-sql-database-standard-geo-replication/
http://msdn.microsoft.com/en-us/library/azure/hh852669.aspx
http://msdn.microsoft.com/en-us/library/azure/dn741329.aspx
http://msdn.microsoft.com/en-us/library/411b2e71-4421-4ef5-900d-5af068750899