Now and then my colleagues and I find our selves in the middle of a discussion about SQL Server Mirroring versus the more traditional Failover Clustering. Both are features provided by SQL Server for achieving high availability, buth they both have pro’s and con’s that could make a decision hard. It’s most often these pro’s and con’s we are discussing for different setups.
One of the features that mirroring gives us, almost never make it to our discussions. And since this feature is actually pretty nice and powerful, I thought it was worth sharing – and perhaps we will remember to take it into account the next time we battle about HA
The feature is called “Automatic Page Repair”, and is documented here: http://msdn.microsoft.com/en-us/library/bb677167.aspx
If a page corruption has been detected on the prinical server, it can automatically read that specific page from the mirror database and fix the principal. How cool is that?!” Using a failover cluster, the database files are only located in one location, and if something happens with the underlying storage system corrupting your data, the only way of fixing it is to restore a backup as soon as possible after you identify the corruption. No need to say, that this is not optimal.
Demo
To show you how this works, I have created a mirror between two local instances on my computer. I won’t go into details about the creation of the mirror, but now I have this on my computer:
The mirrored database is MyData, and sqlmaster\sql2008 instance is currently the principal.
I created a dummy table using this script:
use MyData go create table TestData (id int identity, col1 char(2000)) go insert into TestData (col1) values (replicate('a', 2000)) go 4096
Now I will simulate a disk failure on the sqlmaster\sql2008 server, so I stop the service and open the .mdf file in a hex editor. Somewhere down the file, this show up:
All the a’s are the data column, and the part in the middle is probably a page header. The details are not important for this test, but I’ll replace a bunch of the data with zeros:
Save the file, and start up the sql server.
Now I perform a table scan by simply issuing a select * from… and this is what happens:
Msg 824, Level 24, State 2, Line 1 SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xb4f55a1a; actual: 0xa4a264f1). It occurred during a read of page (1:840) in database ID 5 at offset 0×00000000690000 in file ‘C:\DemoData\MyData.mdf’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
That is a message we normally don’t want to see. It states that page 840 is corrupt, but because the database is part of a mirror, some cool magic has happened. If I run the select again, I get a similar error – but this time for page 841. And third time I execute the select *, it succeeds! What has happened is the “Automatic Page Repair” mechanism kicking in. SQL Server provides a DMV called sys.dm_db_mirroring_auto_page_repair to see if repairs like that have happened, so let’s have a look at that:
SELECT DB_NAME(database_id) as DatabaseName, * FROM sys.dm_db_mirroring_auto_page_repair;
Here we can actually see that page_id 840 and 841 were repaired automatically.
The automatic page repair feature does not prevent an error to occur, but when a query tries to access a corrupt page, it will be repaired automatically so it is ready to use for the next query. So the impact of disk errors will be limited.
If you are using mirroring in your setup, try to have a look at this DMV. If a lot of errors show up here, it might be an indication of bad disks, and you should take action to fix it.
So if you find yourself needing arguments in favor of mirroring, this is a feature worth remembering.