Blog Post

Avoiding a split brain in database mirroring

,

You’re utilizing the database mirroring high-availability feature, configured to be in the High Safety mode with automatic failover, which means that the mirroring configuration includes having a witness server to ensure that the system can automatically switch over to the partner in the event of a failure of the principal.

Under this scenario, if the partner server loses its connection to the principal, it will contact the witness server to figure out what to do. In order for automatic failover to occur, both the partner and witness servers have to agree that the principal server is out of commission. There are three possible outcomes:

  1. If the witness is still in contact with the principal, then an automatic failover will not occur. The principal server will remain online, serving the application. However, the transaction log will start to accumulate transactions that cannot be removed until the connection has been re-established and these transactions have been sent to the partner.
  2. If the witness has also lost its connection with the principal, then the partner and witness servers will reach a quorum and the partner database will be brought online, becoming the new principal (specifically, it will undo any un-committed transactions and then bring the database online).
  3. If the partner server is also disconnected from the witness server, then automatic failover will not occur.

The purpose of this last outcome is to prevent what is known as a split brain – a situation where both of the servers are online, serving up results to application requests. If this were to occur, then there would be two different versions of the database, and reconciling these differences would become problematic.

Now let’s consider what would happen if the principal server were to lose its connection to both the partner and witness servers, however the principal server is still online (let’s just say that the partner and witness servers are each in separate remote locations that has had a service interruption – such as a change to a network switch that has isolated these servers from the principal). With the principal server still being online, and potentially able to service at least some users, what does it need to do to ensure that the mirror does not end up in a split brain situation?

Let’s first look at what the possible scenarios are now:

  1. The partner server is in communication with the witness server, and they have reached a quorum where the partner server has been brought online.
  2. The partner server is not in communication with the witness server, and thus it is still in a partner state.

Since the principal server has lost communication with both servers, it can’t know in which situation the mirror is in. Therefore, it needs to protect the mirror from potentially being in a split brain situation. It does this by assuming that the partner server has been elevated to become the mirror’s principal, and so it goes into the partner state. Specifically, this means putting the database into recovery in order to be able to receive transactions from the new principal server.

Is this what actually happens? Well, it’s easy enough to test this out. On three different instances of SQL (they can even be on the same server), establish a mirror in high safety mode with a witness server. After it is established, ensure that SQL Agent is shut down (if running, it will restart SQL), and use task manager to end the processes for sqlservr.exe on the secondary and witness instances. Now look at the status of the database in SSMS, and you will see that its status is “Principal, Disconnected / In Recovery). Alternatively, you can run this following query to see what the status is:

SELECT  sd.name, sd.state_desc,
        sdm.mirroring_role_desc,
        sdm.mirroring_state_desc,
        sdm.mirroring_witness_state_desc
FROM    sys.DATABASES sd
        JOIN sys.database_mirroring sdm ON sd.database_id = sdm.database_id;

To wrap up, mirroring prevents a split-brain situation by:

  1. Both the partner and witness servers have to be in communication with each other, and have lost communication with the principal, in order for the partner server to be elevated to the mirror principal.
  2. If the principal server becomes disconnected from both the partner and witness servers, it goes into partner status and puts its database into recovery.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating