January 27, 2016 at 6:37 am
We have two nodes in our Always On setup running SQL 2014 SP1 with, at the moment, one database set up for Always On.
If I do a forced failover from Node 1 to Node 2 and then back again from Node 2 to Node 1, everything works perfectly.
However, if I stop the SQL Server services on the primary node, the secondary node does not take over. The dashboard shows the role on the secondary stuck in RESOLVING. As soon as I start the services on the primary node, everything is fine.
Where do I even start to debug this issue?
These are the messages I see in the dashboard when it doesn't function:
This secondary replica is not connected to the primary replica. The connected state is DISCONNECTED
The role of this availability replica is unhealthy. The replica does not have either the primary or secondary role.
At least one availability database on this availability replica has an unhealthy data synchronization state. If this is an asynchronous-commit availability replica, all availability databases should be in the SYNCHRONIZING state. If this is a synchronous-commit availability replica, all availability databases should be in the SYNCHRONIZED state.
January 27, 2016 at 6:48 am
When you say all is fine after starting the primary what do you mean? What state is each database immediately after that point and which ends up primary?
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 27, 2016 at 7:10 am
Orlando Colamatteo (1/27/2016)
When you say all is fine after starting the primary what do you mean? What state is each database immediately after that point and which ends up primary?
On Server 1 the Always On database is the primary, and in the dashboard all is fine. I then stop the SQL Server services on Server 1; the Always On database is no longer available, and the dashboard shows RESOLVING and never switches over to the second server. Once I start SQL Server on Server 1, everything becomes available again just fine.
January 27, 2016 at 7:29 am
Sounds like you need to look into your failover mode settings coupled with your sync mode.
Also, from a data loss perspective are you running in async or synchronous mode and if you are async is your secondary caught up?
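One way to check whether the secondary is caught up is to look at the per-database synchronization state. This is only a sketch using the standard HADR DMVs; the aliases are arbitrary, and it should be run on the primary, since a secondary only reports its own local databases.

```sql
-- Per-database synchronization state and failover readiness for each replica.
-- Run on the primary to see rows for every replica.
SELECT ar.replica_server_name,
       drcs.database_name,
       drs.synchronization_state_desc,   -- expect SYNCHRONIZED for synchronous commit
       drs.synchronization_health_desc,
       drcs.is_failover_ready            -- 1 = database can fail over without data loss
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
JOIN sys.dm_hadr_database_replica_cluster_states AS drcs
    ON drcs.replica_id = drs.replica_id
   AND drcs.group_database_id = drs.group_database_id
ORDER BY ar.replica_server_name, drcs.database_name;
```

If is_failover_ready is 0 on the secondary, automatic failover would not be expected to succeed for that database.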
January 27, 2016 at 10:04 am
It is set to Synchronous commit. There aren't any data changes as we only have a test database with a few test tables with less than 20 rows in it. Data isn't changing.
January 27, 2016 at 10:18 am
The replicas' being set to synchronous commit isn't sufficient. Both the primary and secondary have to have failover mode set to AUTOMATIC.
What does this query show?
SELECT replica_server_name,
       availability_mode_desc,
       failover_mode_desc
FROM sys.availability_replicas;
Cheers!
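If it helps, the same query can be extended to show each replica's current role and connection state next to its configuration. This is a rough sketch that assumes nothing beyond the standard catalog view and DMV; run it on the secondary while the primary's services are stopped to capture what the dashboard is reporting.

```sql
-- Configuration plus current runtime state for each replica.
-- On a secondary, only the local replica's state row is returned.
SELECT ar.replica_server_name,
       ar.availability_mode_desc,
       ar.failover_mode_desc,
       ars.role_desc,                    -- PRIMARY, SECONDARY, or RESOLVING
       ars.connected_state_desc,
       ars.operational_state_desc,
       ars.synchronization_health_desc
FROM sys.availability_replicas AS ar
JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id;
```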
January 27, 2016 at 11:23 am
Yes, it is set to failover mode Automatic.
I found the issue and fixed it.
For the failover behavior, I changed it to the setting recommended in this link. Basically, it will now try to bring it online up to 60 times within 1 hour, so if it doesn't come online within 1 minute, it will try again.