An issue exists in SQL Server 2012 SP1 CU6 Availability Group Primary databases involved in Service Broker conversations. If you reused conversations and failover mode is set to manual, depending on how SQL Server had been shutdown previously, the transmission queue starts growing. Messages are sent, received by the target which sends acknowledges but where before these acknowledged messages would disappear from sys.transmission_queue, now they remain and build up. The transmission status shows nothing indicating the message has not been sent but it has. Service Broker makes no attempt to re-transmit them which is a blessing but they accumulate in the transmission queue.
The Cause
The problem occurs when the following events happen to the Availability Group Primary / Service Broker database:
- The SQL Service is stopped / restarted, even cleanly using the configuration manager.
- The owning clustered SQL service node is rebooted.
- The clustered service is moved to another node using the Failover Cluster Manager in Windows or using PowerShell.
- Any event that causes a restart of the SQL Service if there is an Availability Group Primary / Service Broker database with failover mode = manual
The Service Broker class responsible for clearing the transmission queue is not being invoked.
To fix for the problem
Here is the short list of items that need to be undertaken to solve the issue:
- Take the database out of the availability group on the primary.
- End all connections to the database , single user mode.
- Disable then re-enable the Service Broker option in the database.
- Add the database back to the Availability Group, reinitialize any standby servers.
The Problem Definition
Taking down applications that reference the database and the re-initialization of all secondary servers is the problem. When the database is out of the availability group (AG), it is unavailable for applications that connect using the AG Listener IP address. Secondary copies of the AG database are unavailable until re-initialized and synchronizing, disaster recovery and asynchronous reporting data may be unavailable for a while.
Problem Reproduction
The reproduction below is in a virtual lab environment, it uses scripts freely available on msdn to build a test environment with an Availability Group Database as part of a Service Broker dialog.
The lab environment
The SQL Service Active Directory Account is a member of the System Administrator role in SQL Server on all instances. Each of the SQL Servers has a local share C:\Share
- PBVAD – Domain controller, Windows Server 2012 R2
- PBV001 – SQL Server 2012 SP1 CU6 – Default instance, Service Broker Target
- PBV002 & PBV003 – SQL Server 2012 SP1 CU6 – Default instances, Service Broker Initiators. The servers are part of a windows cluster with a file share witness.
Here is the code:
01 – Setup The Service Broker Target – PBV001
------------------------------------------ --Creating the Target Database-- ------------------------------------------ --Create a Service Broker endpoint USE master; GO IF EXISTS (SELECT * FROM master.sys.endpoints WHERE name = N'InstTargetEndpoint') DROP ENDPOINT InstTargetEndpoint; GO CREATE ENDPOINT InstTargetEndpoint STATE = STARTED AS TCP ( LISTENER_PORT = 4022 ) FOR SERVICE_BROKER (AUTHENTICATION = WINDOWS ); GO --Create the target database, master key, and user IF EXISTS (SELECT * FROM sys.databases WHERE name = N'TargetDB') DROP DATABASE TargetDB; GO CREATE DATABASE TargetDB; GO USE TargetDB; GO CREATE MASTER KEY ENCRYPTION BY PASSWORD = N'<EnterStrongPassword1Here>'; GO CREATE USER TargetUser WITHOUT LOGIN; GO --Create the target certificate & backup the certificate CREATE CERTIFICATE InstTargetCertificate AUTHORIZATION TargetUser WITH SUBJECT = 'Target Certificate', EXPIRY_DATE = N'12/31/3010'; GO BACKUP CERTIFICATE InstTargetCertificate TO FILE = N'C:\Share\InstTargetCertificate_Node3.cer'; GO --Create message type for request and reply CREATE MESSAGE TYPE [//RequestMsg] GO CREATE MESSAGE TYPE [//ReplyMsg] GO --Create a contract for the above created message CREATE CONTRACT [//SampleContract] ( [//RequestMsg] SENT BY INITIATOR, [//ReplyMsg] SENT BY TARGET ) --Create Queue & Service for target USE TargetDB; GO CREATE QUEUE TargetQueue_TargetDB; GO CREATE SERVICE [//TargetService] AUTHORIZATION TargetUser ON QUEUE TargetQueue_TargetDB([//SampleContract]); GO
Once complete, copy the target certificate in C:\Share to the corresponding folder on server PBV002
02 – Setup Service Broker Initiator – PBV002 (PBV003 setup later during create of AG)
--------------------------------------------- --Creating the Initiator Database-- -------------------------------------------- --Create a Service Broker endpoint USE master; GO IF EXISTS (SELECT * FROM sys.endpoints WHERE name = N'InstInitiatorEndpoint') DROP ENDPOINT InstInitiatorEndpoint; GO CREATE ENDPOINT InstInitiatorEndpoint STATE = STARTED AS TCP ( LISTENER_PORT = 4022 ) FOR SERVICE_BROKER (AUTHENTICATION = WINDOWS ); GO --Create the initiator database, master key, and user IF EXISTS (SELECT * FROM sys.databases WHERE name = N'InitiatorDB') DROP DATABASE InitiatorDB; GO CREATE DATABASE InitiatorDB; GO USE InitiatorDB; GO CREATE MASTER KEY ENCRYPTION BY PASSWORD = N''; GO CREATE USER InitiatorUser WITHOUT LOGIN; GO --Create the initiator certificate CREATE CERTIFICATE InstInitiatorCertificate AUTHORIZATION InitiatorUser WITH SUBJECT = N'Initiator Certificate', EXPIRY_DATE = N'12/31/3010'; BACKUP CERTIFICATE InstInitiatorCertificate TO FILE = N'C:\Share\InstInitiatorCertificate_Node1.cer'; GO CREATE MESSAGE TYPE [//RequestMsg] GO CREATE MESSAGE TYPE [//ReplyMsg] GO --Create a contract for the above created message CREATE CONTRACT [//SampleContract] ( [//RequestMsg] SENT BY INITIATOR, [//ReplyMsg] SENT BY TARGET ) --Create the initiator queue and service CREATE QUEUE InitiatorQueue_InitiatorDB; CREATE SERVICE [//InitiatorService] AUTHORIZATION InitiatorUser ON QUEUE InitiatorQueue_InitiatorDB; GO --Create references to target objects CREATE USER TargetUser WITHOUT LOGIN; CREATE CERTIFICATE InstTargetCertificate AUTHORIZATION TargetUser FROM FILE = N'C:\Share\InstTargetCertificate_Node3.cer' GO --Create routes DECLARE @Cmd NVARCHAR(4000); SET @Cmd = N'USE InitiatorDB; CREATE ROUTE Instiator_TO_Target_Route WITH SERVICE_NAME =N''//TargetService'', ADDRESS = N''TCP://PBV001:4022'';'; EXEC (@Cmd); SET @Cmd = N'USE msdb CREATE ROUTE Inst_Local_InitiatorRoute WITH SERVICE_NAME = N''//InitiatorService'', ADDRESS = N''LOCAL'''; EXEC (@Cmd); GO CREATE REMOTE SERVICE BINDING TargetBinding TO SERVICE N'//TargetService' WITH USER = TargetUser; GO
Once complete, copy the new initiator certificate from C:\Share on PBV002 to the corresponding folder on PBV001
003 – Complete Target Server Setup – PBV001
------------------------------------------------------- --Complete the Target Conversation Objects ------------------------------------------------------- --Create references to initiator objects USE TargetDB GO CREATE USER InitiatorUser WITHOUT LOGIN; CREATE CERTIFICATE InstInitiatorCertificate AUTHORIZATION InitiatorUser FROM FILE = N'C:\Share\InstInitiatorCertificate_Node1.cer'; GO --Create routes DECLARE @Cmd NVARCHAR(4000); SET @Cmd = N'USE TargetDB; CREATE ROUTE Target_To_Initiator_Route WITH SERVICE_NAME = N''//InitiatorService'', ADDRESS = N''TCP://PBV002:4022'';'; EXEC (@Cmd); SET @Cmd = N'USE msdb CREATE ROUTE Inst_Local_TargetRoute WITH SERVICE_NAME = N''//TargetService'', ADDRESS = N''LOCAL'''; EXEC (@Cmd); GO GRANT SEND ON SERVICE::[//TargetService] TO InitiatorUser; GO CREATE REMOTE SERVICE BINDING InitiatorBinding TO SERVICE N'//InitiatorService' WITH USER = InitiatorUser; GO
04 – Send Message from Initiator – PBV002
-------------------------------------------------- --Beginning Conversation, send test message -- -------------------------------------------------- USE InitiatorDB; GO DECLARE @InitDlgHandle UNIQUEIDENTIFIER; DECLARE @RequestMsg NVARCHAR(100); BEGIN TRANSACTION; BEGIN DIALOG @InitDlgHandle FROM SERVICE [//InitiatorService] TO SERVICE N'//TargetService' ON CONTRACT [//SampleContract]WITH ENCRYPTION = Off; SELECT @RequestMsg = N'<RequestMsg:2>Message for Target service.</RequestMsg:2>'; SEND ON CONVERSATION @InitDlgHandle MESSAGE TYPE [//RequestMsg] (@RequestMsg); SELECT @RequestMsg AS SentRequestMsg; COMMIT TRANSACTION; GO
At this point you should be able to send a message from the initiator to the target. The target receives the message, acknowledges it and it is removed from the initiator transmission queue.
Next perform the following actions to create an Availability Group and add the InitiatorDB to it.
- Backup the InitiatorDB on PBV002, restore WITH NORECOVERY to PBV003.
- Create an Availability Group with a Listener and add the InitiatorDB on PBV002 & PBV003, synchronous, manual failover.
- Change the TargetDB Service Broker route to point to the Availability Group Listener name of the Initiator.
- Send a test message. Again message is sent, received by target, acknowledged and removed from initiator transmission queue.
Now, restart the SQL Service of the Availability Group Primary/Service Broker Initiator using the configuration manager. Service Broker Messages are still sent, received and acknowledged but they are no longer removed from the initiator transmission queue. If you are sending a lot of messages and reusing conversations then this is a serious problem that needs to be fixed quickly and involves downtime
Conclusion
An update that enabled you to stop and restart the SQLAG primary service using the SQL configuration manager, without the issue occurring, is needed. If you could do this without breaking service broker and having to reinitialize all AG standby’s after, it would be a big step forward.
Ideally a fix is needed for crash recovery scenario’s too where logs are rolling forward/back. Just the restarting of the service is enough to trigger the issue at the moment.
Since writing this article I found a post about the same issue - http://www.sqldiablo.com/2012/04/15/service-broker-alwayson-availability-groups-odd-transmission-queue-behavior/
He doesn't state the problem is specific to manual failover mode which it is now. It mentions a fix was pending but it hasn't materialized yet as of SQL Server 2012 SP1 CU6.
UPDATE 2013/12/23 - A FIX WILL BE INCLUDED IN SQL SERVER 2012 SP1 CU 9 DUE MARCH 2014