Availability Group Listener (AGL) behavior question

  • Hello - I'm rather new to this feature and looking for advice. I have existing CNAME records in our test environment that are used in the connection strings defined in the repos, etc. The standard string is more or less

    "Data Source=tcp:demoSqlPrincipal.xyz.here;Initial Catalog=DemoDb;Integrated Security=true;Failover Partner=demoSqlMirror.xyz.here;"

    where Data Source and Failover Partner pointed to the physical servers, but now point to the AGL.

    I plan to have the dev teams update the strings with the AGL-specific switches, removing the need for Failover Partner.

    During weekend patching the servers fail over gracefully, yet I still get a slew of login failures. My expectation was that the AGL would seamlessly direct traffic to the new primary right away.

    1. Is it best practice to use the AGL in the connection string itself? I prefer a "forever" or "future-proof" CNAME for the application team to use in connection strings. Will I need to ask them to change it for MultiSubnetFailover=True and remove Failover Partner? Does anyone keep the Failover Partner switch? I doubt it.

    2. What do we think is cached at the application host? Is it caching the physical IP that the CNAME eventually resolves to? I would think that all the application host servers just know the name of the AGL, or its corresponding IP, right?

  • CNAME to the Listener, no need for Failover Partner anymore.

    There could be a whole host of reasons for the login failures, including but not limited to:

    You're in a multi-subnet environment with RegisterAllProvidersIP set to 0, so the listener failed over to the other subnet and DNS took its usual 15 minutes to update (if you didn't change the TTL), meaning clients were still connecting to the cached, stale IP.

    You're in a multi-subnet environment and haven't added MultiSubnetFailover=True to the connection string - although if so I'd be shocked it works normally, as I'd expect frequent failures whenever the non-primary IP is read first. (NOTE: before you add this, check the provider you use, as most of the old ones do not support MultiSubnetFailover - e.g. the old SQLOLEDB or SQL Native Client don't, so adding it won't do you any good; you need a newer provider like MSOLEDBSQL, the new OLE DB driver, to use it.)

    The AG was still failing over or in transition: a database could still have been in the Analysis/Redo/Undo phases of recovery, reading the log to bring the database up to date. Just because transactions are hardened to the log doesn't mean they have been pushed to the data file, especially if you have a redo backlog, and recovery must complete before the database accepts connections again.
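    To make the driver point concrete, here's a minimal Python sketch of the listener-style string the thread is heading towards - MultiSubnetFailover on, no Failover Partner. The listener name aglDemo.xyz.here is made up for illustration:

```python
def listener_conn_str(listener: str, database: str) -> str:
    """Build an ADO.NET-style connection string aimed at the AG listener.

    MultiSubnetFailover=True tells a modern driver (MSOLEDBSQL, ODBC 17+,
    SqlClient) to try every IP the listener registered in parallel, so a
    failover to another subnet connects without waiting on DNS TTLs.
    """
    parts = [
        ("Data Source", f"tcp:{listener}"),   # the listener, not a physical node
        ("Initial Catalog", database),
        ("Integrated Security", "true"),
        ("MultiSubnetFailover", "True"),      # replaces Failover Partner entirely
    ]
    return ";".join(f"{k}={v}" for k, v in parts) + ";"

conn = listener_conn_str("aglDemo.xyz.here", "DemoDb")
print(conn)
```

    With the listener in Data Source there is nothing for Failover Partner to do - that keyword is a database-mirroring construct, so it can simply be dropped once you're on an AG.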

  • bukester wrote:

    During weekend patching the servers fail over gracefully, yet I still get a slew of login failures. My expectation was that the AGL would seamlessly direct traffic to the new primary right away.

    If the AG failed over, and users could not connect, are the logins present on both the primary and secondary?

     

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • The logins are always present - we've been running mirrored databases for a decade, and the databases sit on the "dedicated mirror" instance for a day or so over the weekend, so I know the logins are there. Thanks for replying.

  • bukester wrote:

    The logins are always present - we've been running mirrored databases for a decade, and the databases sit on the "dedicated mirror" instance for a day or so over the weekend, so I know the logins are there. Thanks for replying.

    The users will be present in the databases, but unless you have created the logins on both servers, or set up a process to sync them, the logins are not created by mirroring or AGs.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/
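    One way to check is to pull the login list from each replica (e.g. SELECT name FROM sys.server_principals) and diff the two. A minimal sketch of that comparison - the login lists below are hypothetical stand-ins for the query results:

```python
def missing_logins(primary_logins, secondary_logins):
    """Return logins that exist on the primary but not on the secondary.

    These are the accounts that would fail to authenticate after a
    failover, even though the database *users* came across with the AG.
    """
    return sorted(set(primary_logins) - set(secondary_logins))

# Hypothetical results of querying sys.server_principals on each instance:
primary = ["app_svc", "report_svc", "DOMAIN\\WebFarm"]
secondary = ["app_svc"]

print(missing_logins(primary, secondary))  # these need creating on the secondary
```

    Note that for SQL-authenticated logins the SID has to match on both replicas as well (CREATE LOGIN ... WITH SID = ...), otherwise the database user ends up orphaned even though a login with the right name exists.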

  • First - thanks so much for replying - big help.

    Ant-Green wrote:

    CNAME to the Listener, no need for Failover Partner anymore.

    Cool - that's how I designed it. The CNAME has been reworked by the network guys to manage the firewalls. The AGL is less flexible in that it's a dedicated IP, and the subnet masking lives with the cluster and the physical servers. If that ever needs to be rearranged, I don't want to have to raise a pull request across every dev team just to rearrange the furniture. That said, I would switch the connection strings to the AGL if there were an advantage.

    Ant-Green wrote:

    There could be a whole host of reasons as to the login failure, not limited to, but

    Good point. I found that the new patching group put both (virtual) nodes in the same group... PLUS the host server was patched right afterwards! It's a customer-facing demo environment, but there is little activity at this location, so I ran a failover myself and saw a much more graceful switch than at the weekend.

    Ant-Green wrote:

    You're on a multi subnet environment, you set RegisterAllProvidersIP to 0, so it failed to an alternative subnet and DNS took its usual 15 minutes to update itself (if you didn't change the TTL), so the cached IP record hadn't updated while attempting to connect.

    Sorry, I was vague - I am not in a multi-subnet environment... yet. We are on Standard Edition and, at this point, the budget covers migrating off mirroring with the same redundancy within the one datacenter. We now have three locations, but no replicas beyond the local ones. That said, Microsoft says to use the MultiSubnetFailover=True parameter even when it's not strictly relevant:

    "We recommend this setting for both single and multi-subnet connections to availability groups listeners and to SQL Server Failover Cluster Instance names. Enabling this option adds additional optimizations, even for single-subnet scenarios."

    Rereading another MS doc I see:

    "Therefore, if you have legacy clients that need to connect to an availability group listener and cannot use the MultiSubnetFailover property, we recommend that you change RegisterAllProvidersIP to 0."

    So we are "legacy" in that no strings have MultiSubnetFailover=True yet, and even when we do set it up it will be irrelevant until we have cross-datacenter failovers. So I'm making a leap of faith with the recommended setting of True. Similarly, RegisterAllProvidersIP = 0 will be our default setting for now.

    Sound about right?

  • Legacy there means your drivers.

    If your driver doesn't support MSNF, then set it to false if you are in a multi-subnet setup.

    But if your driver does support it, leave it as True, even if you are not in a multi-subnet setup.

    And if you are in a single-subnet environment anyway, that setting plays no part in how the IPs are registered.
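    Those branches can be written down as a tiny decision helper - purely illustrative, with the driver names just standing in for "does it understand MultiSubnetFailover or not":

```python
def msnf_setting(driver: str, multi_subnet: bool) -> dict:
    """Encode the rules above: which knob to turn for a given driver.

    Driver names are illustrative; the modern set stands in for drivers
    such as MSOLEDBSQL / ODBC Driver 17+ that understand the keyword.
    """
    modern = driver in {"MSOLEDBSQL", "ODBC Driver 17", "ODBC Driver 18"}
    if modern:
        # Supported driver: always ask for MSNF, even on a single subnet,
        # and leave all listener IPs registered in DNS.
        return {"MultiSubnetFailover": True, "RegisterAllProvidersIP": 1}
    # Legacy driver (SQLOLEDB, SQL Native Client): the keyword is useless;
    # on multi-subnet, fall back to RegisterAllProvidersIP = 0 + a low TTL.
    return {"MultiSubnetFailover": False,
            "RegisterAllProvidersIP": 0 if multi_subnet else 1}
```

    The single-subnet row is the "plays no part" case: the listener only ever has one IP there, so RegisterAllProvidersIP changes nothing either way.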

     

    If you do end up needing to span multiple locations, you're going to have to rearchitect the solution again, especially on Standard Edition, where an AG can only have two replicas.

    So think about your long-term goals here: you may want to look at Enterprise Edition, or more likely an FCI locally, with an AG on top of the FCI and a standalone instance in the DR location on Standard, so you can cover the DR piece.

    But doing it as an FCI & AG you lose automatic failover to the DR side, so that would all be a manual process you'd need to have in place for when the poop hits the fan.

     
