Check AAG is the primary before running a SQL Agent job

  • Hi All,

    I am trying to setup SQL so that if the primary becomes the secondary, the SQL agent will check this before running jobs so they don't fail. This is mainly due to monthly patching so there can be a small window where this is the case. So far I have run:

    USE master;

    GO

    IF OBJECT_ID('dbo.fn_hadr_group_is_primary', 'FN') IS NOT NULL

    DROP FUNCTION dbo.fn_hadr_group_is_primary;

    GO

    CREATE FUNCTION dbo.fn_hadr_group_is_primary (@AGName sysname)

    RETURNS bit

    AS

    BEGIN;

    DECLARE @PrimaryReplica sysname;

    SELECT

    @PrimaryReplica = hags.primary_replica

    FROM sys.dm_hadr_availability_group_states hags

    INNER JOIN sys.availability_groups ag ON ag.group_id = hags.group_id

    WHERE ag.name = @AGName;

    IF UPPER(@PrimaryReplica) = UPPER(@@SERVERNAME)

    RETURN 1; -- primary

    RETURN 0; -- not primary

    END;

    Which runs and creates the function fine, however the next step I run is:

    -- Detect if this instance's role is a Primary Replica.

    -- If this instance's role is NOT a Primary Replica stop the job so that it does not go on to the next job step

    DECLARE @rc int;

    EXEC @rc = master.dbo.fn_hadr_group_is_primary N'my-ag';

    IF @rc = 0

    BEGIN;

    DECLARE @name sysname;

    SELECT @name = (SELECT name FROM msdb.dbo.sysjobs WHERE job_id = CONVERT(uniqueidentifier, '$(ESCAPE_NONE(JOBID))'));

    EXEC msdb.dbo.sp_stop_job @job_name = @name;

    PRINT 'Stopped the job since this is not a Primary Replica';

    END;

    and get the following error: "Msg 8169, Level 16, State 2, Line 33

    Conversion failed when converting from a character string to uniqueidentifier."

    Can some please help with this. Or alternatively if you have a better way to make sure the job only runs when the server is the primary it would be greatly appreciated.


    Thanks,

    Kris

  • Why don't you just perform this check in the first step of the job? ( and "fail reporting success" )

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • I would also switch to using  the built in check function "sys.fn_hadr_is_primary_replica" instead also.

    https://learn.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-hadr-is-primary-replica-transact-sql?view=sql-server-ver16

    I have seen cases in the past where custom primary checking functions have caused issues especially when metadata is being updated and causing dumps to be generated.  Switching to the MSFT supplied function mitigated the dumps entirely.

     

    If sys.fn_hadr_is_primary_replica ( @dbname ) <> 1   
    BEGIN
    -- If this is not the primary replica, exit (probably without error).
    --THROW/RAISERROR something to cause an exception to quit the step, handle the error gracefully
    END
    -- If this is the primary replica, continue to do the work needed.
  • Did you not read my error message?


    Thanks,

    Kris

  • No need to launch "EXEC msdb.dbo.sp_stop_job @job_name = @name;"  if you just fail the step "fail with success"

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Kris-155042 wrote:

    Did you not read my error message?

    Yes, and we are confused as to what you are trying to do.

    If a job is already executing, and a planned failover occurs, where and how are you going to test if the job is running?  Let's say the first step of the job is performing a backup, and it's currently executing.  How do you plan on getting a failover to test if a job is running?  You won't be able to do that within an already executing job.  And, depending upon what the job is doing, do you really want to stop it in the event of a failover?  Such as a backup?  There may be no need to stop some jobs.

    If there is some kind of ETL, as one example, they will likely fail in the event of a failover.  I'm not sure there is a good way to stop something like that gracefully in the event of a failover.

    Like some of the posters have pointed out, if the job would be affected by a failover, set the first step to test for the primary. On success, go to the next step.  On failure, quit the job with success.

    As for your code, '$(ESCAPE_NONE(JOBID))' should not be in quotes.  The string '$(ESCAPE_NONE(JOBID))'  will not convert to a uniqueidentifier.

    So, you possibilities appear to be, and there may be more:

    1. Add a first step that tests for primary, go to next step if it's is, end the job if it is not.
    2. Execute jobs such as ETL's that would be affected by a failover from a third server that connects to the listener.  There are a number of ways to do this.  It's up to you to find the one that works best.

     

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • As an example, I have a job the runs nightly index rebuilds on the primary. If  the primary fails over and becomes the secondary then all those jobs fail. I want the job to check if it's the primary. If yes, go to step two. If no, stop the job and record it as a success for step 1.

    These exact same jobs should be on the secondary so for the whole time it has been secondary only step 1 has been running successfully. But once failover happens then step 2 will run successfully. Does that make more sense.


    Thanks,

    Kris

  • Kris-155042 wrote:

    As an example, I have a job the runs nightly index rebuilds on the primary. If  the primary fails over and becomes the secondary then all those jobs fail. I want the job to check if it's the primary. If yes, go to step two. If no, stop the job and record it as a success for step 1.

    These exact same jobs should be on the secondary so for the whole time it has been secondary only step 1 has been running successfully. But once failover happens then step 2 will run successfully. Does that make more sense.

    Somewhat.   Are all of these jobs related to maintenance?  Can you elaborate the jobs, and what they do?

    Reindexing and updating statistics need to be done on the primary.  Backups, DBCC, cleaning up logs, etc. may need to be done on all of the nodes.

    In the event of a failover when anything other than reindexing and updating stats is executing, I would recommend a "re-try" be put in place when a failover occurs.  As an example, the failover on a synchronous commit auto failover with databases totaling about 3 TB took less than a minute.  In the SQL jobs, all of the steps would go to the last step in the event of a failure.  The last step checked for failover, and what step failed, and would then re-start the job at the failed step.

    This was somewhat tricky, if a job failed as a result of something other than a failover, you did not want to re-start the job.

    Here is what I can't figure out.  Reindexing starts, the AG fails over.  How are you going to test if the job failed on the secondary, and then start it on the primary?

    Lastly, I think you are making this more difficult that it needs to be.  Let's assume that these jobs are all maintenance jobs.

    Do you know the patching schedule for your servers?  You need to know this.  Once you know the patching schedule, you can create a second job schedule that runs these jobs earlier or later than the patching.  Or, I created a calendar table. One of the columns was "prod patching" and "non-prod patching". The patching schedule was specific Saturdays from 2 am till 7 am.  At 11 PM every Friday, a job would execute and check if it was a patching date.  If so, it disabled the various jobs.  At 7:30 AM, another job would run and re-enable the jobs.  We used a calendar table because there were many days when the schedule needed to be changed for various reasons.

    Changing the subject a bit, how are you performing reindexing, and are you sure that this is necessary?  Can you elaborate on this methodology?

     

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Hi Michael,

    No, they are not all maintenance plans. It's actually irrelevant what the job does. We have maintenance jobs, jobs that pick up csv files and import them. All sorts of things. I was just using rebuilding indexes as an example. I do know the patching schedule but again not relevant. We have vendors that do upgrades etc for us (otherwise they won't support the app), and the issue came up because they only rebooted one server and didn't failback. As I can't rely on vendors to do the right thing, this would mitigate the issue.

    I now have this (as an example). The index rebuild works when run independently but when I put the part that is meant to check if it is the primary it does nothing regardless if it's the primary or not.

    IF (SELECT ars.role_desc from sys.dm_hadr_availability_replica_states ars inner join sys.availability_groups ag on ars.group_id = ag.group_id

    where ag.name = 'My_AAG'

    and ars.is_local = 1) = 'PRIMARY'

    BEGIN

    USE My_DB

    DECLARE @TableName VARCHAR(255)

    DECLARE @sql NVARCHAR(500)

    DECLARE @fillfactor INT

    SET @fillfactor = 80

    DECLARE TableCursor CURSOR FOR

    SELECT OBJECT_SCHEMA_NAME([object_id])+'.'+name AS TableName

    FROM sys.tables

    OPEN TableCursor

    FETCH NEXT FROM TableCursor INTO @TableName

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SET @sql = 'ALTER INDEX ALL ON ' + @TableName + ' REBUILD WITH (FILLFACTOR = ' + CONVERT(VARCHAR(3),@fillfactor) + ')'

    EXEC (@sql)

    FETCH NEXT FROM TableCursor INTO @TableName

    END

    CLOSE TableCursor

    DEALLOCATE TableCursor

    END

    ELSE

    BEGIN

    RETURN;

    END


    Thanks,

    Kris

  • STEP1:

    if sys.fn_hadr_is_primary_replica ( dbname ) = 1
    begin
    print 'OK'
    end
    else
    begin
    RAISERROR('DB dbname is not primary', 16, -1);
    end

    Set step one to "quit job reporting success" as fail action.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • I still think you are doing too much of the wrong work.

    You need to determine when the patching will occur, and either automatically create a new schedule for the jobs that may be executing during that period, or disable them and start them after the failover.

    Or, set up a third server that is for the sole purpose of executing SSIS packages, ETL, etc.  and connect to the listener for these.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Hi Johan,

    You are an absolute legend. That worked perfectly. 🙂

    I don't want to push the friendship but, what if I wanted to do it for the whole instance rather than per database. If this is too hard or not possible, don't worry about it, just thought I'd ask as I do have some jobs that check every database in the instance. Thank you SOOOOO much for sorting this out for me.


    Thanks,

    Kris

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply