Check AAG is the primary before running a SQL Agent job

Question

Post reply

Check AAG is the primary before running a SQL Agent job

Kris-155042

Hall of Fame

Points: 3460
More actions
August 13, 2023 at 11:50 pm

#4260420

Hi All,
I am trying to setup SQL so that if the primary becomes the secondary, the SQL agent will check this before running jobs so they don't fail. This is mainly due to monthly patching so there can be a small window where this is the case. So far I have run:
USE master;
GO
IF OBJECT_ID('dbo.fn_hadr_group_is_primary', 'FN') IS NOT NULL
DROP FUNCTION dbo.fn_hadr_group_is_primary;
GO
CREATE FUNCTION dbo.fn_hadr_group_is_primary (@AGName sysname)
RETURNS bit
AS
BEGIN;
DECLARE @PrimaryReplica sysname;
SELECT
@PrimaryReplica = hags.primary_replica
FROM sys.dm_hadr_availability_group_states hags
INNER JOIN sys.availability_groups ag ON ag.group_id = hags.group_id
WHERE ag.name = @AGName;
IF UPPER(@PrimaryReplica) = UPPER(@@SERVERNAME)
RETURN 1; -- primary
RETURN 0; -- not primary
END;
Which runs and creates the function fine, however the next step I run is:
-- Detect if this instance's role is a Primary Replica.
-- If this instance's role is NOT a Primary Replica stop the job so that it does not go on to the next job step
DECLARE @rc int;
EXEC @rc = master.dbo.fn_hadr_group_is_primary N'my-ag';
IF @rc = 0
BEGIN;
DECLARE @name sysname;
SELECT @name = (SELECT name FROM msdb.dbo.sysjobs WHERE job_id = CONVERT(uniqueidentifier, '$(ESCAPE_NONE(JOBID))'));
EXEC msdb.dbo.sp_stop_job @job_name = @name;
PRINT 'Stopped the job since this is not a Primary Replica';
END;
and get the following error: "Msg 8169, Level 16, State 2, Line 33
Conversion failed when converting from a character string to uniqueidentifier."
Can some please help with this. Or alternatively if you have a better way to make sure the job only runs when the server is the primary it would be greatly appreciated.
Thanks,
Kris

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Johan Bijnens SSC Guru Points: 135369 More actions · Answer 1

Why don't you just perform this check in the first step of the job? ( and "fail reporting success" )

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Ant-Green SSC Guru Points: 113507 More actions · Answer 2

I would also switch to using the built in check function "sys.fn_hadr_is_primary_replica" instead also.

https://learn.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-hadr-is-primary-replica-transact-sql?view=sql-server-ver16

I have seen cases in the past where custom primary checking functions have caused issues especially when metadata is being updated and causing dumps to be generated. Switching to the MSFT supplied function mitigated the dumps entirely.

If sys.fn_hadr_is_primary_replica ( @dbname ) <> 1   
BEGIN  
-- If this is not the primary replica, exit (probably without error).  
--THROW/RAISERROR something to cause an exception to quit the step, handle the error gracefully
END  
-- If this is the primary replica, continue to do the work needed.

Kris-155042 Hall of Fame Points: 3460 More actions · Answer 3

Kris-155042

Hall of Fame

Points: 3460

August 14, 2023 at 1:49 pm

#4260913

Did you not read my error message?

Thanks,

Kris

Johan Bijnens SSC Guru Points: 135369 More actions · Answer 4

No need to launch "EXEC msdb.dbo.sp_stop_job @job_name = @name;" if you just fail the step "fail with success"

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Michael L John One Orange Chip Points: 27055 More actions · Answer 5

Kris-155042 wrote:

Did you not read my error message?

Yes, and we are confused as to what you are trying to do.

If a job is already executing, and a planned failover occurs, where and how are you going to test if the job is running? Let's say the first step of the job is performing a backup, and it's currently executing. How do you plan on getting a failover to test if a job is running? You won't be able to do that within an already executing job. And, depending upon what the job is doing, do you really want to stop it in the event of a failover? Such as a backup? There may be no need to stop some jobs.

If there is some kind of ETL, as one example, they will likely fail in the event of a failover. I'm not sure there is a good way to stop something like that gracefully in the event of a failover.

Like some of the posters have pointed out, if the job would be affected by a failover, set the first step to test for the primary. On success, go to the next step. On failure, quit the job with success.

As for your code, '$(ESCAPE_NONE(JOBID))' should not be in quotes. The string '$(ESCAPE_NONE(JOBID))' will not convert to a uniqueidentifier.

So, you possibilities appear to be, and there may be more:

Add a first step that tests for primary, go to next step if it's is, end the job if it is not.
Execute jobs such as ETL's that would be affected by a failover from a third server that connects to the listener. There are a number of ways to do this. It's up to you to find the one that works best.

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

Kris-155042 Hall of Fame Points: 3460 More actions · Answer 6

As an example, I have a job the runs nightly index rebuilds on the primary. If the primary fails over and becomes the secondary then all those jobs fail. I want the job to check if it's the primary. If yes, go to step two. If no, stop the job and record it as a success for step 1.

These exact same jobs should be on the secondary so for the whole time it has been secondary only step 1 has been running successfully. But once failover happens then step 2 will run successfully. Does that make more sense.

Thanks,

Kris

Michael L John One Orange Chip Points: 27055 More actions · Answer 7

Kris-155042 wrote:

As an example, I have a job the runs nightly index rebuilds on the primary. If the primary fails over and becomes the secondary then all those jobs fail. I want the job to check if it's the primary. If yes, go to step two. If no, stop the job and record it as a success for step 1.
These exact same jobs should be on the secondary so for the whole time it has been secondary only step 1 has been running successfully. But once failover happens then step 2 will run successfully. Does that make more sense.

Somewhat. Are all of these jobs related to maintenance? Can you elaborate the jobs, and what they do?

Reindexing and updating statistics need to be done on the primary. Backups, DBCC, cleaning up logs, etc. may need to be done on all of the nodes.

In the event of a failover when anything other than reindexing and updating stats is executing, I would recommend a "re-try" be put in place when a failover occurs. As an example, the failover on a synchronous commit auto failover with databases totaling about 3 TB took less than a minute. In the SQL jobs, all of the steps would go to the last step in the event of a failure. The last step checked for failover, and what step failed, and would then re-start the job at the failed step.

This was somewhat tricky, if a job failed as a result of something other than a failover, you did not want to re-start the job.

Here is what I can't figure out. Reindexing starts, the AG fails over. How are you going to test if the job failed on the secondary, and then start it on the primary?

Lastly, I think you are making this more difficult that it needs to be. Let's assume that these jobs are all maintenance jobs.

Do you know the patching schedule for your servers? You need to know this. Once you know the patching schedule, you can create a second job schedule that runs these jobs earlier or later than the patching. Or, I created a calendar table. One of the columns was "prod patching" and "non-prod patching". The patching schedule was specific Saturdays from 2 am till 7 am. At 11 PM every Friday, a job would execute and check if it was a patching date. If so, it disabled the various jobs. At 7:30 AM, another job would run and re-enable the jobs. We used a calendar table because there were many days when the schedule needed to be changed for various reasons.

Changing the subject a bit, how are you performing reindexing, and are you sure that this is necessary? Can you elaborate on this methodology?

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

Kris-155042 Hall of Fame Points: 3460 More actions · Answer 8

Hi Michael,

No, they are not all maintenance plans. It's actually irrelevant what the job does. We have maintenance jobs, jobs that pick up csv files and import them. All sorts of things. I was just using rebuilding indexes as an example. I do know the patching schedule but again not relevant. We have vendors that do upgrades etc for us (otherwise they won't support the app), and the issue came up because they only rebooted one server and didn't failback. As I can't rely on vendors to do the right thing, this would mitigate the issue.

I now have this (as an example). The index rebuild works when run independently but when I put the part that is meant to check if it is the primary it does nothing regardless if it's the primary or not.

IF (SELECT ars.role_desc from sys.dm_hadr_availability_replica_states ars inner join sys.availability_groups ag on ars.group_id = ag.group_id

where ag.name = 'My_AAG'

and ars.is_local = 1) = 'PRIMARY'

BEGIN

USE My_DB

DECLARE @TableName VARCHAR(255)

DECLARE @sql NVARCHAR(500)

DECLARE @fillfactor INT

SET @fillfactor = 80

DECLARE TableCursor CURSOR FOR

SELECT OBJECT_SCHEMA_NAME([object_id])+'.'+name AS TableName

FROM sys.tables

OPEN TableCursor

FETCH NEXT FROM TableCursor INTO @TableName

WHILE @@FETCH_STATUS = 0

BEGIN

SET @sql = 'ALTER INDEX ALL ON ' + @TableName + ' REBUILD WITH (FILLFACTOR = ' + CONVERT(VARCHAR(3),@fillfactor) + ')'

EXEC (@sql)

FETCH NEXT FROM TableCursor INTO @TableName

END

CLOSE TableCursor

DEALLOCATE TableCursor

END

ELSE

BEGIN

RETURN;

END

Thanks,

Kris

Johan Bijnens SSC Guru Points: 135369 More actions · Answer 9

STEP1:

if sys.fn_hadr_is_primary_replica ( dbname ) = 1
begin
     print 'OK'
end
else
begin
RAISERROR('DB dbname is not primary', 16, -1);
end

Set step one to "quit job reporting success" as fail action.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Michael L John One Orange Chip Points: 27055 More actions · Answer 10

I still think you are doing too much of the wrong work.

You need to determine when the patching will occur, and either automatically create a new schedule for the jobs that may be executing during that period, or disable them and start them after the failover.

Or, set up a third server that is for the sole purpose of executing SSIS packages, ETL, etc. and connect to the listener for these.

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

Kris-155042 Hall of Fame Points: 3460 More actions · Answer 11

Hi Johan,

You are an absolute legend. That worked perfectly. 🙂

I don't want to push the friendship but, what if I wanted to do it for the whole instance rather than per database. If this is too hard or not possible, don't worry about it, just thought I'd ask as I do have some jobs that check every database in the instance. Thank you SOOOOO much for sorting this out for me.

Thanks,

Kris