Job schedule with availability groups ( 4 nodes )

Question

Job schedule with availability groups ( 4 nodes )

johnnyrmtl

Hall of Fame

Points: 3654
More actions
September 24, 2018 at 7:21 am

#379580

Hello,
We seem to be having an issue of a SSIS job running on all nodes instead of only the first node when not in a failover mode.
It is spawning errors from the secondary nodes and we have even created a step in the job to check the active node as such.
DECLARE @role int
    DECLARE @status tinyint
    SET @role = (SELECT [role] FROM [sys].[dm_hadr_availability_replica_states] hars INNER JOIN [sys].[availability_databases_cluster] adc ON hars.[group_id] = adc.[group_id] WHERE hars.[is_local] = 1 AND adc.[database_name] ='SSISDB')
    IF @role <> 1
    BEGIN
    RAISERROR('Database is secondary.', 11, 1);
    END
The logs seem to say that it is running at the same time on some of the nodes which in turn produces object errors because of staging tables not existing.
Any ideas ?
Much appreciated

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

Travis Halsell SSC Eights! Points: 896 More actions · Answer 1

What does your step behavior look like? For example, after you run the is primary check in your first step, do you have to proceed to next step even on failure?

I run similar checks on my secondaries but I don't make it it's own step. I have it begin the execute if primary, else, end.

DECLARE @ServerName NVARCHAR(256) = @@SERVERNAME DECLARE @RoleDesc NVARCHAR(60)

SELECT @RoleDesc = a.role_desc FROM sys.dm_hadr_availability_replica_states AS a JOIN sys.availability_replicas AS b ON b.replica_id = a.replica_id WHERE b.replica_server_name = @ServerName

IF @RoleDesc = 'PRIMARY' BEGIN --Run your SSIS execution script here END

johnnyrmtl Hall of Fame Points: 3654 More actions · Answer 2

I apologize if unclear 🙂
The job is set up on the primary node and replicated on the secondary nodes ( 4 of them ). It seems like we are getting a failure mail from the secondary nodes ( random 2 to 4 ) although it completes on the primary.

I was not the one who set this up and have it as a first step for all jobs on the HA nodes.
Therefore STEP 1 is the check as above and then

ON SUCCESS
go to next step

ON FAILURE
quit the job reporting success

My question is why is even going to next step 2 when it reports the node as secondary and sending us a FAIL
is the logic correct ?

Thanks

Travis Halsell SSC Eights! Points: 896 More actions · Answer 3

If you run your check manually in a query window on a secondary, does it properly raise the error? I would start looking at the actual SQL at this point now that you have confirmed proper step behavior is in place. The below SQL should provide a solution for what you are trying to do. This properly throws the error if it is ran on a secondary. if it is ran on the primary, it will not raise the error.

DECLARE @ServerName NVARCHAR(256) = @@SERVERNAME DECLARE @RoleDesc NVARCHAR(60)

SELECT @RoleDesc = a.role_desc FROM sys.dm_hadr_availability_replica_states AS a JOIN sys.availability_replicas AS b ON b.replica_id = a.replica_id WHERE b.replica_server_name = @ServerName

IF @RoleDesc = 'SECONDARY' BEGIN RAISERROR('Database is secondary.', 11, 1) END

johnnyrmtl Hall of Fame Points: 3654 More actions · Answer 4

One more thing to add is that node 4 has an error with AG synchronisation
can this be a factor with the job?

Travis Halsell SSC Eights! Points: 896 More actions · Answer 5

An error synchronizing should not affect whether the instance is considered a primary or a secondary under asynchronous mode.

johnnyrmtl Hall of Fame Points: 3654 More actions · Answer 6

thalsell - Tuesday, September 25, 2018 6:57 AM
An error synchronizing should not affect whether the instance is considered a primary or a secondary under asynchronous mode.

Strange that the job is indicating a failure message from the other nodes even with the HA check step.

Travis Halsell SSC Eights! Points: 896 More actions · Answer 7

Have you manually ran the checkstep SQL in a query window on a secondary yet? I am really leaning towards that step not being good. It should properly raise an error in the results. If it doesn't you know it doesn't work. What I provided above works.

johnnyrmtl Hall of Fame Points: 3654 More actions · Answer 8

We have to rethink this thanks a lot for your help !

will report back

johnnyrmtl Hall of Fame Points: 3654 More actions · Answer 9

johnnyrmtl - Tuesday, September 25, 2018 8:02 AM
We have to rethink this thanks a lot for your help !
will report back

Since I did not write the job and do not have access to all the nodes , Is a way to check from the primary node or HA logical node where the job ran and if multiple times ?

Travis Halsell SSC Eights! Points: 896 More actions · Answer 10

As far as I know you need some level of access to view the agent history / job activity monitor on each node.

Since they are all trying to execute the same SSIS package... If you have access to the server that hosts the package, you can right click the package -> reports -> view all executions.

johnnyrmtl Hall of Fame Points: 3654 More actions · Answer 11

Travis Halsell - Thursday, September 27, 2018 9:38 AM
As far as I know you need some level of access to view the agent history / job activity monitor on each node.
Since they are all trying to execute the same SSIS package... If you have access to the server that hosts the package, you can right click the package -> reports -> view all executions.

I completely agree and the admin does not seem to understand my reasoning.
I indeed did view that report and I see two identical executions ( start times ) of the job one failed and one succeeded but not sure on which node it failed.

Travis Halsell SSC Eights! Points: 896 More actions · Answer 12

I believe you can see the server that attempted to execute the package if you dig a little deeper in the report

johnnyrmtl Hall of Fame Points: 3654 More actions · Answer 13

Travis Halsell - Thursday, September 27, 2018 10:04 AM
I believe you can see the server that attempted to execute the package if you dig a little deeper in the report

I drilled down within the execution info to both the one that worked and the failure and they both point to the primary mode 🙁

Travis Halsell SSC Eights! Points: 896 More actions · Answer 14

Do you have 2 jobs on the primary executing the same package at the same time?