September 24, 2018 at 7:21 am
Hello,
We seem to be having an issue of a SSIS job running on all nodes instead of only the first node when not in a failover mode.
It is spawning errors from the secondary nodes and we have even created a step in the job to check the active node as such.
DECLARE @role int
DECLARE @status tinyint
SET @role = (SELECT [role] FROM [sys].[dm_hadr_availability_replica_states] hars INNER JOIN [sys].[availability_databases_cluster] adc ON hars.[group_id] = adc.[group_id] WHERE hars.[is_local] = 1 AND adc.[database_name] ='SSISDB')
IF @role <> 1
BEGIN
RAISERROR('Database is secondary.', 11, 1);
END
The logs seem to say that it is running at the same time on some of the nodes which in turn produces object errors because of staging tables not existing.
Any ideas ?
Much appreciated
September 24, 2018 at 10:12 am
What does your step behavior look like? For example, after you run the is primary check in your first step, do you have to proceed to next step even on failure?
I run similar checks on my secondaries but I don't make it it's own step. I have it begin the execute if primary, else, end.
DECLARE @ServerName NVARCHAR(256) = @@SERVERNAME
DECLARE @RoleDesc NVARCHAR(60)
SELECT @RoleDesc = a.role_desc
FROM sys.dm_hadr_availability_replica_states AS a
JOIN sys.availability_replicas AS b
ON b.replica_id = a.replica_id
WHERE b.replica_server_name = @ServerName
IF @RoleDesc = 'PRIMARY'
BEGIN
--Run your SSIS execution script here
END
September 24, 2018 at 10:57 am
I apologize if unclear 🙂
The job is set up on the primary node and replicated on the secondary nodes ( 4 of them ). It seems like we are getting a failure mail from the secondary nodes ( random 2 to 4 ) although it completes on the primary.
I was not the one who set this up and have it as a first step for all jobs on the HA nodes.
Therefore STEP 1 is the check as above and then
ON SUCCESS
go to next step
ON FAILURE
quit the job reporting success
My question is why is even going to next step 2 when it reports the node as secondary and sending us a FAIL
is the logic correct ?
Thanks
September 24, 2018 at 11:15 am
If you run your check manually in a query window on a secondary, does it properly raise the error? I would start looking at the actual SQL at this point now that you have confirmed proper step behavior is in place. The below SQL should provide a solution for what you are trying to do. This properly throws the error if it is ran on a secondary. if it is ran on the primary, it will not raise the error.
DECLARE @ServerName NVARCHAR(256) = @@SERVERNAME
DECLARE @RoleDesc NVARCHAR(60)
SELECT @RoleDesc = a.role_desc
FROM sys.dm_hadr_availability_replica_states AS a
JOIN sys.availability_replicas AS b
ON b.replica_id = a.replica_id
WHERE b.replica_server_name = @ServerName
IF @RoleDesc = 'SECONDARY'
BEGIN
RAISERROR('Database is secondary.', 11, 1)
END
September 25, 2018 at 6:46 am
One more thing to add is that node 4 has an error with AG synchronisation
can this be a factor with the job?
September 25, 2018 at 6:57 am
An error synchronizing should not affect whether the instance is considered a primary or a secondary under asynchronous mode.
September 25, 2018 at 7:16 am
thalsell - Tuesday, September 25, 2018 6:57 AMAn error synchronizing should not affect whether the instance is considered a primary or a secondary under asynchronous mode.
Strange that the job is indicating a failure message from the other nodes even with the HA check step.
September 25, 2018 at 7:54 am
Have you manually ran the checkstep SQL in a query window on a secondary yet? I am really leaning towards that step not being good. It should properly raise an error in the results. If it doesn't you know it doesn't work. What I provided above works.
September 25, 2018 at 8:02 am
We have to rethink this thanks a lot for your help !
will report back
September 27, 2018 at 9:32 am
johnnyrmtl - Tuesday, September 25, 2018 8:02 AMWe have to rethink this thanks a lot for your help !will report back
Since I did not write the job and do not have access to all the nodes , Is a way to check from the primary node or HA logical node where the job ran and if multiple times ?
September 27, 2018 at 9:38 am
As far as I know you need some level of access to view the agent history / job activity monitor on each node.
Since they are all trying to execute the same SSIS package... If you have access to the server that hosts the package, you can right click the package -> reports -> view all executions.
September 27, 2018 at 9:51 am
Travis Halsell - Thursday, September 27, 2018 9:38 AMAs far as I know you need some level of access to view the agent history / job activity monitor on each node.Since they are all trying to execute the same SSIS package... If you have access to the server that hosts the package, you can right click the package -> reports -> view all executions.
I completely agree and the admin does not seem to understand my reasoning.
I indeed did view that report and I see two identical executions ( start times ) of the job one failed and one succeeded but not sure on which node it failed.
September 27, 2018 at 10:04 am
I believe you can see the server that attempted to execute the package if you dig a little deeper in the report
September 27, 2018 at 11:15 am
Travis Halsell - Thursday, September 27, 2018 10:04 AMI believe you can see the server that attempted to execute the package if you dig a little deeper in the report
I drilled down within the execution info to both the one that worked and the failure and they both point to the primary mode 🙁
September 27, 2018 at 11:22 am
Do you have 2 jobs on the primary executing the same package at the same time?
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply