October 16, 2018 at 4:19 pm
Hello experts,
I'm hoping you can help me with a maddening job failure alert. Sorry for the long-winded post, but I feel the need to include all the relevant info for the problem description to make sense.
Thanks for any help.
We have a job that runs every morning. The first step of the job executes an SSIS package. Every day I get an error alert with a message like the following (same every time except the elapsed time usually changes):
Microsoft (R) SQL Server Execute Package Utility Version 13.0.4224.16 for 64-bit Copyright (C) 2016 Microsoft. All rights reserved. Started: 9:00:00 AM Execution ID: 2011. To view the details for the execution, right-click on the Integration Services Catalog, and open the [All Executions] report Started: 9:00:00 AM Finished: 9:00:36 AM Elapsed: 35.782 seconds
When I check the Agent history log, though, I see two entries for Step 1 of the Agent job.
First I see something like this:Message
Microsoft (R) SQL Server Execute Package Utility
Version 13.0.4224.16 for 64-bit
Copyright (C) 2016 Microsoft. All rights reserved.
Started: 9:00:00 AM
Execution ID: 2011.
To view the details for the execution, right-click on the Integration Services Catalog, and open the [All Executions] report
Started: 9:00:00 AM
Finished: 9:00:36 AM
Elapsed: 35.782 seconds
Then I see something like this:
Message
Executed as user: DOMAIN\AgentServiceAccount. The step succeeded.
Finally, as far as I can tell, the data import works as expected and no one has reported anything wrong with the import. Also, this stored procedure error check is sort of legacy code since we have Red Gate SQL Monitor. But Red Gate doesn't report a job failure in this case, adding to the mystery.
I did check the standard reports for All Executions as the error suggests, but I don't see any error messages or warnings. And the summary totals say 0 Failed, 7 Succeeded.
My first question is, how can I track down what failure the SSIS package that triggers an alert email, but does not list failures in the All Executions reports? My gut is telling me not to ignore this as some kind of false positive because the alert is being generated for some reason, even if I don't yet know what that reason is.
FYI, We have a custom job that searches for failed jobs within the previous hour using the following SQL (approx. because there is more code but I included just the query and the necessary variable declaration from the stored procedure parameters).
declare @NumHours int = 1
select
run_date = cast(convert(varchar, T1.run_date)
+ ' ' + stuff(stuff(right('000000'
+ convert(varchar,T1.run_time),6),5,0,':'),3,0,':')
as datetime
),
failure_date = convert(datetime,rtrim(run_date))
+ (run_time * 9 + run_time % 10000 * 6 + run_time % 100 * 10 + 25 * run_duration) / 216e4 ,
substring(T2.name,1,100) as 'job_name',
T1.step_id as 'step_id',
T1.step_name as 'step_name',
T1.message as 'message'
from msdb..sysjobhistory T1
join msdb..sysjobs T2
on T1.job_id = T2.job_id
where T1.run_status != 1
--AND T1.step_id != 0
and datediff(hh,cast(convert(varchar, T1.run_date)
+ ' ' + stuff(stuff(right('000000'
+ convert(varchar,T1.run_time),6),5,0,':'),3,0,':')
as datetime
), getdate()) <= @NumHours
and T1.instance_id not in (select instance_id from failed_jobs_log)
- webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
October 16, 2018 at 4:24 pm
webrunner - Tuesday, October 16, 2018 4:19 PMHello experts,I'm hoping you can help me with a maddening job failure alert. Sorry for the long-winded post, but I feel the need to include all the relevant info for the problem description to make sense.
Thanks for any help.We have a job that runs every morning. The first step of the job executes an SSIS package. Every day I get an error alert with a message like the following (same every time except the elapsed time usually changes):
Microsoft (R) SQL Server Execute Package Utility Version 13.0.4224.16 for 64-bit Copyright (C) 2016 Microsoft. All rights reserved. Started: 9:00:00 AM Execution ID: 2011. To view the details for the execution, right-click on the Integration Services Catalog, and open the [All Executions] report Started: 9:00:00 AM Finished: 9:00:36 AM Elapsed: 35.782 seconds
When I check the Agent history log, though, I see two entries for Step 1 of the Agent job.
First I see something like this:
Message
Microsoft (R) SQL Server Execute Package Utility
Version 13.0.4224.16 for 64-bit
Copyright (C) 2016 Microsoft. All rights reserved.Started: 9:00:00 AM
Execution ID: 2011.
To view the details for the execution, right-click on the Integration Services Catalog, and open the [All Executions] report
Started: 9:00:00 AM
Finished: 9:00:36 AM
Elapsed: 35.782 secondsThen I see something like this:
Message
Executed as user: DOMAIN\AgentServiceAccount. The step succeeded.Finally, as far as I can tell, the data import works as expected and no one has reported anything wrong with the import. Also, this stored procedure error check is sort of legacy code since we have Red Gate SQL Monitor. But Red Gate doesn't report a job failure in this case, adding to the mystery.
I did check the standard reports for All Executions as the error suggests, but I don't see any error messages or warnings. And the summary totals say 0 Failed, 7 Succeeded.
My first question is, how can I track down what failure the SSIS package that triggers an alert email, but does not list failures in the All Executions reports? My gut is telling me not to ignore this as some kind of false positive because the alert is being generated for some reason, even if I don't yet know what that reason is.
FYI, We have a custom job that searches for failed jobs within the previous hour using the following SQL (approx. because there is more code but I included just the query and the necessary variable declaration from the stored procedure parameters).
declare @NumHours int = 1
select
run_date = cast(convert(varchar, T1.run_date)
+ ' ' + stuff(stuff(right('000000'
+ convert(varchar,T1.run_time),6),5,0,':'),3,0,':')
as datetime
),
failure_date = convert(datetime,rtrim(run_date))
+ (run_time * 9 + run_time % 10000 * 6 + run_time % 100 * 10 + 25 * run_duration) / 216e4 ,
substring(T2.name,1,100) as 'job_name',
T1.step_id as 'step_id',
T1.step_name as 'step_name',
T1.message as 'message'from msdb..sysjobhistory T1
join msdb..sysjobs T2
on T1.job_id = T2.job_idwhere T1.run_status != 1
--AND T1.step_id != 0
and datediff(hh,cast(convert(varchar, T1.run_date)
+ ' ' + stuff(stuff(right('000000'
+ convert(varchar,T1.run_time),6),5,0,':'),3,0,':')
as datetime
), getdate()) <= @NumHours
and T1.instance_id not in (select instance_id from failed_jobs_log)- webrunner
message is clear on what to do
"To view the details for the execution, right-click on the Integration Services Catalog, and open the [All Executions] report"
Its also possible to query the catalog tables to retrieve the error details and messsages through t-sql
one such example is https://gallery.technet.microsoft.com/scriptcenter/Identificando-erros-de-3a021c87
October 16, 2018 at 4:44 pm
Is the email coming from the package?
October 17, 2018 at 5:52 am
Nothing in your post suggests that you have actually checked the All Executions report in the Integration Services catalog. Did you track this down?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 17, 2018 at 8:35 am
Joe Torre - Tuesday, October 16, 2018 4:44 PMIs the email coming from the package?
Hi Joe, thanks. No, the email is coming from the legacy stored procedure. On the SSIS side there is no email alert configured within the package. Just an alert email that would be sent if the Agent job that runs the package fails. But the Agent job doesn't fail, it shows the green success check marks for all the job steps.
- webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
October 17, 2018 at 8:43 am
My first question is, is the task actually failing? Is it not doing what it's supposed to be doing?
Second, I'd check that actual Agent job isn't set to send an e-mail on success / completion, because from what you've so far posted, if it were me I'd be saying everything's working hunky-dory.
I'm betting the job is set to either Alert by e-mail or on job success / completion.
October 17, 2018 at 8:57 am
Phil Parkin - Wednesday, October 17, 2018 5:52 AMNothing in your post suggests that you have actually checked the All Executions report in the Integration Services catalog. Did you track this down?
Sorry - my post was long so in it I buried this line:
I did check the standard reports for All Executions as the error suggests, but I don't see any error messages or warnings. And the summary totals say 0 Failed, 7 Succeeded.
Specifically, I did this:
1. In the Integration Services Catalog, I expanded down to Projects and then right-clicked the project name.
2. I clicked Reports > Standard Reports > All Executions.
3. I see 0 Failed,7 succeeded for the past week's execution information.
4. I also clicked into the All Messages list and don't see any failures.
I was able to see an error on dev:
"SSIS_CSV_Import:Error: The SSIS logging provider has failed to open the log. Error code: 0x80070005.
Access is denied."
That was because I had not correctly specified the log file location. But in production I didn't see any errors.
Additionally, I was able to track down some T-SQL queries that can be used to search the Integration Services Catalog data. I did find some old errors back from when I was implementing the package in production, but nothing recent.
use SSISDB;
select OPR.object_name, MSG.message_time, MSG.message
from catalog.operation_messages as MSG
inner join catalog.operations as OPR on OPR.operation_id = MSG.operation_id
where MSG.message_type = 120 -- Error
order by MSG.message_time desc;
use SSISDB;
select OPR.object_name, MSG.message_time, MSG.message
from catalog.operation_messages as MSG
inner join catalog.operations as OPR on OPR.operation_id = MSG.operation_id
where MSG.message_type = 110 -- Warning
order by MSG.message_time desc;
Thanks again for any help - I'm sure I'm missing something here!
-webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
October 17, 2018 at 9:02 am
jasona.work - Wednesday, October 17, 2018 8:43 AMMy first question is, is the task actually failing? Is it not doing what it's supposed to be doing?
Second, I'd check that actual Agent job isn't set to send an e-mail on success / completion, because from what you've so far posted, if it were me I'd be saying everything's working hunky-dory.
I'm betting the job is set to either Alert by e-mail or on job success / completion.
Thanks, I didn't think to check the Agent job alerts. However, the alert is set to send email when the job fails. But it is also set to write to the Windows Application event log so I will check that log as well to see if there is any useful info there (if it actually logged any warnings or errors).
To answer your first question second, yes, the job is doing what it is supposed to, at least according to my colleagues who get a report based on the import.
My main concern at this point is my ignorance of SSIS (I'm an "accidental DBA"), which I fear is causing me to miss some small but real issue lurking in the package.
Maybe paranoia, but I read in a good debugging book once, "If you didn't fix it, it isn't fixed." :hehe:
Thanks,
webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
October 17, 2018 at 9:09 am
webrunner - Wednesday, October 17, 2018 9:02 AMjasona.work - Wednesday, October 17, 2018 8:43 AMMy first question is, is the task actually failing? Is it not doing what it's supposed to be doing?
Second, I'd check that actual Agent job isn't set to send an e-mail on success / completion, because from what you've so far posted, if it were me I'd be saying everything's working hunky-dory.
I'm betting the job is set to either Alert by e-mail or on job success / completion.Thanks, I didn't think to check the Agent job alerts. However, the alert is set to send email when the job fails. But it is also set to write to the Windows Application event log so I will check that log as well to see if there is any useful info there (if it actually logged any warnings or errors).
To answer your first question second, yes, the job is doing what it is supposed to, at least according to my colleagues who get a report based on the import.
My main concern at this point is my ignorance of SSIS (I'm an "accidental DBA"), which I fear is causing me to miss some small but real issue lurking in the package.
Maybe paranoia, but I read in a good debugging book once, "If you didn't fix it, it isn't fixed." :hehe:Thanks,
webrunner
Understood. I recently started dabbling in SSIS packages to import some files. In my case, the job *always* reports an error (and it is technically an error,) even though the job does what it's supposed to be doing.
In my case, I'm still working on getting a file move task to skip the current file (which is held open by another process and can't be moved.)
In your case, though, as nothing is throwing an error, it's more likely somewhere something is set to send an e-mail for some reason...
The trick, of course, is going to be tracking down said something...
October 17, 2018 at 9:11 am
New thought after reading through the posts again.
Is it possible SQL Monitor is sending the e-mail?
October 17, 2018 at 9:14 am
Make a separate post if you want some help with that 🙂 I've written a script in the past which detects locks on files ...
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 17, 2018 at 9:17 am
jasona.work - Wednesday, October 17, 2018 9:11 AMNew thought after reading through the posts again.Is it possible SQL Monitor is sending the e-mail?
Thanks - no, SQL Monitor is not sending the email. In fact, I figured it would but it also doesn't detect a failure.
The email is sent by a custom stored procedure on the SQL Server that checks for failed jobs (see query in my first post for how it checks).
So it's possible there is something wrong in that procedure that is either generating a false positive failure email, or that there is a minor error or warning in the SSIS package that triggers the failure procedure but doesn't rise to the level of causing the package or Agent job to fail. In the past I have seen things like columns out of sync, etc. which aren't good but somehow the package goes through to completion.
- webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
October 17, 2018 at 10:07 am
check it , if job is created via mant. plan
October 17, 2018 at 10:43 am
Question: Where is the "failed_jobs_log" table coming from? I checked on one of my servers (all SQL 2014, mind, so if it's new in SQL 2016 I won't have this) and it doesn't exist in either SSISDB or MSDB.and T1.instance_id not in (select instance_id from failed_jobs_log)
Further on that, how does information get put into the table in question?
Maybe *that* part is failing, so you don't get a match on instance_id, resulting in the e-mail being sent?
October 17, 2018 at 10:56 am
jasona.work - Wednesday, October 17, 2018 10:43 AMQuestion: Where is the "failed_jobs_log" table coming from? I checked on one of my servers (all SQL 2014, mind, so if it's new in SQL 2016 I won't have this) and it doesn't exist in either SSISDB or MSDB.and T1.instance_id not in (select instance_id from failed_jobs_log)
Further on that, how does information get put into the table in question?
Maybe *that* part is failing, so you don't get a match on instance_id, resulting in the e-mail being sent?
Thanks, the failed_jobs_log table is a custom table I built - sorry I wasn't more clear about that. Information gets into it using a proc that queries the msdb database for jobs with a failed status within the previous hour, and it runs every 30 minutes. The msdb query is something like this.
select
run_date = cast(convert(varchar, T1.run_date)
+ ' ' + stuff(stuff(right('000000'
+ convert(varchar,T1.run_time),6),5,0,':'),3,0,':')
as datetime
),
failure_date = convert(datetime,rtrim(run_date))
+ (run_time * 9 + run_time % 10000 * 6 + run_time % 100 * 10 + 25 * run_duration) / 216e4 ,
substring(T2.name,1,100) as 'job_name',
T1.step_id as 'step_id',
T1.step_name as 'step_name',
T1.message as 'message'
from msdb..sysjobhistory T1
join msdb..sysjobs T2
on T1.job_id = T2.job_id
where T1.run_status != 1
--AND T1.step_id != 0
and datediff(hh,cast(convert(varchar, T1.run_date)
+ ' ' + stuff(stuff(right('000000'
+ convert(varchar,T1.run_time),6),5,0,':'),3,0,':')
as datetime
), getdate()) <= @NumHours
- webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply