T Log backup jobs failure

  • Hi All,

    I have done a lot of searching on this and tried various things, but the T-log backups for one particular cluster keep failing with the following error:

    Message

    Executed as user: XXX. Microsoft (R) SQL Server Execute Package Utility Version 10.0.4000.0 for 64-bit Copyright (C) Microsoft Corp 1984-2005. All rights reserved.
    Started: 12:35:01
    Progress: 2011-10-31 12:35:02.61 Source: {38C7D3F9-4904-44C3-9C8E-546B80355F4E} Executing query "DECLARE @GUID UNIQUEIDENTIFIER EXECUTE msdb..sp_ma...".: 100% complete End Progress
    Error: 2011-10-31 12:35:59.55 Code: 0xC00291EC Source: Clean Up Maintenance Plans Execute SQL Task Description: Failed to acquire connection "XXX". Connection may not be configured correctly or you may not have the right permissions on this connection. End Error
    Error: 2011-10-31 12:35:59.55 Code: 0xC0024104 Source: {67370B42-17F6-4AB8-A971-6B821F66FA2B} Description: The Execute method on the task returned error code 0x80131904 (Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.). The Execute method must succeed, and indicate the result using an "out" parameter. End Error
    Warning: 2011-10-31 12:35:59.55 Code: 0x80019002 Source: OnPostExecute Description: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED. The Execution method succeeded, but the number of errors raised (2) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors. End Warning
    Error: 2011-10-31 12:36:02.51 Code: 0xC0024104 Source: Reporting task for {02446F5D-6A1B-4408-86B3-7EC99924871E} Description: The Execute method on the task returned error code 0x80131904 (Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.). The Execute method must succeed, and indicate the result using an "out" parameter. End Error
    DTExec: The package execution returned DTSER_FAILURE (1). Started: 12:35:01 Finished: 12:36:02 Elapsed: 61.292 seconds. The package execution failed. The step failed.

    It started happening at random about three weeks ago, for no apparent reason; nothing had changed on the server. It is intermittent, with no rhyme or reason to the timings. It occasionally happens on a Full or Diff backup, but not as regularly as the T-logs are failing (at least 2-4 times a day). We have performed the following steps to try to eliminate the problem:

    *Recreated all the backup plans in LiteSpeed (which we are using), slightly adjusting the times just in case. The problem stopped for a couple of days, then started again and was worse.

    *Recreated one of the plans in Management Studio; so far there are no failures, but I am still monitoring this.

    *When recreating the backup plans the second time, I created a new server connection in case that was the cause. The backups are still failing.

    *Checked the server connections; everything is fine, and all the accounts being used are fine.

    For the majority of the failures, it seems to fail at the clean-up maintenance task; rarely is it earlier.

    We do have some SSIS packages running on the 4 instances on this cluster, and I am trying to look at error handling on these generally.

    Can anyone offer any advice or pointers on where to look? I have literally tried all the ideas from my research and am at a loss as to why they are still failing. The backups themselves are completing; it's not failing on that part.

    Many thanks in advance!
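    As a side note for anyone digging into failures like the one pasted above: the full message for each failed step can be pulled out of msdb rather than read from the truncated Agent view. A minimal sketch, where the job name is an assumption to be replaced with the real T-log backup job:

    ```sql
    -- Pull the most recent failed step history for a given job.
    -- N'TLogBackup.Subplan_1' is a placeholder job name.
    SELECT  j.name,
            h.step_name,
            h.run_date,
            h.run_time,
            h.message
    FROM    msdb.dbo.sysjobhistory AS h
    JOIN    msdb.dbo.sysjobs       AS j ON j.job_id = h.job_id
    WHERE   j.name       = N'TLogBackup.Subplan_1'
      AND   h.run_status = 0            -- 0 = Failed
    ORDER BY h.instance_id DESC;
    ```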

  • First, turn on more logging from the plans or from SQL Agent. Use a log file.

    By default what you have doesn't provide enough information to determine the issue.

    A few questions. Are you backing up to a shared clustered resource? I find people sometimes don't do this.

    Second, are you backing up to new files all the time? I sometimes see people backing up to the same file, or appending to a file, and the file gets locked by some other process.
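    To illustrate the second point, one way to guarantee a genuinely fresh file on every run is to build a timestamp into the backup file name. A sketch, where the database name and path are assumptions:

    ```sql
    -- Build a unique, timestamped file name so no earlier backup file
    -- can be locked or appended to. MyDb and G:\Backups\ are placeholders.
    DECLARE @file NVARCHAR(260) =
        N'G:\Backups\MyDb_log_' +
        CONVERT(NVARCHAR(8), GETDATE(), 112) + N'_' +                     -- yyyymmdd
        REPLACE(CONVERT(NVARCHAR(8), GETDATE(), 108), N':', N'') +       -- hhmmss
        N'.trn';

    BACKUP LOG MyDb TO DISK = @file WITH INIT;
    ```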

  • Hi Steve,

    Thank you. I am still learning what information to provide on here, so apologies if I didn't give enough; I will try to provide more. What information is best to give?

    Yes, we are backing up to a shared cluster resource. The drives for the backups are unique to each instance, and as far as possible there is no overlap in timing with any other backups for that instance. We are also backing up to a new file each time; there is no overwrite at all.

    I will check LiteSpeed for the logging options.

  • Have you tried logging a call with Quest support? I usually find them very helpful.

    John

  • When you say unique to each instance, do you mean each instance on each node, or do you have multiple instances on the virtual clustered instance? The jobs on Node A should match exactly the jobs on Node B. In other words, all jobs should run through the virtual instance, not on each node.

    You can log more in the job step itself. Use an output file to capture more info.
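    The output-file suggestion can be scripted rather than set through the job step dialog. A sketch using sp_update_jobstep, where the job name, step id, and path are assumptions:

    ```sql
    -- Route the job step's full output to a file for richer diagnostics.
    -- Job name, step id, and path are placeholders; adjust to your setup.
    EXEC msdb.dbo.sp_update_jobstep
         @job_name         = N'TLogBackup.Subplan_1',
         @step_id          = 1,
         @output_file_name = N'E:\SQLLogs\TLogBackup_output.txt',
         @flags            = 2;   -- 2 = append output to the file
    ```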
