January 24, 2011 at 2:29 pm
Over the weekend, the maintenance cleanup task (to delete old backup files) that proceeds our daily backups stopped working. As a result, all of the space on the backup disks was consumed, and consequently our backups started failing.
Interestingly, the task continued to run, and report success even though it was no longer deleting old files as it should. The only errors reported by SQL Server were backup failures when the disk space ran out.
I'm aware that the maintenance cleanup task executes a system proc, 'master.dbo.xp_delete_file', and I'm also aware that this proc won't delete any ol' file. It will only delete backup files created by SQL Server.
My question is - Can anyone provide information detailing exactly what constraints 'master.dbo.xp_delete_file' follows?
One factor I'm anticipating to be important, is that the SQL Server (2005) is a two node cluster. And the active node changed immediately prior to the when the maintenance cleanup task stopped working. The new active node took over the reigns happily, and continued executing the maintenance plans as it should. Hourly transaction log backups and daily full backups continued without a hitch. However, as detailed above, the old files were no longer being deleted. As time passed, and the new node built up a history of backup files, the maintenance cleanup task started to delete them - but only files created by the new node. Unfortunately, as a result of the old node's files, the disk filled up - and the backups crashed and burned.
Has anyone else noticed this pattern of behaviour? Is it just coincidence, or will 'master.dbo.xp_delete_file' only delete files created by a specific server (as opposed to SQL Server instance)?
January 24, 2011 at 3:34 pm
That sounds like a permissions issue to me - which would explain why files from the other node cannot be deleted.
A couple of things to check:
1) Are you backing up to a network share, a backup drive in the same cluster group as SQL Server or to a share from a different cluster group?
2) What account is identified as the owner of the files that can be deleted, and the files that cannot be deleted?
3) What login/user is the owner of the maintenance plan and agent job?
4) What login/user is running SQL Server and SQL Server Agent?
Somewhere in there is probably an issue with the SID, where the SID is different on each node and therefore those files from the other node cannot be deleted.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
January 24, 2011 at 4:53 pm
Hi Jeffery
Thanks for your reply.
1) Are you backing up to a network share, a backup drive in the same cluster group as SQL Server or to a share from a different cluster group?
The backups are all targeted at 'local' SAN disks. These disks are resources that are assigned to the active node (which ever one that may be). Access to the disks is identical between the nodes using simple drive letters. X:\, Y:\...etc
2) What account is identified as the owner of the files that can be deleted, and the files that cannot be deleted?
Thanks! I hadn't thought to look at the 'owner' of the file. Interestingly, and I can't explain why - but they are different. Files created by the currently active node are owned by the domain account that runs the SQL Server service (which is the same as the account that runs the Agent service). Whilst files that were created by the other node are 'owned' by the current nodes local administrator group.
3) What login/user is the owner of the maintenance plan and agent job?
The owner of the maintenance plan jobs (as listed under SQL Agent - Jobs) is me (well....my domain account).
4) What login/user is running SQL Server and SQL Server Agent?
Both nodes of the cluster have exactly the same accounts assigned to the run the various SQL services. The account is a domain account.
I was surprised to see the different file owners, and it seems as though that is central to why the files cannot be deleted - but I can't see why they would be different. There are no 'local server' accounts involved, they are all domain accounts which should result in the same SID's. At least it is something I can investigate further.
Thanks again,
Dwayne
January 24, 2011 at 5:23 pm
Solved.
With 'differing ownership' as the clue I found the cause of the problem. It turns out that on one node, the account running the SQL services is a member of the local servers administrators group, whilst on the other node it isn't - don't ask why, thats another problem I'll need to look into :-).
The issue is that when files are created by an administrator, ownership is assigned to the administrator group not the specific account which creates the file. So, when the backups were done (on the first node), although they were created by my service account, they were owned by the local administrator group. When the nodes changed, even though the same account was running the SQL services, it didn't have permissions to delete the backup file because it was not a member of the administrator group on that node.
So - Jeffery was right - Configuration/permissions error.
Thanks for your help!
Dwayne
January 24, 2011 at 8:06 pm
I am glad you were able to solve the issue - and interesting that being in the administrators group changes the owner.
I would recommend that you change the owner of the maintenance plan to something other than your domain account. If your account is ever locked out - your jobs will not run.
To change the owner, you need to update the owner sid in the msdb database, table is dtspackages90 (I think this is the one, don't have my script handy right now).
After changing the owner, open the maintenance plan and save it again. This will update the agent job and set the owner to the maintenance plan owner. I set the owner to sa on my systems.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply