Back-up jobs taking forever to run

Question

Post reply

Back-up jobs taking forever to run

mattjk

SSChasing Mays

Points: 608
More actions
November 19, 2008 at 5:49 pm

#383425

Hi,
We are experiencing an issue with backup jobs taking an excessively long amount of time to run, and I was hoping someone herre might have some insight as to where the problem is.
The environment: 4 x SQL instances (1 x 2005, 3 x 2008) running on one server, all performing file-based backups via the network to disks located in a different server.
Each instance (and individual databases) have different backup schedules, but a general rule-of-thumb is that each databases does a complete backup once a week, differential each night, and transaction log every 15-30 minutes. All backup jobs were created using Maintenance Plans (wizard), and each job consists of the backup step followed by a cleanup step which deletes back-up files older than x weeks.
The problem is that every back-up job executed by 3 of the instances takes a looong time to complete. Testing has shown the actual back-up performs at a "normal" pace, but the job then sits for many minutes processing the clean-up step (deleting old backup files). As soon as I remove the clean-up step, or temporarily move most/all of the old backup files to another folder (so there are far less files to process in the clean-up step), the cleanup step & hence job completes quickly.
The more files to process, the slower it gets - this makes sense, but at the moment we're talking about < 500 files to process for each back-up job, which I wouldn't think is excessive?
Another oddity is that ONE of our 2008 instances is unaffected - it's back-up jobs (with the same clean-up steps) execute in seconds. If anything this instance should take longer - it has the most frequent back-up jobs (so more backup files to process during clean-up step) and most data to backup (it's databases are far more active than the other instances). The only difference between this instance and the others is that it is the default instance, rather than named.
To give an example:
Default instance, backing up ~30MB of TLOG, with approx 1,000 TRN files in backup folder to process in clean-up step - takes about 10 seconds.
Other instances, backing up ~500K of TLOG, with approx 500 TRN files in backup folder to process in clean-up step - takes about 25 MINUTES.
Can anyone think of why our clean-up steps are taking so long to run? And why one of our instances wouldn't be affected, while the other 3 are?
Thanks.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Wilfred van Dijk SSCrazy Eights Points: 9003 More actions · Answer 1

Try defrag your disks

Wilfred
The best things in life are the simple things

Mike John SSCertifiable Points: 7216 More actions · Answer 2

I suspect there may be some sort of authentication issue causing timeouts while checkinmg permissions. I would check the account dteails the jobs are under, permissions to those accounts, access to domain server etc etc. Only a hunch but it may help.

Mike John

Wilfred van Dijk SSCrazy Eights Points: 9003 More actions · Answer 3

Other thing I was thinking of:

If you delete a backup, SQL has to do some maintenance in the msdb database. So deleting an outdated backup results in:

- a DOS (yeah, I know) delete statement

- in some SQL DELETE/UPDATE statements (?)

Maybe msdb is causing the problem? Try rebuilding the indexes in msdb

Wilfred
The best things in life are the simple things

mattjk SSChasing Mays Points: 608 More actions · Answer 4

Firstly, apologies for the ridiculously slow reply - I completely forgot that I posted about this!

A couple of answers below, but I have also taken this up in a different thread as I've done more diagnosis / troubleshooting in the meantime. New thread here: http://www.sqlservercentral.com/Forums/Topic635989-357-1.aspx

Wilfred - Try defrag your disks

Thanks for the idea but it's not a disk performance issue.

Mike - I suspect there may be some sort of authentication issue causing timeouts while checkinmg permissions. I would check the account dteails the jobs are under, permissions to those accounts, access to domain server etc etc. Only a hunch but it may help.

As per my newer post, I've tried giving the SQL service accounts Full Control permissions to the share they're targetting (and even Domain Admin rights very temporarily) and this didn't help.

Wilfred - Other thing I was thinking of:
If you delete a backup, SQL has to do some maintenance in the msdb database. So deleting an outdated backup results in:
- a DOS (yeah, I know) delete statement
- in some SQL DELETE/UPDATE statements (?)
Maybe msdb is causing the problem? Try rebuilding the indexes in msdb

I haven't investigated this in details but I don't think it is the issue:

1) Further diagnosis (see new thread) points at specifically the xp_delete_file stored proc call being the problem

2) The affected SQL instances are all pretty small-fry and on over-powered hardware so I doubt msdb performance is an issue. e.g. msdb on all instances is 10-20mb and the server has 4GB of RAM, 20-30% CPU load accross 2 CPUs, and disks on a NetApp SAN.