Log Shipping Secondary Jobs Hang

Question

SwedishOrr

SSCrazy

Points: 2955
May 25, 2010 at 9:29 am

#219353

Hello all,
I've got log shipping set up for several databases from Server1 to Server2. I consistently have the problem of Copy and Restore jobs just hanging on Server2. I mean, they are in a constant state of running. So far the only way I can fix this is to redo the entire log shipping setup for each database.
Is there a way I can figure out why my DBs are doing this on a daily basis? There's also no consistency with regard to which DBs' jobs hang. And is there a way to resolve this without having to redo the log shipping setup?
Any help is greatly appreciated. Thanks.

Jayakumar Krishnan SSCarpal Tunnel Points: 4710 · Answer 1

I've come across this same issue too, but in my case only the LS Restore job took longer to complete; there were no issues with the LS Copy job. I've tried two workarounds:

1. As you said, redo the entire setup.

2. Shrink the transaction log on the primary. If the DBCC LOGINFO output shows many unused (Status = 0) virtual log files inside the transaction log, that can make the LS Restore job take longer to recover the database on the secondary for each T-log backup it restores, which may leave the job permanently in a running state. So try shrinking the T-log on the primary and check again. If it is still needed after the shrink, take a full backup and restore it on the secondary without breaking log shipping.

Hope this helps...

Thanks
Jay
http://www.sqldbops.com

Mark Shepherd-435962 Mr or Mrs. 500 Points: 568 · Answer 2

I think you need to find out why the restore of the log file is taking so long.

Do a manual transaction log restore (i.e. RESTORE LOG abc FROM DISK = '<logshipping file folder/abc.log>' WITH NORECOVERY) and look at the MB/sec for the restore. If this is very slow, you may have a disk or network problem, depending on where you are restoring the logs from.

If it is slow, get that sorted first: try different log file locations, on different local storage or network shares.

Check also that you don't have a lot of old log files sitting in the log shipping folder for the transaction logs.

This will cause the restore job to scan and try to apply all log files until it gets to the latest file to restore. You will also see high CPU utilization while this is occurring.
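One way to check for that backlog is to list the backup files in the folder that are older than some threshold. This is only a rough sketch in Python: the `.trn` pattern and the 24-hour cutoff are assumptions, and `old_log_backups` is a made-up helper name, so adjust all of them to your own setup.

```python
import time
from pathlib import Path


def old_log_backups(folder, max_age_hours=24, pattern="*.trn"):
    """Return backup files in `folder` last modified more than
    `max_age_hours` ago, oldest first.

    The "*.trn" pattern and the 24-hour default are assumptions;
    change them to match how your backups are named and retained.
    """
    cutoff = time.time() - max_age_hours * 3600
    stale = [p for p in Path(folder).glob(pattern) if p.stat().st_mtime < cutoff]
    return sorted(stale, key=lambda p: p.stat().st_mtime)


# Example call (the folder path is hypothetical):
#   for path in old_log_backups(r"D:\LogShipping\abc"):
#       print(path.name)
```

If this returns a long list on the secondary, the restore job has a lot of files to walk through before it reaches the one it actually needs.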

Hope this helps.

SwedishOrr SSCrazy Points: 2955 · Answer 3

Mark, can you elaborate on how I can look at the mb/sec for the restore? Thanks.

Mark Shepherd-435962 Mr or Mrs. 500 Points: 568 · Answer 4

When you do a log restore manually you should get something like this.

Processed 2 pages for database 'Testoct2', file 'test_Log' on file 1.

RESTORE LOG successfully processed 2 pages in 0.074 seconds (0.214 MB/sec).
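If you want to compare throughput across several manual restores, you could pull the numbers out of that summary line with a small script. A minimal sketch in Python, assuming the message has exactly the shape shown above; `restore_throughput` is just an illustrative name.

```python
import re

# Matches the summary line printed after a RESTORE LOG,
# in the form shown in the sample output above.
SUMMARY = re.compile(
    r"RESTORE LOG successfully processed (\d+) pages "
    r"in ([\d.]+) seconds \(([\d.]+) MB/sec\)"
)


def restore_throughput(message):
    """Return (pages, seconds, mb_per_sec) parsed from a RESTORE LOG
    summary line, or None if the line does not match."""
    m = SUMMARY.search(message)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


line = "RESTORE LOG successfully processed 2 pages in 0.074 seconds (0.214 MB/sec)."
print(restore_throughput(line))  # (2, 0.074, 0.214)
```

The last value is the MB/sec figure to watch when you move the log files to different storage.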

Minaz Amin SSChampion Points: 11052 · Answer 5

Try restoring the T-log manually and check Profiler to see why it takes such a long time. Another question: do you have any antivirus software installed on Server2, where you are copying and restoring the T-log?

What are your OS and SQL Server service pack levels?

Let us know

"More Green More Oxygen !! Plant a tree today"

SwedishOrr SSCrazy Points: 2955 · Answer 6

Several days ago I ran into the issue again and had to redo log shipping for about 40 databases. Fun.

When the crud hit the fan, there were tons of errors in the Windows event log for .NET Framework 2.0 and sqllogship.exe. The errors seemed to run the gamut of possible errors. One that stood out as particularly numerous was an out of memory exception thrown by .NET Framework 2.0.

I went back to my server, checked on the number of jobs running for log shipping copy or restore, and the number was 162. Each of these jobs was set to kick off at the same exact time.

I put together a script to stagger the job start times so that they don't all run at the same time. Since implementing that, I have not run into this issue again.
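The script itself isn't reproduced here. As a rough illustration of the staggering idea only, this Python sketch spreads a list of jobs evenly across a window and returns each start offset as an HHMMSS integer, which is the format msdb uses for a schedule's active_start_time. The job names and the 15-minute window are assumptions, and applying the values (for example through msdb.dbo.sp_update_schedule) is left out.

```python
def stagger_start_times(job_names, window_seconds=900):
    """Spread jobs evenly across `window_seconds` and return
    {job_name: start offset from midnight as an HHMMSS integer}.

    The 900-second (15-minute) default window is an assumption;
    use whatever interval your jobs actually repeat on.
    """
    if not job_names:
        return {}
    step = window_seconds / len(job_names)
    result = {}
    for i, name in enumerate(job_names):
        offset = int(i * step)            # whole seconds after the window start
        hours, rem = divmod(offset, 3600)
        minutes, seconds = divmod(rem, 60)
        result[name] = hours * 10000 + minutes * 100 + seconds
    return result


# Hypothetical job names, one per log shipping job.
jobs = [f"LSRestore_db{i}" for i in range(162)]
times = stagger_start_times(jobs)
print(times["LSRestore_db0"], times["LSRestore_db1"], times["LSRestore_db161"])
```

With 162 jobs in a 15-minute window, consecutive jobs end up about 5 to 6 seconds apart instead of all starting on the same second.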

I suspect that the number of log shipping jobs trying to run at once was causing the pain and errors. Does this sound reasonable? Or is this nonsensical?

I'm a bit hesitant to try to replicate this issue, as this is a production environment.