Shipping 2ndary DB now suspect

Question

Shipping 2ndary DB now suspect

Jake Shelton

Hall of Fame

Points: 3109
More actions
March 24, 2012 at 5:45 am

#104136

Hi all,
We have a DB whose 15-min interval log shipping recovery job has been failing for 2 weeks (wasn't me, I'm the whistle-blower, not the negligent!) and is now marked suspect. I spotted the following error no's:
1) 3013(saw this in the Monitor)
2) 22029(saw this in the restore job)
Having googled both, I concluded that a particularly large log had been overtaken during the copy process by a subsequent, smaller one with a higher LSN, and thus when SQL later tried to restore the earlier, larger one, the restore process failed as the log chain was now broken. Now, I presume the only way to fix is to:
- Disable all three jobs (Backup/Copy/Restore);
- Run SP_Resetstatus against the suspect secondary DB;
- Take a FULL backup of the primary DB;
- Put secondary DB into single-user mode;
- Restore full backup to the secondary;
- Put secondary DB back into multi-user mode;
- Re-enable the three jobs.
but I have some concerns in this regard...
1) Have I got the above steps correct in content and sequence?
2) The restore job isn't running, BUT when I ran a CheckDB with no_info/All_errors, SQL replied that the DB WAS in the middle of a restore!
3) SP_Resetstatus has a pretty bad success rate, judging by comments on Google - given the above, what if this fails? Attach the backup file instead of restoring it?
Thanks in advance guys,
Jake

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

Jake Shelton (3/24/2012)
Hi all,
We have a DB whose 15-min interval log shipping recovery job has been failing for 2 weeks (wasn't me, I'm the whistle-blower, not the negligent!) and is now marked suspect. I spotted the following error no's:
1) 3013(saw this in the Monitor)
2) 22029(saw this in the restore job)

Not many people memorise error numbers. Do those have a message?

Having googled both, I concluded that a particularly large log had been overtaken during the copy process by a subsequent, smaller one with a higher LSN, and thus when SQL later tried to restore the earlier, larger one, the restore process failed as the log chain was now broken.

SQL will never restore log backups out of sequence. It can't. A later log being restored would throw an error saying that an earlier log is missing, besides the log shipping job simply won't restore out of sequence.

1) Have I got the above steps correct in content and sequence?

Without more details on what's wrong, I can't say about correct, but if you're going to restore a full backup, just drop the secondary DB and reconfigure the log shipping from scratch.

2) The restore job isn't running, BUT when I ran a CheckDB with no_info/All_errors, SQL replied that the DB WAS in the middle of a restore!

Yes, that's how log shipping works. The secondary DB is left RESTORING, hence in the middle of a restore.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jake Shelton Hall of Fame Points: 3109 More actions · Answer 2

GilaMonster (3/24/2012)
Jake Shelton (3/24/2012)
Hi all,
We have a DB whose 15-min interval log shipping recovery job has been failing for 2 weeks (wasn't me, I'm the whistle-blower, not the negligent!) and is now marked suspect. I spotted the following error no's:
1) 3013(saw this in the Monitor)
2) 22029(saw this in the restore job)
Not many people memorise error numbers. Do those have a message?

Fair point, but I was in a RDP window that doesn't allow copy/paste to my local desktop.

2) The restore job isn't running, BUT when I ran a CheckDB with no_info/All_errors, SQL replied that the DB WAS in the middle of a restore!
Yes, that's how log shipping works. The secondary DB is left RESTORING, hence in the middle of a restore.

But you DO see the contradiction, I hope? When I attempted to stop the restore job, EM reported that it wasn't running. The CheckDB result claimed it was. Both are native SQL processes giving me conflicting info. This is crappy software.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

First, It isn't crappy software.

When you attempted to stop the job, exactly how did you do this? Please provide the exact steps or T-SQL code you used.

Second, although I haven't used Log Shipping, I have used Database Mirroring and the mirror database shows in restoring state. I would assume that a Log Shipped database would as well, so I would expect that as well.

Jake Shelton Hall of Fame Points: 3109 More actions · Answer 4

Lynn Pettis (3/24/2012)
First, It isn't crappy software.
When you attempted to stop the job, exactly how did you do this? Please provide the exact steps or T-SQL code you used.
Second, although I haven't used Log Shipping, I have used Database Mirroring and the mirror database shows in restoring state. I would assume that a Log Shipped database would as well, so I would expect that as well.

I admit when I got into EM I didn't notice the job status. I ran the DBCC, and on the strength of the return, tried to stop the job when it wasn't actually running, so of course got the standard "SQL cannot stop the job as it is not currently running" etc message. It's not inconceivable that the job stopped running in the few seconds it took me to switch from QA to EM.

The restore job is timed to run every 15 mins (if memory serves, I won't know for sure until tomorrow). The restore job history on the secondary states that the job was failing all day.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

Jake Shelton (3/24/2012)
But you DO see the contradiction, I hope? When I attempted to stop the restore job, EM reported that it wasn't running. The CheckDB result claimed it was.

There's no contradiction.

A database that has been restored WITH NORECOVERY is considered to be in the middle of a restore. It is considered to be in the middle of a restore even if the last restore job completed 2 months ago. That's because WITH NORECOVERY is used as part of a restore chain.

Hence no contradictory messages.

The last restore job that ran had the option WITH NORECOVERY. As a result the database is in the middle of a restore (as CheckDB indictated) until another backup is run WITH RECOVERY, thus completing the restore process.

A log shipping secondary will always be considered to be in the middle of a restore until the database is brought online (RESTORE DATABASE/LOG ... WITH RECOVERY), which you would only do when the principal fails.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

Now, about the corruption...

While I would still like to see the error messages to be sure, with a suspect log shipping secondary you're probably best off dropping the secondary DB entirely and restarting log shipping (new full backup from the primary)

Also make sure you do some root-cause analysis on the secondary. Corruption (and the resultant suspect status) is almost always a result of a problem in the IO subsystem.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jake Shelton Hall of Fame Points: 3109 More actions · Answer 7

GilaMonster (3/25/2012)
Now, about the corruption...
While I would still like to see the error messages to be sure, with a suspect log shipping secondary you're probably best off dropping the secondary DB entirely and restarting log shipping (new full backup from the primary)
Also make sure you do some root-cause analysis on the secondary. Corruption (and the resultant suspect status) is almost always a result of a problem in the IO subsystem.

Agree completely. Drop + Attach or Drop +Restore?

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 8

Restore. You can't attach a database and leave it restoring, which is required for log shipping to work.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jake Shelton Hall of Fame Points: 3109 More actions · Answer 9

GilaMonster (3/25/2012)
Restore. You can't attach a database and leave it restoring, which is required for log shipping to work.

Cheers Gail. A bit hung over still from last night. 🙂