Possible reasons for really long database recovery

Question

Possible reasons for really long database recovery

sqldriver

SSChampion

Points: 10440
More actions
September 23, 2014 at 10:03 am

#311757

Hello,
Overnight I ran restores with NORECOVERY for a 5tb database, along with several other 250GB - 1.3tb databases. I did that because sometimes our 3rd party backup software throws wacky LSN errors if it thinks something timed out that really didn't. Anyway, the restore completed with no errors. This morning I started running the RESTORE WITH RECOVERY commands to bring them online for DBCC checks.
Ran on the first database fine, got stuck on the 5tb database for about an hour, then I ran the commands on the remaining six databases in parallel to the 5tb. All of them recovered very quickly. The 5tb database has been restoring now for about 3 hours.
Any thoughts on:
Why this in particular would be taking much longer than both:
a) how long it usually takes to recover, and
b) much longer than other databases took to recover?
If I should cancel/kill the transaction and try to start over
Error log is attached, though there's nothing all that interesting in it.
I can see it doing something in sp_WhoIsActive, i.e. informational columns are incrementing, and the wait info changes (though the wait that it always seems to be on is IO_COMPLETION), though status seems to always be suspended.
While researching I came across one article that suggested high VLF count can cause this, and there are about 5k VLFs in the log file, but I'm on a version of 2012 that is patched for the issue:
Microsoft SQL Server 2012 - 11.0.5532.0 (X64)
Jul 14 2014 15:00:27
Copyright (c) Microsoft Corporation
Developer Edition (64-bit) on Windows NT 6.3 <X64> (Build 9600: ) (Hypervisor)
Thanks

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

sql-lover SSCoach Points: 18530 More actions · Answer 1

Yes, VLFs count. Fix them. They can get out of control due improper auto-grow settings for your Tlog file. If you have thousands of them, it will take longer for SQL server to read them affecting the recovery interval.

I think I have a script that you can modify and use... let me check ...

sql-lover SSCoach Points: 18530 More actions · Answer 2

Take a look on this article that Kimberly wrote: Transaction Log VLFs – too many or too few?

Attached the script I used. Please test 1st, on Dev environment. Adjust upper size to your particular case.

dbafromthecold@gmail.com SSChampion Points: 10160 More actions · Answer 3

The guys here and over at Stack Exchange have got it right (IMHO).

I inherited a 3TB database with thousands of VLFs (in three .ldf files on the same disk!) which took an hour to recover.

Removed two of the.ldf files, reduced the VLFs to under 100 and the database recovered instantly after a server reboot!

sql-lover SSCoach Points: 18530 More actions · Answer 4

DBA From The Cold (9/23/2014)
The guys here and over at Stack Exchange have got it right (IMHO).
I inherited a 3TB database with thousands of VLFs (in three .ldf files on the same disk!) which took an hour to recover.
Removed two of the.ldf files, reduced the VLFs to under 100 and the database recovered instantly after a server reboot!

Did I read well? Three ldf files on same disk? ... :crazy:

dbafromthecold@gmail.com SSChampion Points: 10160 More actions · Answer 5

Yes, you did.

The database was 3TB in size, in a .MDF file and two .NDF files.

They must have thought each needed a .LDF

sqldriver SSChampion Points: 10440 More actions · Answer 6

Yeah, I'm re-working the VLFs now. Doesn't help me with the current restore in process. Curse these SATA drives.

sql-lover SSCoach Points: 18530 More actions · Answer 7

sqldriver (9/23/2014)
Yeah, I'm re-working the VLFs now. Doesn't help me with the current restore in process. Curse these SATA drives.

fyi

On some cases, you may find that it is better to do this during a downtime window. Not because you will break something during the process, but due the possible locks that some active clients generate; those locks can block any DDL statement you may be running. Also, if you do this during a downtime window, you can immediately restart the SQL instance and measure recovery time on those big clients.

sqldriver SSChampion Points: 10440 More actions · Answer 8

I do monthly VLF maintenance, so I'm aware of possible issues. These databases belong to a 3rd party vendor and are just kind of nightmares in general. If I could get the okay for a maintenance window, the servers probably wouldn't be running 2008 RTM :crying:

Thanks

sql-lover SSCoach Points: 18530 More actions · Answer 9

DBA From The Cold (9/23/2014)
Yes, you did.
The database was 3TB in size, in a .MDF file and two .NDF files.
They must have thought each needed a .LDF

Wow! One of those stories you may want to tell to your children or co workers, during christmas party.

Long time ago, I fixed a database downtime issue where the former DBA deleted the ldf. It was SQL 2000. He thought it was too big. Imagine that! He was later fired during a cost reduction program.

dbafromthecold@gmail.com SSChampion Points: 10160 More actions · Answer 10

sql-lover (9/23/2014)
DBA From The Cold (9/23/2014)
Yes, you did.
The database was 3TB in size, in a .MDF file and two .NDF files.
They must have thought each needed a .LDF
Wow! One of those stories you may want to tell to your children or co workers, during christmas party.
Long time ago, I fixed a database downtime issue where the former DBA deleted the ldf. It was SQL 2000. He thought it was too big. Imagine that! He was later fired during a cost reduction program.

Ouch, why would you delete it instead of at least trying to shrink it first?

Whenever I see a badly designed database I always think back to that one, then things don't seem so bad! 😀