How to stop database restore that survived reboot showing progress in error log

Question

How to stop database restore that survived reboot showing progress in error log

Mark D Powell

SSCarpal Tunnel

Points: 4775
More actions
July 22, 2020 at 6:26 pm

#3773918
SQL Server 2016 SP CU7 on Windows Server 2012 R2 Standard. Backstory: Agent job runs daily that performs restore of database in about 30 minutes (360G MDF). All of a sudden one day restore still running hours later. Waited it out. Ran in 18 hours then several hours later early morning restore ran as normal. Everything fine for a week or so then problem reoccurs. Customer wanted faster fix. Told Agent job to stop. It took a little while but did; however database in use even though I disabled most users that access it. Checked for process with spid > 50 none. Checked under 50 and found 6 which you cannot kill because they are SQL Server processes. Tried reboot of server. On restart discovered database still in use by same and error log shows recovery at 1% (since advanced to 19% but taking forever). Also note recovery message show 20% at once then backed back down to 18% and advanced again from there. I thought about stopping SQL Server, removing MDF and LDG, restarting, but if SQL Server actually restarted the restore operation then I would end up in the same place. Anyone recognize what is actually happening and can offer suggestions on how to resolve.
```
Message
Recovery of database 'XX_XXXXXX' (12) is 20% complete (approximately 27762 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
```
Earlier message before reboot were Phase 2 of 3.
Mark D Powell

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Alejandro Santana SSCommitted Points: 1838 More actions · Answer 1

What it looks like your database is doing is the recovery after restarting the service/recovering the database, this phase can take quite a while if you don't backup your transaction logs or if you have a open transaction in a heavy table.

We once had an issue which was the next one.

one transaction was left open for like 24 hours (by mistake, lord), the next days we have an automatic restore of one of the main databases, this restore took way longer in the recovery phase, why? because that open transaction never let the log backup clean up the ldf file so we basically had 24+ hours of transactions 90+GB of transactions in there and when you have to restore the recovery is gonna take ALOT of time, because it has to go through the analysis of all these transactions and commit and rollback the transactions one by one, this phase is called "Three phases of recovery : the analysis phase, the redo phase, and the undo phase".

So if you don't backup your log for quite a while, after the restore the database will have to go through the phases said earlier to put the database in a consistent state.

There might be different cases to this issue but so far, this is the only one i had.

I don't think you can do anything but wait till the recover ends.

Bonus:

in SQL Server 2019 & Azure this doesn't happen if you enable the new feature Accelerated database recovery.

regards,

Mark D Powell SSCarpal Tunnel Points: 4775 More actions · Answer 2

Alejandro, thank you for the reply. The database is in SIMPLE recovery mode with the backup coming from another server where the database is also in SIMPLE recovery mode. But an open long running transaction on the source would be incomplete and require rollback on the restore. Ended up stopping SQL Server and attempting to remove database. I say attempted because while the MDF file deleted without issue Windows refused to remove the log file saying SQL Server had it open. I had stopped the Agent and main SQL Service. I stopped everything and still the delete of the log reported in use by SQL Server. I rebooted, all the SQL services started, database was marked restore pending so I deleted it. I manually ran the restore. It worked but it took 3 hours compared to the hoped for 30 minutes. Database is about 300G, log was 95G. I truncated the log after the restore completed. Nightly job has run in about 35 minutes each of the two runs since. The restore is demanded in by an application job which runs on the source server so I am going to see if I can put something in that will capture the open transaction information and anything else I can think of. This was the third occurrence of the issue since May. We restore several databases this way and this is the only one on which we have hit this issue.

Sue_H SSC Guru Points: 90861 More actions · Answer 3

Next time or on the source databases, you may also want to check the number of VLFs in the log file as an excessive number can also slow the recovery time. You can find more information in the following article:

How a log file structure can affect database recovery time

Sue

Jeffrey Williams SSC Guru Points: 90351 More actions · Answer 4

Just guessing - but is it possible that a scheduled process on the source system is running at the same time as the backup? If there was a large load process - or update/delete/insert process in progress when the backup finished it would cause the symptoms you are seeing.

The restore process would then have to rollback the in-progress transaction to recover the database after the restore.

Since this appears to happen occasionally, it seems to me that some process is not completing before the backups and you end up with a recovery process during the restore.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

compiscowu Grasshopper Points: 11 More actions · Answer 5

I say attempted because while the MDF file deleted without issue Windows refused to remove the log file saying SQL Server had it open. I had stopped the Agent and main SQL Service. I stopped everything and still the delete of the log reported in use by SQL Server. I rebooted, all the SQL services started, database was mygroundbizaccount marked restore pending so I deleted it. I manually ran the restore. It worked but it took 3 hours compared to the hoped for 30 minutes. Database is about 300G, log was 95G. I truncated the log after the restore completed.

This reply was modified 5 years, 6 months ago by compiscowu.

ScottPletcher SSC Guru Points: 101238 More actions · Answer 6

SQL Server must pre-format log files (setting each page's non-control data to binary zeros, I think). That will take quite a while for 95GB (apparently 3 hours on your system 🙂 ).

Be sure your log uses a fixed amount for growth and not a percentage. Not only can percentage growth cause oversized logs, it is also slower than fixed-amount growth.

Finally, dynamically allocating log space takes extra time too. The entire db pauses while log space is extended. How many log extensions/additional allocations were need to get to 95GB? For example, if for some reason the log was set to grow by 10MB, 95GB would be 95000 extensions. So make sure the log growth is not set too small nor too large.

To avoid all the allocation and pre-formatting, you'd be better off restoring over the existing db, since that log space has already been formatted.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Mark D Powell SSCarpal Tunnel Points: 4775 More actions · Answer 7

I always use fixed size file growth for database and log files. This is a nightly restore over an existing database so some of your suggestions will not apply, but thank you for posting them. As far as I know the problem has only happened once more since my posting. I added an opentran capture into a table to try to determine if the problem is due to a long running transaction. I would almost rather not know what caused the problem since that would mean it re-occurs.