SQL goes into Recovery Mode after Cluster Fails Over

  • I have a SQL Cluster Set up that works great, however, when a node fails and it fails to the other node, my database goes into this 30/35 min. Recovery Mode.

    I did a backup of the Transaction Logs before the test and it still took 35 minutes for the database to come back up. This delay nullifies our Cluster Fail Over.

    What am I missing? I am running SQL 2005, w2k3 Enterprise, Transaction Logs are about 70g, Database is about 70g

    Thanks in advance.

  • How often do you backup your transaction logs? Why is the transaction log file 70GB and the database 30GB?

    My guess is that you have a lot of transactions that need to be rolled forward/back during recovery for that database.

    Another way to test would be to take SQL Server offline, and bring it back online. How long does it take to recover the database? It should take the same amount of time as a failover.

    If it takes less time to recover on the same node, then you probably have an issue with the cluster and cluster resources coming online. I doubt this is an issue though.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Did you run something before fails? or any jobs?

  • Jeffrey Williams (6/18/2009)


    How often do you backup your transaction logs? Why is the transaction log file 70GB and the database 30GB?

    My guess is that you have a lot of transactions that need to be rolled forward/back during recovery for that database.

    Another way to test would be to take SQL Server offline, and bring it back online. How long does it take to recover the database? It should take the same amount of time as a failover.

    If it takes less time to recover on the same node, then you probably have an issue with the cluster and cluster resources coming online. I doubt this is an issue though.

    Sorry, it is not 30g, it is 70. I have made the change on the original post

    Also, I ran the Transaction Log backup a few hours (10hrs) prior to the test.

    you are correct in that if I take the database off line it takes just as long for it to come back up.

  • Soojin Yoo (6/18/2009)


    Did you run something before fails? or any jobs?

    no, since this was the fist backup ever on the Transaction Logs we did a manual one. We did a full backup and it did not fail.

  • Also, I thought that by doing the backup on the transaction logs it would 'comit' those transactions hence not having to roll back/forward much.

  • SQL Server has to roll back/forward all transactions that are hardened in the transaction log. Those transactions will be hardened after a transaction log backup is performed.

    Since you are not backing up the transaction log on a regular basis - SQL Server has to process all VLF's in the transaction log which, as you say, is 70GB's worth of data.

    I would setup a job that backs up the transaction log every hour (at least, possibly every 15 minutes). Once you have that job setup and running, monitor the size of the transaction log backups. After you have a good idea of how large the transaction log backups are on average - shrink the transaction log to a size a bit larger than that. I generally round up to an even multiple of 500MB/1000MB.

    With that done, your recovery time should be much faster.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams (6/18/2009)


    SQL Server has to roll back/forward all transactions that are hardened in the transaction log. Those transactions will be hardened after a transaction log backup is performed.

    Since you are not backing up the transaction log on a regular basis - SQL Server has to process all VLF's in the transaction log which, as you say, is 70GB's worth of data.

    I would setup a job that backs up the transaction log every hour (at least, possibly every 15 minutes). Once you have that job setup and running, monitor the size of the transaction log backups. After you have a good idea of how large the transaction log backups are on average - shrink the transaction log to a size a bit larger than that. I generally round up to an even multiple of 500MB/1000MB.

    With that done, your recovery time should be much faster.

    ok, so I am confused. When I backed up the transaction Logs; the logs were 64g. When I rebooted the server the logs were 70g. When I backed up the Transaction Logs shouldn't that have done/comitted the logs? In essence only having 6g of logs to roll back/forward instead of the entire 70g?

  • There appears to be some confusion here around what is stored in the transaction log and the database and what a log backup does.

    Writes to the SQL database (inserts, updates and deletes) are stored in the log. Each transaction, whether an explicit transaction starting with a BEGIN TRAN or an implicit transaction of a single statement or all the variations thereof, is written first to the transaction log. Once the write is in the log the user is told that the write was complete, even though it's not yet in the database. The lazy-writer and checkpoint processes "follow along behind" and flush those writes (stored as "dirty pages" in memory) to the database.

    A log backup writes everything (since the last log backup) to a backup file. It does not affect in any way the writing of transactions to the database.

    The key thing here is that a transaction must be marked as "complete" in some way. This can either be an explicit COMMIT or ROLLBACK statement or something implicit like a GO statement or your connection's IMPLICIT_TRANSACTIONS setting.

    If a transaction is not completed in some way and/or has not been written to the database when the service/database is stopped then SQL Server has to recover the database when the service restarts. This entails reading through the log file, rolling back transactions that have not been committed ("unwriting" them from the database) or rolling forward transactions that have been committed in the log but had not been written into the database.

    Far and away the most common reason for long recovery times is uncommitted transactions that have to be rolled back.

    A good clue that this is the cause is the usage of the log file after a log backup is taken. If there's still a lot of used space in the log file after taking a log backup it means there's at least one open transaction that started a fair while ago. Common causes for this is having your users' IMPLICIT_TRANSACTIONS set ON instead of the default OFF (especially if they leave their connections open), applications that allow transactions to be held open (again, especially when a user walks away from their machine in the middle of a session), large data loads, index rebuilds and the like. The first two scenarios need to be investigated and addressed if possible.

  • It is my SharePoint database that is causing this issue.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply