Things to check after an unexpected power outage

  • We've had a couple of power outages recently (sometimes people pulling at wires they shouldn't be pulling at, and sometimes genuine power failures).

    I currently just run DBCC CHECKDB against each database. Does anybody have a set routine they go through after a server restart to check the integrity of the SQL Server databases, or a set of routines to ensure their integrity and consistency?
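For the "run DBCC CHECKDB for each database" routine, here is a minimal Python sketch that only builds the T-SQL statements and does not connect to a server. Assumptions: you supply the database list yourself (for example from `SELECT name FROM sys.databases`), the function names are hypothetical, and running the statements through sqlcmd, SSMS or a driver is left out. `NO_INFOMSGS` and `ALL_ERRORMSGS` are standard CHECKDB options; checking system databases first and skipping tempdb is a design choice, not a rule.

```python
def tsql_string_literal(name: str) -> str:
    """Render a database name as an N'...' literal, doubling embedded quotes."""
    return "N'" + name.replace("'", "''") + "'"


def build_checkdb_statements(db_names):
    """Return one DBCC CHECKDB statement per database.

    master, model and msdb are checked first; tempdb is skipped because
    SQL Server recreates it at every startup.
    """
    system_first = ["master", "model", "msdb"]
    ordered = [d for d in system_first if d in db_names]
    ordered += sorted(d for d in db_names if d not in system_first and d != "tempdb")
    return [
        f"DBCC CHECKDB ({tsql_string_literal(d)}) WITH NO_INFOMSGS, ALL_ERRORMSGS;"
        for d in ordered
    ]


if __name__ == "__main__":
    # Hypothetical database list; in practice it would come from sys.databases.
    for stmt in build_checkdb_statements(["Sales", "tempdb", "master", "msdb", "model"]):
        print(stmt)
```

Generating the statements separately from executing them makes it easy to review the list, or paste it into a scheduled job, before anything touches the server.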

     

  • How about making sure the server never has an outage in the first place, with redundant power sources plus a battery that keeps it running just long enough that you can at least try to fail over to another server?

  • Aside from Remi's thoughts: when SQL Server starts, it runs recovery on each database and reports in the errorlog the transactions it rolled forward or rolled back. CHECKDB should catch any other issues. You might also check for processes that may have died, such as the SQL Agent ones, and restart them if needed.
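To see what startup recovery actually did, here is a minimal Python sketch that totals the rolled-forward and rolled-back transaction counts per database from errorlog lines. The message shape matched by `RECOVERY_RE` is an assumption modeled on typical SQL Server startup messages; the wording varies between versions, so verify it against your own errorlog. `summarize_recovery` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumed message shape, e.g.
#   "3 transactions rolled forward in database 'Sales' (5:0). ..."
# Adjust the pattern if your SQL Server version words it differently.
RECOVERY_RE = re.compile(
    r"(\d+) transactions? rolled (forward|back) in database '([^']+)'",
    re.IGNORECASE,
)


def summarize_recovery(log_lines):
    """Return {database: {"forward": n, "back": m}} from startup recovery messages."""
    summary = defaultdict(lambda: {"forward": 0, "back": 0})
    for line in log_lines:
        match = RECOVERY_RE.search(line)
        if match:
            count, direction, database = match.groups()
            summary[database][direction.lower()] += int(count)
    return dict(summary)
```

A database with a large number of rolled-back transactions is a reasonable candidate to run CHECKDB against first.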

  • Generally, SQL Server does a good job of recovering after a sudden outage like that. However, some IO systems can cause real problems. Most modern IO systems contain a significant amount of RAM used as a cache. Normally we like this because it improves performance. The problem is that in a power outage, if your IO system doesn't have a UPS of some sort, the contents of that cache will be lost. Hence the possibility exists that data will be lost and be completely unrecoverable.

    If you have TORN_PAGE_DETECTION on, SQL Server will detect this problem, but the only way to recover from it is to restore the database from backups. DBCC CHECKDB will also catch this problem.

     

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • To answer the original question, here is our site's 'standard process':

    1) Check the Windows event logs (System, Application) for any errors prior to the outage. In particular, look out for disk errors or hardware errors related to disk controllers, the network, or SAN-attached storage (these can cause database corruption or data loss).

    2) Check the SQL Server errorlog for any error messages (the log covering the outage is probably errorlog.1 if the server is already back up, otherwise errorlog). If you see 'shutdown by/requested' type messages as the last lines in the log prior to the outage, then you're probably OK. If you do not see these messages at the end, then right after SQL Server starts (if you can take the databases away for a bit) run DBCC CHECKDB and DBCC NEWALLOC/CHECKALLOC for all databases (both system and user). If they are

    3) Check the Services in Computer Management for SQL Server and SQL Agent to see if they are running. If not, start them.

    4) Check the SQL Server errorlog for a normal startup (don't forget the SQL Agent log too!).

    5) Check the Windows event logs (System, Application) for a normal startup.
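Step 2 above can be partly automated. Below is a minimal Python sketch, assuming you have already read the pre-outage errorlog into a list of lines (on recent versions the file may be UTF-16 encoded, so open it accordingly). The phrases in `SHUTDOWN_PATTERNS` are assumptions based on typical SQL Server shutdown messages, and `ended_with_clean_shutdown` is a hypothetical name. It looks at the last few non-blank lines rather than only the very last one, since a shutdown message may be followed by a few trailing informational lines.

```python
import re
from typing import Iterable

# Illustrative shutdown phrases; exact wording varies between SQL Server
# versions, so treat this list as an assumption to adjust for your servers.
SHUTDOWN_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"shut ?down by request",
        r"terminating because of a system shutdown",
        r"terminating in response to a 'stop' request",
        r"stopped due to server shutdown",
    )
]


def ended_with_clean_shutdown(log_lines: Iterable[str], tail: int = 5) -> bool:
    """Return True if any of the last `tail` non-blank lines mentions a shutdown."""
    non_blank = [line.rstrip() for line in log_lines if line.strip()]
    recent = non_blank[-tail:] if tail > 0 else []
    return any(
        pattern.search(line)
        for line in recent
        for pattern in SHUTDOWN_PATTERNS
    )
```

A `False` result only means no recognized shutdown message was found near the end of the log; it is a prompt to run the CHECKDB/CHECKALLOC pass, not proof of corruption.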

    Regards,
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • Thanks, guys. Some useful info there.

  • We have a very good UPS, a diesel generator and a good fire suppression system in our data center. Unfortunately, our building is going through a major remodel, and two Friday afternoons ago a painter accidentally hit the emergency power kill switch for our entire data center. Mainframe, back-end storage, 300+ Windows/UNIX servers and the phone system all turned off instantly. All SQL Servers DID come back up just fine, and I ran integrity checks on all databases. I guess you can spend millions of dollars on protection, but you cannot do anything about human error!

    All in all it was a good 'test' of our recovery procedures on bringing everything up in the correct order.

  • I'd say... did you put an additional fail-safe on that switch?
