I was reading a blog post by Steve Jones on Monday and started to write a comment on it. As it turned out, that comment grew into this post. This week is SQL Server disaster recovery week over on sqlservercentral.com, so I thought I’d get involved with a post of my own. Steve’s post was titled ‘Grace Under Pressure’, and in it he talked about a power failure in a server room at a large company he used to work at. Steve was in the data centre at the time of the outage, along with a Senior Executive. The Senior Executive didn’t seem to handle the situation all that well and certainly didn’t create an environment in which the system administrators and DBAs could go about their business, which, let’s face it, is the quite stressful business of recovering production servers that have unexpectedly lost power.
No doubt the root cause analysis of the issue revealed that the power outage was a direct result of the Senior Executive being in the data centre 🙂
Joking aside, I have experienced first-hand the types of disasters that Steve talked about in his post:
- Power outages in server rooms
- Server room flooding
- SAN disk issues that caused a filer crash and took several servers with it
I have been very fortunate in my career. Many years ago, my first full-time DBA gig was a great job where I got to work with a lot of great people. The company was good enough to fund training courses and, more importantly, gave me a great deal of experience working with SQL Server and the opportunity to learn from a great team. I got my first taste of a real disaster during my time there: a server room lost power and it caused chaos.
Some simple things can help in this situation. Because of the training and guidance I had received from some great people who used to work there, when disaster did strike I was prepared with the following:
- We had a fully documented run-book for each production server. This detailed everything: the SQL Server version, the databases installed on the server, when the backups were scheduled to run, the location of the backups (on server and off server) and, more importantly, what needed to be done to recover the databases on the server.
- We were expected to test recovery scenarios regularly and keep the documentation up to date as a result. Recovering a database is stressful enough, and a real production outage should not be the first time you experience a server recovery. If you have practised database restores and recovering from server crashes, then when you come to do it for real you have the experience of doing it and a nice document to help you on your way.
- Local copies of SQL Server Books Online (BOL) were installed. With SQL Server 2012 the help files take a little more effort to install locally. A client of mine didn’t think installing BOL locally was important, as you can get it off the internet. That is great when things are well and you can get to the internet. If you experience one of the disasters I listed above, the internet might not be available to you. I would recommend you download and install a local copy of the SQL Server documentation; you never know when you might need it.
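As a rough illustration of the first two points, the run-book details could be kept in a structured form so that overdue restore tests are easy to spot. This is only a minimal Python sketch under my own assumptions: the field names, the 90-day threshold and the server names are all made up for the example and are not taken from the run-books described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class RunbookEntry:
    """Summary of one production server's run-book (hypothetical fields)."""
    server: str
    sql_version: str
    databases: List[str]
    backup_schedule: str
    backup_locations: List[str]   # on-server and off-server copies
    recovery_steps: List[str]
    last_restore_test: date       # when a restore was last rehearsed


def overdue_restore_tests(entries, today, max_age_days=90):
    """Return the servers whose last restore test is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [e.server for e in entries if e.last_restore_test < cutoff]


# Invented example data, not real systems.
entries = [
    RunbookEntry(
        server="PROD-SQL-01",
        sql_version="SQL Server 2012",
        databases=["Sales", "Billing"],
        backup_schedule="Full nightly, log backups every 15 minutes",
        backup_locations=["local backup drive", "off-site file share"],
        recovery_steps=["Restore last full backup", "Restore log backups in order"],
        last_restore_test=date(2012, 10, 15),
    ),
    RunbookEntry(
        server="PROD-SQL-02",
        sql_version="SQL Server 2008 R2",
        databases=["HR"],
        backup_schedule="Full nightly",
        backup_locations=["local backup drive", "off-site file share"],
        recovery_steps=["Restore last full backup"],
        last_restore_test=date(2012, 5, 1),
    ),
]

print(overdue_restore_tests(entries, today=date(2012, 11, 30)))
# -> ['PROD-SQL-02']
```

The point is not the code itself but the habit: if the date of the last rehearsed restore is written down next to the recovery steps, it is obvious which servers have documentation that may no longer be trustworthy.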
I was also fortunate to have a really great manager who ‘kept the wolves from the door’. He dealt with the business and with high-ranking people, keeping them informed but, more importantly, keeping them off the backs of the people trying to fix things. He knew they were stressed because the business was suffering; he also knew that if they interfered it would likely take much longer to fix. It’s a vicious circle: things take longer to fix, senior people get more irate, and round and round it goes.
I visited New York last week (more about that in another post), and it was apparent that the hurricane that hit a few weeks back was still having a severe impact on homes and businesses in NYC. Some buildings are still closed, and some are still flooded and being pumped out. Phone lines are still unavailable for some, and you have to pay cash in some bars and restaurants. No doubt the storm is still costing the NYC economy.
IT disasters can happen to anyone. To handle them with grace you need to be practised and prepared, have a plan, and work as a team, with no infighting, finger pointing or assigning blame, to fix the issue as quickly and efficiently as possible.
For more information on SQL Server disaster recovery planning and the other services we offer, visit our SQL Server Consulting page.