Data Center Power Outage

  • Hi,

    We had data center outage a few days ago. We have over 140 SQL Server physical servers with over 500 databases. After the outage, we were tasked to check availability and integrity of all the server instances and databases.

    We do not have a centralised control or monitoring tool at SQL Server level to check the health of all the severs and databases.

    We do have home grown T-SQL based system that monitors all the servers and we used that to make sure every is up and running.

    I would like to have expert views on what is the best way to monitor and check the health of all server and databases when you do NOT have option of buying such tool.

    Thanks

    Shriji

  • You'll want to set up a script, using dynamic SQL, that will step through all databases on a server and do a basic health-checkup. Then copy that and a job to each server, so it runs regularly.

    It's something you'll want to run periodically anyway, if you're responsible for keeping all those servers and databases up and running. Don't wait till after disasters to run it.

    What happened to the datacenter? From the title here, it sounds like you don't have a backup generator or something like that. You'll want to bring to management's attention that backup power is something that needs to be set up, and periodically tested and validated.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks for the reply.

    We are actually doing what you suggested and we are controlling these individual scripts on all the server that are run by schedule jobs, through a dedicated server for the home grown monitoring system.

    I am curious to know what others are doing.

    Backup power slipped into bypass mode and could not support the power outage. We are well equipped with such scenarios...it was like an oops moment... 🙂

    Thanks

    Shriji

  • Makes sense.

    I interviewed at a company once, where they tested their backup generator every month. And that used up all the fuel in it, so when they actually needed it, it only ran for about half an hour during a 4-hour blackout. Had a fuel tank that was designed to last for 2 days, but five years of monthly testing had burned almost all of it, and nobody ever thought to check that. They had a lot of questions about how to recover servers after a hard-crash caused by that, during the interview process.

    I've done the same thing you are. I have a DBA database on every server I administer. It has scripts in it for managing backups, testing restores, doing health checks, intelligently managing index fragmentation, and a few other things. All configurable through a settings table. Lots of complex dynamic SQL to control it, but it's been working beatifully for me for years on several different DBA jobs.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I was thinking to convert this T-SQL based system to SSIS solution using Maintenance Plan Task components. I haven't done much research on it but seems like doable.

    Do you think it is a good idea? Do you have any experience with it?

    Thanks

    Shriji

  • I find maintenance plan tasks to be a good stopgap till you can get something "real" in place. They're very, very useful for new/accidental/incidental DBAs, but too limited compared to what you can do with T-SQL and a few CLR/xp_cmdshell/SSIS options.

    SSIS packages would be great for it, but I really don't bother with the maintenance plan pieces.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • hi,

    i think you can go for powershell for monitoring all the servers, this is one of the best options i found.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply