Mini Disaster - AC Failure

  • Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/awarren/mi

  • Been there, done that, got the T-shirt...  At a previous workplace, the aircon in the server room failed one evening.  The first our support person noticed was the next morning, when she walked into the server room and thought "it's a bit warm in here".

    Took us a long time to figure out all the extra damage -- a lot of machines were less stable afterwards, and it often turned out to be down to a component that had suffered some heat damage.  Indeed, one riser card was found to be "welded" to the motherboard several months later (after we'd replaced just about everything else on that server...)

    RAVEAC?  Redundant Array of Very Expensive Air Conditioners?

    Thomas Rushton
    blog: https://thelonedba.wordpress.com

  • Fortunately our building does have a backup generator, so heat has been a minimal issue in our own server room. But we have found that keeping the lights off (we have these neat little lights we keep for walking in and out without needing to really turn the lights on) and not having the monitors on makes a huge difference in itself. Every so often we purchase a KVM to consolidate a few machines onto a single monitor. Also, we have a two-sided room with electricity coming in from two directions. Generally when one side is out (which in itself is rare) the other is up, so if needs be we swing the plugs.

  • I've had a couple of experiences like this.

    I went into the server room to find a layer of smoke covering the top 6 inches of the server room, but no obvious source.

    It turned out that a server with 3 power units had had one burn out. Unfortunately, there was no way of telling which one of the three it was.

    The engineer said that if he pulled the wrong unit out then there was no way that the server could keep going on one.

    Well, he pulled the wrong one out and, three cheers for HP, the server did keep running!

    He put the working pack in pretty smartish and replaced the dodgy unit.  Whilst he was there he decided to check the 3rd pack, just to be on the safe side.  He pulled the pack out and there was a sudden flash and a lot of smoke and a 2nd dead power-supply.

    Somehow, through all of this, the server kept running.

    In case you are wondering, the server room was being expanded and therefore the smoke alarms were off.

    In another situation, power was lost over the Christmas holiday.  One old SQL 6.5 machine had its disks seize once the temperature had dropped.  Note to the wise: a 4-hour callout does you damn all good if your box is old and obsolete.

  • I shared this with my tech support manager, and he shared this comment:

    Yeah, I could add that if you don't have a fan, you could stand at the server room door and use the door itself as a "fan" to move air around.  Of course, only works well if your server room is the size of a closet! 

    Also, regular, quarterly inspection and maintenance of an AC unit goes a long way towards preventing failure of the unit.  

  • We actually have a small window-style AC unit in the server room that is powered by an inverter and runs off the battery bank.  The AC unit was about $120 at Walmart, and the inverter is a 600 watt ExelTech XP600 that I bought on eBay for (from memory) around $500.  We can run for around 4 hours on batteries, but like to switch over to a generator manually once power has been out for more than a half hour or so.  The window unit cools the five servers and the inverters just fine.  In Central Florida, storms and power outages are a normal occurrence, so we try to be prepared.


    Student of SQL and Golf, Master of Neither

  • Hey!  Anybody remember the Great Blackout of August 2003?  And, I have that T-shirt.  While we were sweatin' our ac's off (pun intended), we were cranking diesel fuel into the backup generator of our data center.  Fortunately, we just made it when the power was bumped back on, and I lost a few pounds to boot!

    I like the concept of RAVEAC!

     

  • Happened to us a week or so ago. Power went out overnight, came back on and everything was OK, but the next night the AC turned itself off in our primary server room. The next morning the outside of the wooden door into the room was hot. Inside it was like a sauna - 5 racks all pumping out heat. The modems in the room melted. Only 1 server shut itself down - all the others were up. You couldn't touch the racks.

    To get it cool we held the door wide open, used 4 big fans and turned off half the servers. Took 3 hours to get it bearable. We lost 2 disks in addition to the modems, and fortunately that was it.

    The AC engineer said there was nothing wrong with the unit and it has been OK since. We've no choice but to go with the thing. Scary. Of course our LTO drive backup units were also in there. God knows what will fail over the next few weeks.

    Guess we were lucky this time.

  • The point about turning off lights, monitors, etc. is one I missed and is worth noting. I'm also pleased to find I'm not the only one who has bad things happen!

  • We now have 2 ACs, temperature monitoring that sends an SMS at 22 deg C (about 72 deg F) and again for every further 2 deg C, a UPS and a generator. But in the past we really had it bad on hot days (West coast, South Africa). What we have found is that an overheated server room causes hard drive failures for about six months afterwards. We now know that if it happens, we need to make sure we have enough spare hard drives.

    Japie

    5ilverFox
    Consulting DBA / Developer
    South Africa
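
For anyone wanting to try the alerting scheme described above (a message at 22 deg C and another for every further 2 deg C), here is a rough Python sketch. It is only an illustration, not the poster's actual setup: the `TemperatureAlerter` class, the `alert_level` function and the `send` callback are all hypothetical names, and how readings are obtained or how an SMS is actually sent is left out entirely.

```python
import math

START_C = 22.0  # first alert threshold, taken from the post
STEP_C = 2.0    # alert again for every further 2 deg C, taken from the post


def alert_level(temp_c, start=START_C, step=STEP_C):
    """Return how many alert thresholds temp_c has reached (0 = below start)."""
    if temp_c < start:
        return 0
    return int(math.floor((temp_c - start) / step)) + 1


class TemperatureAlerter:
    """Hypothetical helper: calls send(message) once per newly reached threshold."""

    def __init__(self, send, start=START_C, step=STEP_C):
        self.send = send      # assumed callback, e.g. a wrapper around an SMS gateway
        self.start = start
        self.step = step
        self.last_level = 0   # highest threshold level seen on the previous reading

    def update(self, temp_c):
        level = alert_level(temp_c, self.start, self.step)
        # Alert only for thresholds above the level of the previous reading.
        for n in range(self.last_level + 1, level + 1):
            threshold = self.start + (n - 1) * self.step
            self.send(
                f"Server room at {temp_c:.1f} C (threshold {threshold:.0f} C reached)"
            )
        # Remember the current level, so a room that cools down alerts again on a later rise.
        self.last_level = level
```

Feeding it readings with `alerter = TemperatureAlerter(print)` and then `alerter.update(reading)` prints one line per threshold crossed. Resetting the level when the room cools is a design choice of this sketch, so a second overheat later the same day would page again.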
