August 16, 2007 at 4:59 pm
Ah, I've met David. "Fine" might not be the right word.
August 17, 2007 at 1:23 pm
Look at what happened at LAX this past week. A simple network card failure took the entire customs system down. I would not have wanted to be any of the passengers who sat on planes, or in the terminal, for up to 7 hours. I've had several servers go down, and it's usually something simple: a fan going out, someone unplugging it, etc. But what I find interesting is that most upper management won't spend the money to buy an extra fan. Spend some now and save later.
August 18, 2007 at 9:00 am
Katie brings up a really good point and it really got me thinking about something that aggravates the hell out of me about both my peers and my subordinates...
I'm normally the first one to take a pot shot at management, especially upper management, but is it really their job to worry about ordering something like an extra fan? Nope. It's their job to approve a budget and hire the right people. The maintenance lead would be the guy/gal to blame for not having justified and ordered a spare or two of a high-failure-rate item like a fan. Or maybe a tech should have justified it to the lead, who then should have ordered it. Sure, it could still be a management problem, since they sign the final paperwork to order the parts, but did anyone try to order the parts to begin with?
I find that no matter what the disaster is, a little forethought in the ranks can prevent a good number (most) of them. It's far too easy to blame upper management for things they cannot possibly know about, especially when folks in the ranks don't take that little extra step to identify a potential problem and then follow through on it.
Justifying the need for something, even something as simple as a spare fan or two, can take quite a bit of work and maybe even some serious aggravation. Because of that, I've found that far too many folks will find, or be made aware of, a problem only to shrug their shoulders and say "oh well... not my job... I'm way too busy eating this donut". They might be right; it might not be part of their job. But that's the difference between just an employee (in the ranks or in management) and a valued member of the team.
The problem can still be management's fault, though. Managers, especially low- and mid-level managers, need to stop killing the messenger. Good managers surround themselves with intelligent, diligent people and then listen to what they have to say.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 21, 2007 at 9:10 am
Here is an odd but true RAID/disk failure story. We had an application server configured with RAID 1 for the OS and such, with everything else on the SAN. This server, not more than 3 years old, first had one disk in the RAID 1 mirror fail. You say no problem, that is what RAID 1 is for: it gives you time to get to the server and swap the bad hot-pluggable disk, and voilà, the RAID 1 array gets rebuilt in minutes. But in the time it took to receive the MOM message about the first disk failure and walk to the server room (5 minutes), get the spare disk out of the parts cabinet (3 minutes), and walk over to the application server (less than a minute), we got a second MOM message saying the application server had crashed! After seemingly lengthy diagnostics (really only about 30 minutes), we determined that, lo and behold, the other disk in the RAID 1 array had failed as well! Now what are the odds of that?
Regards,
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."
August 21, 2007 at 9:51 am
The odds?
Depends on whether it was your day off or not.