Nightmares

  • Nightmares

    OK, I know the "SAW" franchise probably doesn't resonate with everyone, but it's a scary movie and it just came out. I haven't seen it, but I was intrigued by the first two, so I used the image. But enough of that.

    Tuesday was Halloween and I linked to a story about scary IT situations, intending to write about those, but I decided to hold off until today so I could get your stories as well. The best one wins a SQLServerCentral.com polo shirt. I have no criteria other than whatever Andy and I debate over the weekend or on Monday, so take your chances. Maybe some mugs for runners-up as well 🙂

    What's a scary IT situation you've been in?

    I'll lead off with my New Year's horror story from the early 90s (93/94, I think). I was working for Dominion Resources, also known as Virginia Power, at their Surry Nuclear Power Plant. I was a techie guy in the local IT group, supporting a large user base (> 1,000 people) on a 900+ node Novell network. We mostly ran DOS, with Windows 3.1 just starting to get used in places.

    Throughout the fall, the corporate IT developers built a new Windows-based application to track radiation received by workers. As you might guess, this is critical for all kinds of legal reasons. Evidently a full-scale load test was never performed, because on Dec 31 at 11pm, all work was shut down while we transferred the old radiation data from the DOS system to the new SQL Server running on OS/2 1.3.

    The nightmare began the next morning when workers arrived for work. Our servers started crashing from the load, and 40-some hours after I'd arrived, sometime on the morning of Jan 2, I got to go home and rest. I worked 120 hours a week for the next 4-5 weeks trying to stabilize the system, which eventually happened with an upgrade to OS/2 2.1 along with numerous application upgrades and SQL Server 4.2 patches.

    I still vividly remember that experience and it's made me a fairly conservative DBA. So I'm interested to see if you have any similar nightmares.

    Steve Jones

  • Just a small nightmare for me compared to this

    Late 90s: Application Support guy here, in early on the Monday morning to greet the 200-seat Call Centre that was implementing a new version of the application software.

    First 30 users all OK, then the problems started: loads of different locking problems.

    Rollback? Not an option - we already had over 1,000 new cases in the system, and the DB structures had changed for the new version.

    Now, at that time I was a mere applications person who used the application front end to the database, so nothing looked any different. After 6 hours of screaming call centre supervisors after my blood, in walks the DBA and asks, 'How's the system reacting? I upgraded to SQL Server V7 last night.'

    No prizes for guessing how many new pieces he got on his anatomy!!!

    So I just told the application it had to use V7 instead of 6.5, and all the problems disappeared (as did I, for a decent breakfast).

    Paul Smith

  • Early 1980s, working as a consultant for an unnamed Atlanta-based (started with a Tall...y) corp., which was a holding company for S.th Thom.. and Wesc..ck. Besides owning those clock and watch manufacturers, they did manufacturing for Tim..

    They sent two brand-new programmers to another company in Virginia to look at their manufacturing system. It was written in COBOL and used the Image DB on an HP3000, which by the way was a bulletproof machine and OS. Based on the recruits' 3-day evaluation, they purchased the software. They brought it back to Atlanta and turned it over to a project manager. They identified no less than 6,000 mods to make to the software and started programming. Of the 250 or so programs, they unit tested the ones they changed; they never system tested, let alone stress tested.

    On Oct 1, they shut down the ailing IBM and cut over to the HP. They forgot to train the users, who were scattered across about 7 manufacturing plants around the US and Taiwan. Besides all of the problems that went along with that, at the end of the first 8 days of operation they had shipped millions of dollars of watches and clocks to customers around the world. Problem was, they didn't know who got what, what it was, or how it had been shipped. They had to shut down all of the manufacturing plants for about 15 days and go back to the old IBM system, which had been partially disassembled by this time. The consultant (not me) who kept trying to tell them not to do it this way got fired. The company never recovered. It was rumored that 4 containers of product headed from Taiwan to the US were "washed overboard" in a storm from the container ship, and the insurance check was niiiiiice.

    One project manager + one new system = no more company.

    PS .. the IT department stayed within budget by not system testing.

    There was some good here, though: I ran off with one of their analysts, who was pretty, could write good code, and was a good golfer to boot.

    TRUE STORY!

  • Okay, here is the nightmare that I am currently living. I work for a small company whose whole mission is to store and keep track of users' data. We employ up to 50 people. The owner of the company wanted to do away with an AS/400 DB2 mainframe and explore other options. Being a VB programmer, I put together a nice presentation on MS SQL Server. I had been using MSDE for some of my programming projects. Plus, Visual Studio would be able to interact more nicely with SQL Server than with the old monstrosity that is DB2. A couple of months went by, and we still had half of the workforce plugging away on dumb terminals.

    Finally, the decision was made to use... LOTUS AND DOMINO! We have an IT person who is very anti-Microsoft, and despite the fact that we hired business consultants who agreed with my assessments, the owner still went with Lotus.

    And today, I am using SQL Server Standard (2005) and writing my own apps. I'm also supporting an intranet and pumping out apps as intended. A lot of the functionality that was planned for our internet website was supposed to be complete a couple of years ago and still hasn't been done.

    This is my continuing nightmare - LOTUS!

  • For some more horror tales, check out:

    http://thedailywtf.com/default.aspx

    Go here for the first of a 4-part (be sure to click thru to the other parts) allegedly true story that dwarfs most anything I have heard:

    http://thedailywtf.com/forums/thread/95273.aspx

    ...

    -- FORTRAN manual for Xerox Computers --

  • Now Surry uses an Oracle based dose tracking system provided by my company.  I wonder if the 93/94 nightmare helped them decide to go with our product? 

    I'm not SQL Server bashing, I'm a SQL Server guy too.  I do both Oracle and SQL Server about 50/50 now, though my first several years were 100% Oracle.  They each have their place and benefits.  I just find it interesting when you mention your experiences with Surry since they are one of our customers and the primary focus of our company is systems for nuclear power plants.  We also have SQL Server based systems that Dominion has expressed interest in. 

  • There was a BIG red button on the wall of the IT department at a small university.  This button looked like the type of button you see in the movies that can initiate a nuclear meltdown sequence.  It was 3-4 inches in diameter, and was attached to a shiny metal plate mounted to a concrete wall that was painted white.  Nobody in the IT department was employed when this button was installed, and therefore, none of us knew what it did.  For all we knew, this was the most powerful button in the universe!

    One fateful day, a day when the network admin was on vacation, a new student worker saw the button and asked in a hushed voice, “What does that button do?” as if just talking about the button would make it do something horrible. A seasoned IT veteran (his name was Steve) was in the vicinity of the button, and he promptly replied, “This button doesn’t do anything.” Why he decided to say this is still a mystery, even to him. As these words were being spoken, he started his hand towards the button. The world went into slow motion, and I’m sure there was a long, drawn-out “NNNOOOOOOOO” coming from my mouth, but it was too late. With a resounding crack, his fist smacked the button. The lights immediately dimmed, and every single UPS in the room began beeping loudly. The color drained from Steve's face as he realized that the power to the line conditioner, which fed power to all the servers on campus, was cut off. The line conditioner was about as big as a washing machine, and not being electricians, we had no idea how to restore power to it. I put in a frantic call to the maintenance department, and then proceeded to fly around the room trying to shut down all the servers properly. Not being the admin, I had to follow documentation to shut them down. The maintenance crew arrived as I was shutting down the last of the servers. It took them a while to get the line conditioner up and running.

    Before I left the university for another job, a clear plastic lock box was placed over the big red button, and we put a label below it that read “The Steve Button”.

  • My horror story comes in from 1996.  Our company had been a DOS based company using Clipper 5x for their entire application base.  We decided it was time to move to Windows based programming and SQL Server.  So we bought a server with Windows NT 3.51 and installed SQL Server 4.21b and began programming using Delphi 1.0.  Shortly after that, MS came out with version 6.0 of their server and we made the migration as it was supposed to be a much better platform and EM was definitely a step up.

    I was a developer at the time and we realized that we needed a full time DBA (this was right after we moved to SQL).  I was approached to take the job as I had been working most closely with our installation.  I decided it was a good career move and took the position.

    I don't know how many remember 6.0, but it was buggier than a camel's hump.  I spent almost two solid months living in our data center around the clock. I would go home long enough to change and shower and head back to the office.  I often slept on the floor at my desk.

    I had one developer in particular who was adept at writing queries that would make the server go away. It would start escalating lock after lock until it had exhausted its pool, to the point where SQL Server itself couldn't obtain a lock to start shutting things down. At that point I would basically power off the box, bring it back up, and wait for SQL Server to finish recovering the databases. Then I would have to go through and check each database to find out where chains had been broken in the pages and fix them.

    I knew all of the top-level support techs at MS and had most of their home phone numbers (they did a lot of after-hours support from home), and I would be bugging them every other night over one problem after another.

    My boss at the time had a habit of coming around asking everyone, "Where we at?" I was having a particularly stressful day, hadn't slept in about two, and was trying to show my co-worker some of the things I had been doing to fix the latest problem, and my boss was coming in every five minutes, it seemed, to ask me "Where we at?" (mostly because our managers were asking him why the server wasn't available). I lost it and started yelling at him that we were neck deep in the doo, that it would be up when I had it up and running again, and to leave me alone until then.

    He got such a hangdog look on his face and left the room. My assistant looked at me and said, "You're not going to go get a gun or anything, are you?" I realized just how far over the edge I had gone, apologized to her, and then went to find my boss. I think that little bit of steam blowing was good for me, but I could definitely have handled it better. Maybe just run outside and scream. 😉

    I finally got approval to get another server where our developers could run their jobs first, before they made it to production. I couldn't have cared less if the DBs puked on that machine, no matter what time of day it was.

    I learned a boatload about the internal workings of SQL Server, and I'm truly grateful for that, but I wish it could have been under different circumstances.

    I remember that after I finally got the dang thing stabilized, with processes in place to keep it that way (and I don't know how many patches), along with a few assistants to help manage the day-to-day operations of the database, I took a week-long vacation.

    Ad maiorem Dei gloriam

  • I have a similar story to this one. I worked for a big catalog/internet company, and we had moved into a new building with a brand new shiny data center. It too came with a big red button. A manager was giving a tour of the facility to some big customers, and they asked what the button did. He replied that it was there to shut power to the room in case of an emergency.

    He then proceeded to hit the button, not realizing that we didn't have any of the UPSs installed yet. The minute he hit it, everything went dark.

    You could have heard a pin drop in the silence and then you heard "uh oh".

    Not only did he crash every server (including the AS/400), he also took out our phone system.

    I don't think he was ever allowed back in that room again.

    Ad maiorem Dei gloriam

  • I don't know why, but I kept expecting The Simpsons soundtrack to kick in during that story, Steve. Did you check your pockets?

    Max

  • I work in a fairly large medical practice that does all electronic billing and medical records. The proprietary software we use to do all of this is based on SQL2K and has the ability to call stored procedures with triggers in the VB-like front-end.

    Long story short...someone removed a condition on an update statement. So, instead of updating a single procedure on one patient's record to "no charge", it set every charge in our DB from the beginning of time to $0.00. That was not a good day.
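    Roughly what that looks like - a minimal sketch with made-up table and column names, since the actual schema isn't described here:

        -- Hypothetical table and column names, shown only to illustrate the mistake.
        -- Intended: write off a single procedure on a single patient's account.
        UPDATE dbo.Charges
        SET    ChargeAmount = 0.00
        WHERE  PatientID = 12345     -- made-up patient
          AND  ChargeID  = 67890;    -- made-up charge

        -- What actually ran once the condition was removed: every row in the table.
        UPDATE dbo.Charges
        SET    ChargeAmount = 0.00;

    A cheap habit that catches this sort of thing is to run the statement inside an explicit BEGIN TRAN, check @@ROWCOUNT, and only COMMIT when the number of affected rows is what you expected.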

    But, we found the problem, and restored from backup all the charges up until that day. Luckily it happened early in the day, so we just reentered the few charges from that morning and we were up and running again.

  • I was working on a conversion project for a manufacturing major in 2004. Their data was in DB2 and the front end was in a variety of languages; they wanted to bring some uniformity and also wanted SQL 2000.

    So began the project, and everyone got at least one system; the same person was responsible for designing the system and acting as its DBA too. Mine was very critical in terms of the data, because I was dealing with tables over 30 million rows in size. Indexes could not be added or removed, as it was proprietary. All my programs ran in series on Monday mornings, after which the commissions would be decided and the paychecks would be cut. In short, if my programs were late, then paychecks would be late - and who would like that? Well, if they ran wrong - I don't even want to say it.

    During design, the DBAs wouldn't even allow me to pull all the data to test my queries - we had space restrictions. But I didn't give up. Every evening for the next month, I set up DTS packages to pull data to all my colleagues' systems when I left work. It worked. Design went on for a whole 8 months. Testing went through without major issues - no one paid much attention to performance because they were looking at the commission amounts. That put me in an even riskier position.

    On production day we were all stuck at work for 30 straight hours. No one could substitute for another, as each of us had our own horror to deal with. When the system came alive, mine would be the first set of apps to run. I will never be able to forget the stress and the anxiety. No one knows to this day (not even my PM) that 4 of the senior management met with me separately, within the span of an hour before go-live, to check whether I felt the apps would do OK. Weird - I hadn't spoken to some of them during the whole project.

    Icing on the cake: the previous day (during which I was in the office for 24 hours) was my 6-year-old's birthday. He waited the whole day at home without cutting the cake, and my husband called me at 9 pm to let me know they were going ahead without me, as he didn't want the kid to be disappointed. Besides, I had left my second kid - a 60-day-old - for the first time, for that long.

    The whole office, including the senior executives, was right behind my desk when I sat at my system watching the programs monstrously crunch through the millions of rows. Everything went as expected. No data was screwed up, all the amounts looked good, and the timings were good too - except for one program: the SELECT stored procedure kept timing out. It was beyond the paycheck point, and this one didn't have a dependency within the next 12 hours, so I decided to call it quits.

    Well... not yet. My PM: "Why don't you fix this before you leave?"

    I just glared at him and said, "I am beyond the point at which I can tune queries. I will be back after a 2-hour break."

    I was back within 2 hours (I was still feeding my little one) and then put it back in place. It needed a little tweaking. But the attitude is something I won't forget, and the stress I went through is something I will remember for a lifetime.


    RH

  • Hehe.  I was interviewing with the CEO of a manufacturing company for an IT Manager position when an employee came in to tell us every client on the network had simultaneously locked up...  again.  Apparently this happened every single day right before lunch.  The CEO asked me to take a look.  Turned out the Accountant did a massive file transfer just before lunch every day, and the poorly hacked-together network just couldn't handle the strain.

    Anyway, I got the job offer on the spot, and that's when the real nightmare started. Oh, did I forget to mention the infamous ERP software vendor (they were bought out by another vendor with an accounting software package... guess which one?) who had administrator access to our servers via remote access? After they took our servers down 2 or 3 times with their "midnight updates," I completely disabled their access to my boxes.

    It took nearly two years of 14-hour days, 6 days a week, to straighten out that mess. Ahh, the good ol' days.

  • Late to the game, and it's a story I've told before, but it was a heck of a nightmare.

    I was working for a start-up that had a bunch of Harvard grads in the top spots. These guys had been brought up on "thinking outside the box," which they translated as "don't do things that other people do, no matter how much sense it makes." One of the brilliant concepts they had was that, really, IT was easy and anyone could do it. The question wasn't ability, it was access. So every single person in the company was given full admin privileges to every single system. They were all 'sa' on the database servers (SQL Server 6.5). There were regular little nightmares & crashes as sales people tried to "fix" code or "troubleshoot" performance.

    The kicker, where I finally got the production systems off the "all access" track, came after we'd spent a week getting a new customer's database server all set up. Everything was tested & ready to go. The 18-hour days were over and we could get some rest. I got a call later that night from one of the sales guys: the new database server is down, can I fix it? I'm up the rest of the night poring over the logs trying to figure out what the heck is wrong. I can't get the server to start. I'm getting an error message. Finally my boss, a fantastic programmer and DBA, spotted the problem: tempdb was missing. How the heck did that happen? Oh, says the sales guy, I saw that when I was checking the system last night, and I knew we didn't want anything temporary on the new production server, so I deleted it.

    I'm fairly certain they had to repaint the room I was in when I got the word on this because my language caused the paint to peel. After that we took 'sa' away from everyone in production, but they still regularly broke dev. I only lasted 9 months at the company which folded shortly after I left.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Scariest DBA story for me was an "almost" story.

    Oh, I've had scary moments, like the realization that you have a corrupt purchasing database because of a RAID problem and the only tape backup is a week old (yeah, the backup folks were just hoping nobody needed those tapes). Or realizing you just installed MDAC on a production terminal server running SQL Server WITHOUT using the proper installation method. But the scariest story was still the one that ALMOST happened.

    I was working as the primary DBA at a chemical manufacturer, responsible for all SQL Server operations across about 120 servers running SQL. I was responsible for change management for the largest manufacturing application, which ran locally at every manufacturing site/division. There were about 65 production databases in total. All databases were identical, and changes were made in unison to all plant sites via scripts. I had a contractor doing some work who had discovered his contract was ending, so he wasn't being very careful. He had scripted some of the changes that were to go out with the next release. The changes had been through software QA testing (obviously not enough) and were now ready for me to combine and prepare for the production deployment. This was a particularly long set of scripts. I kept getting a nagging feeling that I ought to look over the contractor's work, but I was really busy and kept putting it off. A couple of hours before deployment I picked one or two to review.

    There it was: a new field was being added to the most important table in the database - the table that stored the criteria for out-of-control conditions in the plant - but he had scripted it as a DROP TABLE and CREATE TABLE instead of an ALTER TABLE. I was paralyzed for a few seconds in disbelief. If I had deployed the scripts as written, I would have simultaneously shut down a large part of production monitoring for all plant sites. The only recovery would have been restores to roughly 65 production databases all over the world.
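    To make the difference concrete - a quick sketch with invented table and column names, since the real schema isn't shown - here is the scripted change versus what it should have been:

        -- Invented names for illustration only.
        -- What the contractor scripted: drops the table, and every existing row with it.
        DROP TABLE dbo.ControlCriteria;
        CREATE TABLE dbo.ControlCriteria
        (
            CriteriaID  int           NOT NULL PRIMARY KEY,
            TagName     varchar(50)   NOT NULL,
            UpperLimit  decimal(18,4) NOT NULL,
            LowerLimit  decimal(18,4) NOT NULL,
            NewField    varchar(20)   NULL      -- the new field the release needed
        );

        -- What it should have been: add the column and keep the data.
        ALTER TABLE dbo.ControlCriteria
            ADD NewField varchar(20) NULL;

    On an empty test database both versions leave you with the same structure, which is presumably part of why it slid through QA - the data loss only shows up when the table already has rows in it.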
