April 18, 2002 at 12:00 am
Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/sjones/intherealworlddisaster.asp
April 22, 2002 at 3:27 am
Looks like you dealt with fairly small databases. The same scenario could have been much worse if the databases were larger, say 10 GB (in terms of restore times). Log shipping would have saved a lot of time in that case!
HTH,
Vyas
SQL Server MVP
http://vyaskn.tripod.com/
April 22, 2002 at 4:13 am
So far I have been lucky in that very few of our databases have a restore commitment of less than 24 hours, and those that do are less than 100MB each and replicated to another location. I have, however, had the loving experience of losing a drive in a RAID5 array at both the primary and backup sites within a month of each other. Fortunately it was only one drive each time, and we got it replaced before losing any more (we did have to wait a week on one drive, and boy was everyone sweating it).
"Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)
April 22, 2002 at 8:06 am
Log shipping is definitely a worthwhile thing. I created some ultra-basic scripts (since I don't have the Enterprise Edition of SQL Server) that do the equivalent thing (look on comp.databases.ms-sqlserver). My goal is to be back up within 5-10 minutes.
Some ideas for log shipping -
1) For all the databases that only require nightly backups (and can easily survive a day's loss of data), set up a script to restore them nightly on the backup server in operational mode. That way they're ready to go and don't need to be touched, shaving off precious minutes.
2) Scripts scripts scripts. I have one to restore the transaction logs, one to run through and fix the users, etc, etc. I try to make them as generic as possible, using inputs to tell it which database/files to work on.
3) Jobs - I have 3 jobs set up that will bring everything back up. They run the aforementioned scripts with the necessary parameters. The only things they don't do are change the IP address and server name and run the setup program so that everything synchronizes.
4) Documentation. Do a complete run through, documenting everything. Make it so easy your kids can run it!
5) Assume you won't be there. Assume you'll be hit by a bus. Although, granted, at that point you won't care if the databases aren't brought up quickly. 😉
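The generic restore scripts described in points 1 and 2 above might look something like the following sketch. The database name, file paths, and login name are all illustrative, not taken from the original post:

```sql
-- Restore the full backup in STANDBY mode so further log backups
-- can still be applied and the database remains readable.
RESTORE DATABASE Sales
    FROM DISK = 'D:\Backups\Sales_full.bak'
    WITH STANDBY = 'D:\Backups\Sales_undo.dat', REPLACE;

-- Apply each shipped transaction log in sequence, still in STANDBY mode.
RESTORE LOG Sales
    FROM DISK = 'D:\Backups\Sales_log_0115.trn'
    WITH STANDBY = 'D:\Backups\Sales_undo.dat';

-- When failing over for real, complete recovery and bring the
-- database fully online.
RESTORE DATABASE Sales WITH RECOVERY;

-- Re-link any orphaned SQL users to their logins after the final restore.
EXEC sp_change_users_login 'Update_One', 'webuser', 'webuser';
```

Parameterizing the database name and file paths, as the post suggests, turns this into one reusable script per step.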
Great article, thanks!
April 22, 2002 at 10:23 am
I'm not sold on log shipping, mainly because I've seen too many errors with MS tools like this. I prefer to do it myself.
Total DB size (3 dbs) was about 1GB, though this was initially implemented because we had the servers in another location and ftp'd the data back every 15 minutes. A larger db wouldn't have changed this, though the fulls would probably have been weekly and differentials daily.
Good ideas below and I'd like to implement them, but I don't have a spare server. In this case, we pressed the QA server into production. However, I do practice the restore every Monday to reset the QA environment, so I've got good documentation on that. The only thing we missed was documenting how to repoint the web servers to a new database server. Since this was a temp fix, we did not want to rename the server.
Steve Jones
April 22, 2002 at 12:19 pm
April 22, 2002 at 12:23 pm
April 26, 2002 at 7:21 am
Performing a cold backup of master, model, and msdb once in a while to the local disk can also help when you get a call from the system engineer saying there was a controller failure, the box had to be rebuilt and the files restored, but the SQL Server services won't start. Obviously the backup software that was used was not backing up the *.mdf and *.ldf files, so those files were never restored. Fortunately I had taken a cold backup of all the system database files and renamed them with different extensions, and those copies were restored. All I had to do was rename the files back to their original extensions, place them in the data folder, start the services, and restore the rest of the user databases.
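An alternative to the cold file-copy approach (a sketch, not the poster's method) is a plain online backup of the system databases to local disk; the resulting .bak files are ordinary files, so file-level backup software will pick them up. Paths are illustrative:

```sql
-- Online backups of the system databases; WITH INIT overwrites the
-- previous backup set each time, keeping a single current copy.
BACKUP DATABASE master TO DISK = 'D:\Backups\master.bak' WITH INIT;
BACKUP DATABASE model  TO DISK = 'D:\Backups\model.bak'  WITH INIT;
BACKUP DATABASE msdb   TO DISK = 'D:\Backups\msdb.bak'   WITH INIT;
```

As Steve notes in the reply below, copies of these should also live on another server, not just the local drive.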
April 26, 2002 at 8:33 am
Good point, though I'd be sure I had these two on another server as well. Wouldn't have done any good in this case to have them on the local drive.
Steve Jones
April 29, 2002 at 2:45 pm
Hi Steve,
is this article the reason you've lost your job?
April 30, 2002 at 9:22 am
Ouch ,
no, this actually occurred two weeks ago on a Wed. I successfully moved everything over and ordered a new RAID controller the next day. We were configuring the production machine that Fri when our board meeting ended and the CEO came and told us that the company would be folding. It had nothing to do with the technology; the salesmen just couldn't sell enough over the last two years. I and my former CTO are extremely proud of the software and systems we built. It was the best, most flexible, and most reliable software I've ever been a part of, and I was sorry to see it go.
I have some notes from the last two weeks at my http://www.dkranch.net site, if you are interested. Also, I am consulting with a couple of former customers of IQD that want to continue to use the software and had an escrow agreement for its use.
If I hadn't gotten things running that night, then I might have deserved to lose my job, but that wasn't the case.
Steve Jones
April 30, 2002 at 1:28 pm
I apologize if I came across as rude, but that was my reflex after seeing your little ad in the sqlcentral newsletter.
Your experience really made me think again about backup procedures. I also discussed your article with my colleagues at work (IT dept). Some people think 2h 45m under those conditions is a formidable recovery time; some disagree. I think the hardware failure, plus not having enough time to prepare the backup box, plus not physically being there, made it all very difficult. But I was surprised you didn't have restore scripts ready (taking care of spids, logins/users, etc.)
Couple of questions:
1.) How many databases were recovered?
2.) What was used to move backups to tape?
3.) You said one box was co-located. Different domain, I guess. I also guess you transferred users/logins through a script?
Very useful article, thanks!
April 30, 2002 at 2:17 pm
Only a few users/logins and they were synched with sp_change_users_login (we use SQLAuth). They were added with sp_addlogin because our backup box was "appropriated" a couple months ago for another task. Had this been a more critical item for the company, we would have had it done quicker.
np, it was just a shock this a.m. seeing the post. I understand you seeing the two events as connected. I'd actually written this about 12 hours before being told the company was failing.
The backups were not on tape. Actually they were, but only for the previous night. I ftp backups and logs every 15 minutes to our ftp site, so we recovered the most recent from there.
A total of 3 databases were recovered that night. I did msdb the next day.
Steve Jones
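The orphaned-login fix mentioned above (sp_addlogin followed by sp_change_users_login) can be sketched as follows. The login name and password are illustrative, not from the post:

```sql
-- List users in the restored database whose SIDs no longer
-- match any login on this server.
EXEC sp_change_users_login 'Report';

-- Re-create a missing SQL Server login on the new box...
EXEC sp_addlogin 'appuser', 'S0mePassw0rd!';

-- ...then map the orphaned database user back to it.
EXEC sp_change_users_login 'Update_One', 'appuser', 'appuser';
```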
June 3, 2004 at 4:24 pm
A couple of things:
1) The easiest way to kill all spids is actually to go through the "Detach Database" form, which lets you kill all active processes. You don't actually have to detach the database; it's just a convenience, courtesy of MS (and for some reason it's only found in the "Detach DB" winform).
2) There has to be a better way to make your database server redundant: start with a clustered active-passive two-server configuration, add a RAID 10 disk array, and keep spare parts on hand for the few remaining single points of failure (e.g. the RAID controller). Wouldn't you agree?
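On point 1, an alternative to the Detach Database form, assuming SQL Server 2000 or later, is to force all other connections off with a single statement (the database name is illustrative):

```sql
-- Kill all other connections and take exclusive access in one step;
-- ROLLBACK IMMEDIATE rolls back any open transactions right away.
ALTER DATABASE Sales SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- ...perform the restore or maintenance work here...

-- Return the database to normal multi-user access.
ALTER DATABASE Sales SET MULTI_USER;
```

This is scriptable, which fits the "assume you won't be there" advice earlier in the thread better than a GUI form does.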
April 22, 2008 at 8:16 am
About 80% of businesses fail after a major data disaster. We saw this recently in the UK, when major floods caused various IT and non-IT companies to lose their data and go out of business.
Using the backup software embedded in the OS is a good idea only when you are sure about your hardware. Major data loss occurs due to hard disk or controller failure; in that case you won't be able to retrieve anything from the hard disk, and you will basically lose all your data unless you have an offsite backup.
To avoid this, any business, whether SMB or enterprise, must have a disaster recovery plan. D2D Bare Metal Recovery is one technology available in the market, but not everyone knows about it. Check this out at,
They are the originators of the Bare Metal term. Using this technology one can restore the OS and data very quickly.