November 1, 2008 at 9:01 pm
I have taken over responsibility of a Production SQL 2000 Cluster running on Windows 2003. I found out last week the SQL System Databases are on a RAID 5 drive spread across 2 storage arrays (MSA30's). Clearly this is not fault tolerant (if one of the storage array fails, it will wipe out 2 drives and thusly the SQL system DB's). I want to convert the RAID 5 to a RAID 1 across the two storage arrays. I am aware Microsoft has a technet article out there describing how to move system db's. However, since this is production, I don't want to get involved with reg\cluster changes.
Below is the plan I want to take. I'm very nervous about deleting the S: drive resource as I do not know if SQL server depends only on drive letter or some other cluster identifier to find it's system DB's. I do not have an environment to test this so I would appreciate any input.
1. Backup contents on S: drive (which contain SQL system DB's and nothing else).
2. Delete the S: Drive Resource from Windows Cluster Administrator
3. Delete the S: Drive in Windows Drive Manager.
4. Delete the RAID 5 Volume in Array Manager.
5. Create RAID 1 Drive in Array Manager and call it S: in Windows Drive Manager.
6. Create new Cluster Resource S: through Windows Cluster Administrator.
7. Restart SQL Server (will the SQL Server installation start up again with a new S: drive resource created???)
Thank you in advance to someone who has more experience in this than I do!
November 2, 2008 at 3:06 am
Hi,
I can see why you are a little bit nervous, I would be to 🙂 But fear not, you will get through this alive.
In theory you can just take everything offline and copy the directories on S:. Once the resource is added back in the SQL Server Group (between our step 6 and 7), copy the files back and bring everything online.
However, I would use suspenders and a belt and also do SQL Server backups of the databases. I would then go ahead and restore those backups to ensure that they are OK.
Now for a better option - you mention they are system databases so I assume you have another Disk where you put the User databases? In that case I would do like this:
1. Move my system databases to other Disk available to SQL Server
2. Do the steps above (and 1 step between 6 and 7 to add the resource to SQL Server)
3. Move the system databases back to S:
In that way, the worst that can happen is that your system databases end up on the wrong disk.
Complete instructions are available at "Moving System Databases", http://msdn.microsoft.com/en-us/library/ms345408(SQL.90).aspx.
HTH!
/Elisabeth
elisabeth@sqlserverland.com
MCITP | MCT
http://sqlblog.com/blogs/elisabeth_redei/
http://linkedin.com/in/elisabethredei
November 2, 2008 at 9:00 am
Hello Elisabeth,
Thank you very much for the response. I do hope I will come out of this alive but definitely do not feel good about what I'm going to attempt.
Yes I do have another drive which I could move the system databases to. The only reason I do not want to do this is because of the article I read from Microsoft. I'm a sys admin not a DBA so starting SQL server with these parameters -c -m -T3608 scares the heck out of me. To me it seems like this will invoke more change to the cluster registry values (new drive letter) and who knows what else. If SQL does not start up again, I do not know how to recover. Have you had any experience moving system DB's on a cluster using this procedure? Btw, this is SQL 2000 so this is the procedure I was reading:
http://support.microsoft.com/kb/224071
In theory the idea of deleting the S: resource and re-adding does make sense, but the unknown here is how SQL server may or may not depend on some unique identifier which was tied to the originally created resource. I'm hoping that SQL is only looking for the S: drive letter but I hate mixing "hope" with "production." 🙁
And yes I do plan on backing up all databases. If I really mess up this installation the backout plan to uninstall SQL on the Cluster and reinstall which, as you know, is not an easy or clean task - especially at 4am 🙁
November 2, 2008 at 9:40 am
Hi,
Actually I have done it quite a bit on 2000 although admittedly some time ago; and I used a LOT dirtier methods but at the time I was a Microsoft employee so it was alright ha ha 😀
Nevertheless, yes, if you have never done it before and you will be by yourself then http://support.microsoft.com/kb/224071 does look a bit threatening but that's also because the steps for all versions are stuffed into the same article. It is only model that requires -c -m -T3608 so if you break down the steps for model, master, msdb, tempdb there are really not that many. If you have a SQL Server installed that you can play around with you, go ahead and test the steps a few times.
What's good about the method is that it protects you from what would happen if you can't get your S: disk up and running again, which I as a database person would be more worried about 🙂 You could get around that as well but maybe not at 4 AM.
As long as you are absolutely sure that disk will come up as a shared resource again, then copying the directories to a safe place should be fine. SQL Server will look for nothing else then that S: drive.
As an aside, what I have done in several clusters is to set up a "spare" SQL Server instance (and some ready available backups) I can use if things fire off in the wrong direction during a service pack setup or whatever.
/Elisabeth
elisabeth@sqlserverland.com
MCITP | MCT
http://sqlblog.com/blogs/elisabeth_redei/
http://linkedin.com/in/elisabethredei
November 2, 2008 at 12:14 pm
Thank you Elisabeth for taking the time to reply to my post. I really appreciate it!
I will practice the MS way of moving system DB's. I just wish I could practice it on a cluster because that in itself is a huge variable. I called MS about what would I have to do to get SQL up again if I am unable to start the SQL service after moving the databases. They started telling about the Cluster Recovery tool and restoring checkpoints. That was about the time I decided I really didn't like that method 🙂
I'm interested in the comment you said below about a "spare" SQL server instance. If I understand correctly, the spare instance would be on the same clustered servers, right? Maybe I don't understand fully but it would seem that a spare instance on the same cluster would not help me here because I'm messing with the system DB's. If I kill SQL in a way that it can't find the system DB's, neither instance would be of use to me.
I guess I can ensure I can bring the S drive back online by leaving the original drive config alone and unassigning it's drive letter in Windows device manager. I can create the new S drive mirror on 2 different drives and if I have problems I can just reassign the S drive letter back to original RAID 5 drives. The only part I cannot "un-do" is the deletion of the Cluster Disk Resource (S:) . Granted I should just be able to recreate it and SQL shouldn't know that I messed with it...but again..the great unknown... Testing on Production is never a good idea 🙁
Thx again!
November 2, 2008 at 1:12 pm
Hi,
The system db's are just master, msdb, model and tempdb and in this discussion only master and msdb are interesting. Model is just an empty template database and tempdb is recreated everytime (based on model) you recycle the SQL Server service. Master contains all your logins and msdb all your jobs.
So, let's say you have one SQL Server instance SQL1 which is your "production" instance. Say then you install a second instance SQL2 (name will be in the form hostname\instancename) and bring it up to the same service pack level as SQL1.
Along comes a maintenance job (such as the one your facing, a patch job or any job that is difficult to rollback if it fails) you could then backup all your databases (including master and msdb) and at some point before you start your maintenance, restore them on SQL2.
If your databases are very big that might involve doing the restore at an earlier point and take differential backups of your user databases just before you start the job.
Obviously your appservers/clients would have to be redirected to SQL2 if things go wrong with SQL1.
🙂
/Elisabeth
elisabeth@sqlserverland.com
MCITP | MCT
http://sqlblog.com/blogs/elisabeth_redei/
http://linkedin.com/in/elisabethredei
November 2, 2008 at 8:30 pm
Thank you for that very clear explanation Elisabeth. I had no idea I could install multiple instances this way on the same server! I have googled this scenario up on the internet and found even more info. This sounds like I have found a test/backout plan in case something goes wrong 🙂 Cool!
Thank you very much for your time and wealth of information. You have really helped me out! Ty!!! 😀
March 13, 2009 at 2:29 pm
Hey irony34
Im in the same situation as you, Migrating a 2000 Cluster to New SAN. Can you please suggest me if you have any additional info or resources on this.
Thanks,
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply