Clusters or Not, Failure/Recovery, and Storage

  • Whew, that subject covers a lot.  Please bear with me.

    I'm just curious about getting some feedback on this.  I have been tasked with laying out a new SQL Server infrastructure configuration for our company.  In the past everything has been very reactive but now we are experiencing a lot of fast growth so we are now able to focus our energies on building an infrastructure that will efficiently handle our current load and is geared for growth.

    I'm not asking a general question like, "How do I do this?"  I am working on this and I have a long way to go.  Right now I have a couple of questions.

    One, we know we want to use a centralized data storage device to hold the data.  This will either be a NAS system or a SAN system.  As part of my design I am exploring the possibilities for failover and recovery. 

    One option is building a cluster.  (I'll be honest, with my limited experience, this kind of scares me.)  Other options include log shipping, replication, etc. 

    Assume that I have a stand-alone SQL Server pointed to the centralized data storage.  The SQL Server crashes, for whatever reason.  Assume I have a server waiting to the side that I can use to run SQL Server and I set it up with the same name as the original server.  Can I use the database files that the original server was using or will I need to restore the databases because of corruption.

    Two, in my first question, I mentioned a NAS or SAN system.  I'm curious what others are using.  We current have two NAS systems but I have removed almost all of our databases at this time because the performance has been poor.  I have been told that the NAS' performance has been improved and I need to consider that again.  Our network administrator is biased against SANs and wants to use NAS BUT I do have the ability to persuade him if I can present a convincing argument.

    All feedback is appreciated.  I look forward to reading your responses.

    hawg

     

    ----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011

  • NAS or SAN

    Microsoft does not recommend the use of NAS for the performance problems you've cited. If you have the option, go with the NAS or a dedicated shared storage array connected to the server using SCSI/Fiber.

    Cluster or not

    We've gone with clusters. The advantage is if I develop a hardware problem on one side, I can "fail over" and not have to worry about bringing up a new system with identical names, etc. Which direction you go is dependent upon how much cost you can bear as well as how much down time you can tolerate.

    Using old database files

    This depends on how your system went down. If you have the option of bringing SQL Server down gracefully, you could copy over the database files and re-attaching user databases appropriately. Master and the system databases are a little trickier but do-able. If you can't, however, because your system went belly-up before you could get the files off, yeah, you're stuck with a restore. This is where clustering/log shipping/replication/mirroring (SQL2k5) all have an advantage.

    K. Brian Kelley
    @kbriankelley

  • You wrote:

    NAS or SAN

    Microsoft does not recommend the use of NAS for the performance problems you've cited. If you have the option, go with the NAS or a dedicated shared storage array connected to the server using SCSI/Fiber.

    *********************************

    Did you mean to say go with SAN or did you mean NAS (as you wrote it)?  I just want to be sure I understand.

    Thanks,

    hawg

    ----------------------------------------------------------01010011010100010100110000100000010100110110010101110010011101100110010101110010001000000101001001101111011000110110101101110011

  • Er, yeah. typo. Sorry about that. Go with SAN.

    K. Brian Kelley
    @kbriankelley

  • We use a two-node active/passive sql2000 enterprise cluster.  Initially we had all data on a SAN maintained by a Raging Wire facility near us in Sacramento.  At that time we weren't ready to tackle log-shipping.  One day the SAN went down ( supposedly a once in a lifetime event ).   After that we decided to get a dedicated storage array to connect to the sql cluster.  This storage array ( DELL power vault ) only houses the main client live database files).

    What did that get us? An option. Before that if the san goes down you're in a hurt.    Well now if the SAN goes down, in our current situation the disk based backups are unavailable along with the system databases.  So you would have to rebuild master from scratch on the powervault ( very ugly and not quick ). 

    Anything you change in a clustered environment may have to be unchanged when the production hardware comes back up.  The san and other auxiliary devices are cluster dependencies, mapped drives, etc etc.

    If the powervault goes down you restore from backup on the SAN, creating your client database live files on the SAN.

    Just recently I set up a log shipping standby server.  It's functioning but haven't tested a failover yet.  ( see my other posts today about synchronizing logins and failover ) .  Replication is supposed to be very complex, requiring perhaps multiple full time DBAs to keep it going.  We don't even have one full time dba right now. In sql 2000 Enterprise you create a maintenance plan and check the log shipping box and follow the wizard -- pretty simple really.  The devil is in the details as always, but log shipping is a lot simpler than replication ( for example, your production database schema is mirrored exactly on the standby -- not necessarily with replication ).

    The beauty is that the standby sql server is completely independent of the  primary sql boxes, SAN,NAS, storage array, whatever in a failover.  You still need your web servers, and a quick way to point the web servers at the new database server.

    My current plan, unless we get a full time DBA who overrules, is to never reverse the log shipping flow of data ( role reversal ).  If the primary hardware is out, everything flips to the standby and log shipping is offline.  On the weekend, we backup the standby, restore to the now functioning production sql server, and then get log shipping patched up ( that's an unknown, might have to manually apply some backups to get it synced up again ).

    I'm still learning, but home that helps some.

    Randy

  • We've got our database backups going to a stand alone server which has also our tape devices. It reduces overall load on our SAN and prevents the issue that you've run into with the database backups going off-line with the SAN going down.

    Keep in mind that in most situations if you lose the SAN you've likely lost a big aspect of business, therefore it does have to be highly reliable, more so than a normal server/shared storage array. I've lived through one of those "once in a lifetime events" so I know where you're coming from, but for us it impacted more than our database servers that were on the SAN. Actually, the database servers were one of our minor concerns.

    Where a clustered solution really gains over a replication/log shipping scenario is if you've got a lot of apps, all on different systems. In that case if you lose the primary server that's a lot of applications that have to repointed. If best practices are being followed with encrypted connection strings, etc., that can be quite a bit of effort and a significant amount of downtime. In a cluster situation if you lose one node the failover takes effect and and since the applications are pointing to a virtual server, there may be a hiccup, but you're not having to perform a mass change. Of course, if you have a limited number of apps, log shipping may be more advantageous.

    As for your concern about anything being changed having to be unchanged when the production hardware comes back up, unfortunately that's the case in any recovery situation, not just SANs/clusters.

    K. Brian Kelley
    @kbriankelley

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply