Failover vs replication - which is better

  • HI,

    I need to know about the pros and cons of using failover mechanism (either clustering or standby server) as against replication on SQL Server 2000.  The server need not be up 24x7 (maintenance window - 11 pm to 5 am) and even a 2 hour window for recovery during normal operating hours is ok.  If there are any existing links on the web on this subject, that would suffice too.  Thx.

     

    Vis.

  • The biggest con of clustering is cost.  You have to run SQL Server Enterprise Edition and have to use a SAN, both of which typically cost a bundle. 

    The biggest con of warm standby servers is that failover isn't automatic and your applications may need to be reconfigured in addition to any considerations on the database side.  They also impose some additional overhead on the whole system.  If you want to use SQL Server's Log Shipping capability then you need Enterprise Edition.  There are third-party solutions available (some free) to setup log shipping.  The other con to these solutions is that they typically impose additional administrative burdens.

    My choice has usually been clustering, but there are many businesses that just can't afford a SAN.  There is a company out there that makes clustering software for SQL Server that doesn't require a SAN or Enterprise Edition.  I just can't remember their name...  I'll post it if I can think of it.  We briefly evaluated that solution and it looked promising, bu my employer needed the additional capabilities of Enterprise Edition anyway and we already had a SAN so it didn't make sense for us.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I would prefer either clustering or a standby server over replication. Although many shops use replication, I suspect many of them do so because management read about it somewhere and dictated that it be used. We could never get replication to keep running right over a WAN, whereas our quadruple-redundant system (two clusters, primary AND failover, with log-shipping between them) performed flawlessly.

    With your generous recovery window, automated log shipping may not even be needed - just copy and restore the full backup in a nightly job, and do the logs manually if ever needed. dc's post is right-on about the cons of clustering and standby servers. If you don't have well-planned failover procedures, it can be chaos. I just think replication is worse, at least for non-trivial dbs...

  • We went the replication route and are very happy with it. We have two servers (Data and App). Suprise suprise, Data just runs the database and App the application. We use transactional replication to copy DB updates from Data to App and the latency is about 5 seconds. If Data fails (has happened twice now) we just change the UDL's on App (it is a VB6 app) and we are back up and running in about 5 minutes (against App's copy of the DB) and have never lost any data.

    The systems run in 20 x 7 factory environment with about 15,000 new transactions per day. 20 factory stations using Terminal Services plus about 10 other users around the place.

    Eech server is over configured for normal operation (dual Xeon's) but, when running on one server, the real time factory response is retained but reporting is a bit slower. Real time meaning driving conveyors, weighing product etc.

    We had a look at clustering and it was way too expensive but it also seemed quite complex.

  • Thx to all of you for the varied but extremely informative responses.  I just have one question to Jonespm on replication.  When the data db came back up, what was required to switch back to it?  Issues like updating the data db with updates that took place on the app db, etc.  Was it easy? How long does it take? so on and so forth. 

    Thx again.

    Vis.

  • Yes - this was a bit of an issue trying to sus out the best approach. What we do now is the following. Obviously all users are logged off.

    Backup the db on APP.

    Restore on to DATA.

    Run a script we have that reapplies and restarts replication on DATA

    We then hand test changing a table on DATA using Enterprise Manager to confirm replication is indeed running.

    Change the UDL's on APP to point back to DATA.

    We then put two or three new items into the system using the application and confirm they exist (on both DB's) then delete them (using the app) and confirm they no longer exist on both DB's.

    The whole process takes about 2 hours. The database is 6Gb and the two main transactions tables contain 4.5 and 1.5 million rows.

    We run under Windows 2000 Server and SQL Server 2000 standard editions.

    Cheers, Peter

     

     

  • In this kind of scenario I would opt for log shipping over replication.  Assuming you can cover the cost of Enterprise Edition.

    There's less of an administrative burden with log shipping.  But bear in mind that the latency is greater with log shipping so you might end up losing 5 or so minutes worth of data.

    Replication is cheaper and has a lower risk of losing data, but it requires more admin than log shipping - in my experience anyway.

  • The question you should be asking yourself is what do you want high availability or a disaster recovery solution.

    If you are interested in high availability - or short downtimes and the cost of your downtime justifies the expense of a cluster you should pick clustering. No other solution does automatic failover. If the cost of your downtime is low and you can need a warm standby and your schema is stable - replication is a good choice.

    If the cost of your downtime is low and you have VLDB's and you don't need to be warm use log shipping. Log shipping is not as scalable as replication; ie it does not provide as warm a solution as replication. The latency with log shipping is 1 minute+ with a practical limit being 5 minutes. With replication is should be between 20s to 1 minute.

    If you are interested in high availability and disaster recovery, ie your NYC data center goes down and you pick up again in SF with no data loss you have to look at a hot standby solution, something like EMC SRDF, or one of Hiatchi or Vertias' offerings.

     

     

  • If you have identity columns, they don't replicate, so I don't think you can use replication as a fail over instance. You'd have to use log shipping or clustering.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply