Resilience

  • I've been asked to look into resilience with regard to providing a 24/7/365 database service for an entirely database-driven web content management system.

    Virtually all my experience is as a development DBA/data analyst so I'm in over my head here.

    If you have a cluster, will service continue if either one of the servers fails, or is there a primary node that can act as a single point of failure?

    When would I use Active/Active and Active/Passive?

    For this particular application I am thinking along the lines of having a completely separate SQL Server (cluster) looking after content maintenance, with the final data being replicated across to a second SQL Server (cluster) that serves the live site.

    I know that RAID 5 has some issues with write performance, plus I read somewhere that when the drives begin to degrade there is a risk of RAID 5 copying the bad blocks across the array.

    What I was considering was RAID 1+0 for the content maintenance database server and RAID 5 for the live server. The content server will have a lot of writing taking place as part of the content processing, whereas the front-end server will only have the results of that processing written across.
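To put rough numbers on the RAID 1+0 versus RAID 5 trade-off described above, here is a minimal sketch using the commonly quoted rule-of-thumb write penalties (2 back-end I/Os per write for RAID 1+0, 4 for RAID 5). The function name, the disk count, and the per-disk IOPS figure are all hypothetical and only there for illustration; real controllers with write caches will behave differently.

```python
# Rule-of-thumb write penalties: back-end disk I/Os needed per logical write.
# RAID 1+0 writes each block to both mirrors; RAID 5 reads old data and old
# parity, then writes new data and new parity.
WRITE_PENALTY = {"raid10": 2, "raid5": 4}


def effective_iops(level, disks, iops_per_disk, write_fraction):
    """Estimate host-visible IOPS for an array under a given read/write mix."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = disks * iops_per_disk  # total back-end I/Os per second
    penalty = WRITE_PENALTY[level]
    # Each logical read costs 1 back-end I/O; each logical write costs `penalty`.
    return raw / ((1.0 - write_fraction) + write_fraction * penalty)


if __name__ == "__main__":
    # Hypothetical array: 8 disks at 150 IOPS each.
    for level in ("raid10", "raid5"):
        for write_fraction in (0.1, 0.7):
            estimate = effective_iops(level, 8, 150, write_fraction)
            print(f"{level} at {write_fraction:.0%} writes: {estimate:.0f} IOPS")
```

Under this model a pure-write load gets half the throughput on RAID 5 that it gets on RAID 1+0, while a pure-read load sees no difference, which is consistent with putting the write-heavy content server on RAID 1+0.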

    What should I consider with regard to other points of failure?

    • Network cards?
    • Motherboards?
    • CPUs?

     

  • Unfortunately, even with a cluster, whether Active/Active or Active/Passive (in both cases a surviving node takes over when the other one fails, hopefully with no intervention), there will be downtime while the system fails over...
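Because a failover is not instant, the application side usually has to tolerate a short outage. One way to do that is to retry with a growing delay, sketched below. This is only an illustration: `call_with_retry` and its parameters are hypothetical names, and it assumes the database call surfaces a dropped connection as Python's built-in `ConnectionError`, which a real driver may not do.

```python
import time


def call_with_retry(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run operation(); on ConnectionError wait and retry, doubling the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the failure
            # Wait base_delay, then 2x, 4x, ... to give the failover time to finish.
            sleep(base_delay * (2 ** attempt))
```

The `sleep` argument is injectable so the waiting can be replaced in tests. The total wait should be sized to cover however long your cluster actually takes to fail over, which you would have to measure.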

    Other points of failure are

    1. Network cards
    2. Your network as a whole
    3. Your service provider for the internet connection(s)
    4. battery backup(s)
    5. Air conditioning system

    You can make your server "farm" as robust as possible, but if 3 goes down you are offline.  If 5 goes down your servers will shut themselves down as protection (I have recently experienced this during 2 different hurricanes.  Gotta love sunny Florida).  If you lose power, 4 will get you.

    If your network doesn't have redundancy, or your ISP doesn't, you are offline.
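The point that any one of these can take you offline can be put into numbers. A rough sketch, assuming the components fail independently of each other (a big assumption: a hurricane takes out power, cooling and the ISP together); the function names and the availability figures in the example are made up for illustration.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    values = list(availabilities)
    for a in values:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availabilities must be between 0 and 1")
    return prod(values)


def redundant_availability(availability, copies):
    """Availability of `copies` independent duplicates where any one is enough."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - availability) ** copies


def downtime_hours_per_year(availability):
    """Expected hours of downtime per 365-day year."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures: database cluster, network, ISP, power, cooling.
    chain = [0.9999, 0.999, 0.995, 0.999, 0.999]
    total = serial_availability(chain)
    print(f"whole chain: {total:.4%}, about {downtime_hours_per_year(total):.1f} hours down per year")
    # Same chain with a second, independent ISP link.
    chain[2] = redundant_availability(0.995, 2)
    total = serial_availability(chain)
    print(f"with two ISPs: {total:.4%}, about {downtime_hours_per_year(total):.1f} hours down per year")
```

The chain can never be better than its weakest link, so money spent on the database cluster alone does little if the ISP or the power is the weak spot.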

    Just some more things to make your hair turn prematurely gray...

     



    Good Hunting!

    AJ Ahrens


    webmaster@kritter.net

  • Usually, when you are considering points of failure for a 24/7 web database, you also need to keep in mind other factors such as denial-of-service attacks and virus infections. These might sound too basic to be worth mentioning, but if they are taken lightly they could cause some very major embarrassment.

     

    Also, a poorly written application could end up using much of your system resources and thereby slow response times, though this does not strictly come under resilience issues.

     

    Oh yes, another point of failure could be your RAM.

     

    I sincerely hope you don’t encounter any of the points mentioned by us and your servers keep running 24/7.


    What I hear I forget, what I see I remember, what I do I understand
