How to set cluster to not start sql if resource is offline

  • Since those of us acting as DBAs don't have access to the cluster administrator tool, I'm wondering how we can tell ( from event logs etc ) if it's been set so databases and sql server itself cannot start until all resources are online.

    We have a sql 2005 Enterprise two-node cluster ( active/passive) with a NetApp for storage. We've added 8 new luns as mount points and are about to start splitting data files, migrating tables with IO problems to the new luns. It's critical that the database not be brought online and the sql service not start until these are available.

    Our Systems Admin states that he's done the dependencies in Cluster Administrator and tested failovers, but I'm not sure anything I have access to ( event and sql logs ) is showing sql failing to start. I'm now looking at the cluster.log I found in the windows folder on one of the nodes.

    I may look into a vbscript that will read active directory ( \root\MSCluster ) using WMI as I was able to use this approach recently to keep track of users contained in Domain Groups that have sql access.

  • Spoke with our Server guys. They feel that the dependencies are set correctly, but testing this would have to be done on a Sunday because in a cluster environment, if a dependency like a cluster drive is not available, the cluster program will failover to the other node ( sql server), at which point it could just failover again and again.

    We were hoping to find something in the event log stating "sql service failed to start due to missing resource."

  • The cluster can also be set to NOT keep trying over and over. After 3 "round trips, for example, just plain fail (which then gives you something to alert on, possibly notify by e-mail etc...).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Sometimes you just have to trust people who have more power than you ...

    You can also try to bribe the server guy with doughnuts/cake/hobnobs/tea/coffee and ask him to show you the set up. If you're not in the same building, ask for screen shots and hope he will take an IOU for the doughnuts/cake/hobnobs/tea/coffee 🙂

  • what happens when you want to patch the cluster?

    tell the sysadmin you gotta do the change out of hours at 3a.m.,see if he wants to get up to offline the resources for you.

    🙂

    ---------------------------------------------------------------------

  • Access to cluster administrator is a secondary issue. We had a meeting on this today and we may try to use 8 drive letters instead of mount points. Obviously mount points are somehow viewed as more volataile or sensitive than drive letters or Microsoft wouldn't have warned about data corruption with mount points if the cluster dependencies aren't correct.

    My understanding now is that if any cluster dependency ( drives in particular ) become unavailable for any reason, the cluster software should simply stop sql server. There's no reason to failover because the secondary node won't have any better luck accessing a missing drive.

  • Indianrock (11/19/2009)


    Access to cluster administrator is a secondary issue. We had a meeting on this today and we may try to use 8 drive letters instead of mount points. Obviously mount points are somehow viewed as more volataile or sensitive than drive letters or Microsoft wouldn't have warned about data corruption with mount points if the cluster dependencies aren't correct.

    My understanding now is that if any cluster dependency ( drives in particular ) become unavailable for any reason, the cluster software should simply stop sql server. There's no reason to failover because the secondary node won't have any better luck accessing a missing drive.

    That's not always true. If you have multiple drives, it's entirely possible to have multiple Fiberchannel adapters connecting to different areas of your SAN. the resource going "unavailable" could simply be your card going bad/locking up/ the DAE on that one side going wonky. With everything being redundant - in that scenario it's conceivable the other node will be able to see resources this one may not.

    It's hard for it to be 100% identifiable from the node that's having a hard time. At very least - it should fail over once, and then fail altogether is it also fails there. Of course - this should be worked out between you and your SAN admin folks (there are differences in the various implementations, so different concerns come up).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I think the main concern now is that Microsoft and others have stated the possibility of data corruption if sql starts up before a mount point is "available." I haven't heard this about mapped drives etc. It seems that sql can be restarted, rebooted and subjected to power failures and rarely have any data loss or corruption, so the specific mention of this possibility with mount points and the need to make sure the sql service is dependent upon them in a cluster causes concern.

    Since we have 12 free drive letters, and the database split project we're about to begin needs 8 new luns with each of those to receive two new sql data files for more IO channels, we may decide to switch to mapped drives.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply