Active/active failover producing errors...

  • SQL Server 2000 active/active setup.  Node B was rebuilt three months ago, and since then cluster group A will not fail over successfully to Node B, although group B fails over to Node A without problems.

    Three groups:

    Cluster Group

    CLUSTERA Group

    CLUSTERB Group

    Errors in event logs:

    The description for Event ID ( 17052 ) in Source ( MSSQLSERVER ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. The following information is part of the event: [sqsrvres] OnlineThread: RegOpenKeyExW failed (status 2)

    The description for Event ID ( 17052 ) in Source ( MSSQLSERVER ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. The following information is part of the event: [sqsrvres] OnlineThread: Error 2 bringing resource online.

    Errors in cluster log:

    Actually, the two 17052 error messages above are from the event logs on NODE B; there is nothing on NODE A.  The cluster log has this:

    00000ad8.00000e78::2006/07/09-04:34:10.810 SQL Server <SQL Server>: [sqsrvres] OnlineThread: RegOpenKeyExW failed (status 2)

    00000ad8.00000e78::2006/07/09-04:34:10.810 SQL Server <SQL Server>: [sqsrvres] OnlineThread: Error 2 bringing resource online.

    This is similar to what appears in the event logs on NODE B.  The "Error 2 bringing resource online" message simply indicates that the SQL Server clustered resource cannot start.

    When the CLUSTERA group fails over to NODE B, every clustered resource comes online except the SQL Server resource and the resources that depend on it.  Research points toward a registry issue, but so far I have not been able to pinpoint anything conclusive.
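
The "status 2" in "RegOpenKeyExW failed (status 2)" is the Win32 code ERROR_FILE_NOT_FOUND, which suggests the registry key the SQL Server resource tried to open does not exist on NODE B. The sketch below is a minimal illustration, assuming the usual SQL Server 2000 locations under HKLM (default instance under `SOFTWARE\Microsoft\MSSQLServer`, named instances under `SOFTWARE\Microsoft\Microsoft SQL Server\<instance>`); the logs do not say exactly which key sqsrvres opens, so treat that mapping as an assumption. The helper names and the hand-collected key set for NODE B are hypothetical, chosen so the check runs without a Windows registry.

```python
# A few well-known Win32 error codes that can appear as "status N".
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
}


def describe_status(code: int) -> str:
    """Return the symbolic name of a Win32 status code, if it is listed above."""
    return WIN32_ERRORS.get(code, f"unknown Win32 error {code}")


def instance_registry_key(instance: str) -> str:
    """HKLM subkey assumed to hold a SQL Server 2000 instance's settings.

    Assumption: the default instance (MSSQLSERVER) lives under
    SOFTWARE\\Microsoft\\MSSQLServer and a named instance under
    SOFTWARE\\Microsoft\\Microsoft SQL Server\\<instance>.
    """
    if instance.upper() == "MSSQLSERVER":
        return r"SOFTWARE\Microsoft\MSSQLServer"
    return r"SOFTWARE\Microsoft\Microsoft SQL Server" + "\\" + instance


def missing_instance_keys(instances, node_keys):
    """Return the instances whose registry key is absent from node_keys.

    node_keys is a hypothetical, hand-collected set of HKLM subkey paths.
    Registry paths are case-insensitive, so compare in lower case.
    """
    present = {key.lower() for key in node_keys}
    return [i for i in instances if instance_registry_key(i).lower() not in present]


# Hypothetical NODE B contents: only the FAILOVER named instance was reinstalled.
node_b_keys = {r"SOFTWARE\Microsoft\Microsoft SQL Server\FAILOVER"}

print(describe_status(2))  # ERROR_FILE_NOT_FOUND
print(missing_instance_keys(["MSSQLSERVER", "FAILOVER"], node_b_keys))  # ['MSSQLSERVER']
```

If a check like this on the real node shows the default instance's key missing, that would be consistent with group A's SQL Server resource failing only on NODE B.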

  • You say that Node B was rebuilt three months ago.  How was it rebuilt - from an image file or by reinstalling the operating system?  How did you reinstall SQL Server on it?  The SQL Server installation program detects that it's running on a cluster and installs on both (or all) nodes.  If you already had SQL Server installed on the other node, this could cause problems.

    What I would suggest, if you did rebuild by reinstalling the OS, is to back up all your databases, uninstall both SQL Server instances, and then reinstall.  You can restore your databases afterwards.

    Hope that helps

    John

  • John:

    Node B was rebuilt by reinstalling the operating system from scratch.  While this was going on, both instances of SQL Server resided and continued to run on NODE A.  The node that had been evicted (Node B) was then brought back into the cluster.  SQL Server was then installed on NODE B using its virtual server name and specifying the named instance name.

    I would agree with your comment that the "SQL Server installation program detects that it's running on a cluster and installs on both (or all) nodes" if this were an active/passive setup.  Active/active requires two separate installs of the software, because there is no automatic detection.

    So the question in my mind is: do I need to install the default instance, which normally runs on Node A, on Node B as well?

    Here is perhaps a clearer view:

    The services "SQLAgent$FAILOVER" and "MSSQL$FAILOVER" both log on as IRFC\cluster_admin on both servers.  These services are currently started/running on NODE B and are currently stopped on NODE A.

    NODE A also has services "MSSQLSERVER" and "SQLSERVERAGENT" which also log on as IRFC\cluster_admin.  These services are currently started/running (on NODE A).
     
    It seems strange to me that NODE B does not have the "MSSQLSERVER" and "SQLSERVERAGENT" services.  I thought clustered servers were "identical twins" for the most part.  I thought both servers would have the same services, although of course a particular service would be up or down depending on the node failover status.  I can see that SQL is installed on NODE B, so why did it not get those services? 
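
The service names in this post follow the fixed SQL Server 2000 naming pattern: the default instance registers MSSQLSERVER and SQLSERVERAGENT, while a named instance registers MSSQL$<name> and SQLAgent$<name>. A minimal sketch, with hypothetical helper names, derives the expected services for each instance on the cluster and reports which ones a node lacks. The per-node service lists are taken from the description above (installed services, whether running or stopped); in practice they would have to be collected from each node by hand.

```python
def expected_services(instance: str) -> list[str]:
    """Service names SQL Server 2000 registers for one instance.

    Default instance: MSSQLSERVER and SQLSERVERAGENT.
    Named instance:   MSSQL$<name> and SQLAgent$<name>.
    """
    if instance.upper() == "MSSQLSERVER":
        return ["MSSQLSERVER", "SQLSERVERAGENT"]
    return [f"MSSQL${instance}", f"SQLAgent${instance}"]


def missing_services(instances, node_services):
    """Return the expected services not found on a node.

    Windows service names are case-insensitive, so compare in upper case.
    """
    present = {s.upper() for s in node_services}
    return [
        svc
        for inst in instances
        for svc in expected_services(inst)
        if svc.upper() not in present
    ]


# Installed services on each node, as described in the thread.
node_a = {"MSSQLSERVER", "SQLSERVERAGENT", "MSSQL$FAILOVER", "SQLAgent$FAILOVER"}
node_b = {"MSSQL$FAILOVER", "SQLAgent$FAILOVER"}
cluster_instances = ["MSSQLSERVER", "FAILOVER"]

print(missing_services(cluster_instances, node_a))  # []
print(missing_services(cluster_instances, node_b))  # ['MSSQLSERVER', 'SQLSERVERAGENT']
```

In a healthy two-instance cluster, both nodes would be expected to return an empty list, since either node must be able to host either instance.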

  • You're right that active/active requires two separate installs, because there are two instances of SQL Server on the cluster.  However, it's not true that there is no automatic detection.  When you install each instance, setup creates the installation folders and the services automatically on every node in the cluster.  Therefore, if you rebuilt the operating system on Node B, it will have no SQL Server instances installed on it.  You need to reinstall both instances, because there may come a time when both instances have to run on Node B.  I have never tried installing SQL Server on a single node in a cluster, so I don't know how well it works, but if you got one instance up and running without having to trash the whole cluster installation, you may be able to do the same with the other.

    Good luck

    John
