Problems and Resolutions with Clustering

  •    I recently posted some questions to the forum and received some good help. I thought I would post the problems I had and their resolutions in case anyone else finds themselves in this situation. I apologize if this repeats information that is already well known, but I did not know any of it, and I believe others might not be aware of some of it either.

       First, some background on this system. We support a number of Web-based applications that allow customer data entry into our claim processing system, plus IVR support for customer inquiries. For failover purposes, the databases that support these apps are on a cluster server. Originally, this was to be an active-active cluster, with half the databases in a virtual server on one node and half in a virtual server on the other. Due to a mix-up in licensing, we had to switch to active-passive mode, with one virtual server in one resource group and one in another resource group, all on one node. This node also had a third resource group, which contained the IP address and network name for the physical cluster box. Over time, we also added links from databases in both resource groups to Oracle databases.

       Recently, a requirement came down from management to bring our servers up to date on patches and hot fixes, and to maintain them at a current level. Our Network Admins installed an automated patch manager, which scans our servers and determines what patches and hot fixes (but not service packs) need to be applied. One of the patches recently applied changed the functioning of the Distributed Transaction Coordinator. After the patch was applied, our linked servers stopped working. We had been using the MSDTC service as an implicit resource for our linked servers. This patch required that we define MSDTC as an explicit resource in the Cluster Administrator.

       This presented a problem to our Network Admin. Since we had linked servers in databases in two different resource groups, he believed that placing the MSDTC resource in one group meant that, whenever the group without the MSDTC resource was not on the same node as the group with it, the links to Oracle would stop working. He also did not think it was possible to have two resources pointing to one service, since, on a failover, the service is stopped on one node and started on the other. His solution, therefore, was to take both virtual servers and their resources and place them in the default group. This is our present setup: one resource group containing two virtual servers and all their resources.

       Now came Service Pack 4. Due to a requirement from our Network Admins, we needed to install SP4 on our cluster server. However, unlike previous service packs, the fallback for SP4 was to uninstall, then reinstall, SQL Server 2000. The DBA who originally set up our cluster could not remember precisely what he did, so we set up a test box on VMware to work out our procedures. During this process, we found several cases where the setup of our cluster server needed to be changed so that the install/uninstall processes would work.

       BOL states, under the topic 'failover clustering/creating clusters', that each MSCS group can contain, at most, one virtual server. Since we have two virtual servers in the same group, and I had been told by a tech person at Microsoft that he had placed more than one virtual server in the same group, this statement is not totally correct. However, the setup program on the install disk for SQL Server 2000 does enforce the requirement that each virtual server goes into its own group. During the install, when you get to the Cluster Disk Selection screen, it will not show any disk resources that are in a group which already has a virtual server installed in it, even if the disk resource has not been selected in a prior install. The same is true on an uninstall. Following the directions in BOL under removing a clustered instance, if the virtual server/instance being removed is not in its own group, the setup program will not remove all of the resources from the cluster group. Specifically, the IP address and Network Name resources for the virtual server are not removed, nor are the DNS entries for the virtual server. The setup program is also what normally cleans up the registry on both nodes, so all of this has to be done manually. All of the above is also true when installing SP4. Once you have installed SP4 on a virtual server, setup will not allow you to install it on a second virtual server if the second virtual server is in the same group as the first.
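
       To make the one-virtual-server-per-group behavior easier to picture, here is a minimal Python sketch that models it. It does not talk to a real cluster or to the SQL Server setup program; the ResourceGroup class, the group names, the virtual server names (VS1, VS2) and the disk names are all hypothetical, made up for illustration. It only mirrors what we observed: setup offers no disks from a group that already holds a virtual server, and a group with more than one virtual server is the layout that causes trouble.

```python
# Hypothetical model of cluster resource groups. All names and the data
# layout are invented for illustration; nothing is read from a real cluster.
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    name: str
    virtual_servers: list = field(default_factory=list)
    disks: list = field(default_factory=list)


def crowded_groups(groups):
    """Return the names of groups holding more than one virtual server."""
    return [g.name for g in groups if len(g.virtual_servers) > 1]


def disks_offered_by_setup(groups):
    """Return disks that sit in groups with no virtual server yet.

    This mirrors the behavior described in the post: disks in a group
    that already hosts a virtual server are not listed for selection.
    """
    return [d for g in groups if not g.virtual_servers for d in g.disks]


# The layout from the post: everything merged into the default group.
merged = [
    ResourceGroup("Cluster Group", ["VS1", "VS2"], ["Disk E:", "Disk F:"]),
]

# A layout setup can work with: at most one virtual server per group.
split = [
    ResourceGroup("Group 1", ["VS1"], ["Disk E:"]),
    ResourceGroup("Group 2", [], ["Disk F:"]),
]

print(crowded_groups(merged))
print(disks_offered_by_setup(merged))
print(disks_offered_by_setup(split))
```

       In the merged layout no disk would be offered for a second install, while in the split layout the disk in the still-empty group would be.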

  • Just a brief note regarding your scenario...  It appears that the Cluster may not have been set up properly from the start.  You said "This patch required that we define the MSDTC as an explicit resource in the Cluster Administrator."  Actually, Microsoft documentation explains this when installing a SQL Server Cluster.  Also, when you switched gears from active / active to active / passive you might not have gotten a clean cluster install.  Anyway, it sounds like you resolved your issues and learned a lot from it 🙂

  • Yes, the documentation does say that about the MSDTC, but it looks like there was a hole which allowed you to use it without defining it in the resource group.  This patch apparently closed that hole.

  • There definitely was a 'feature' regarding how MSDTC could be set up on clusters, from 2k servers onwards. Glad to see they've finally patched it; sorry you had to find out the hard way. Thanks for the post.
