Blog of SQL2K SP3a clustering Win2003 3 nodes Clustering.

  •  

    Blog of planning, installing and troubleshooting

    SQL Server 2000 clustering with SP3a

    &

    Windows 2003 3 nodes (Active-Active-Active) Clustering.

     

     

    This blog is for those who work on Microsoft SQL Server 2000 Enterprise (SP3a) Clustering under Windows 2003 Clustering environment. This is 3 nodes clustering with all the boxes have 16Gbyte of ram so also have to use PAE option for Windows and AWE Enable option for SQL. Also the servers are under Active-Directory. Here is the system information.

     

     

    3 SQL Servers

    - DELL PowerEdge 6650

    - Four Intel® Xeon  processors MP with Hyper Thread

    - 3.0GHz 400MHz front side bug, 4MB cache

    - 16GB DDR266 ECC SDRAM

    - Microsoft Windows 2003 Advanced Server

    SAN

    - DELL/EMC Fibre Channel CX500

     

     

    After talking to MSFT support and DELL support what we need to know before we get into large amount of RAM, Hyper Thread and Windows 2003

     

    HyperThread

    *      MSFT recommend not to use Affinity Mask with HyperThread

     

    Windows 2003 option for PAE for over 4GByte RAM

    *      Please make sure to get the HotFix if you have enabled the PAE option since you have over 4GB ram. “Data is corrupted when PAE is enabled on a Windows Server 2003-based computer” (http://support.microsoft.com/?id=834628) Without that HotFix, small databases (< 30G) were fine but when it gets larger, especially Bulk insert from replication fails without reason.

    *      This, I didn’t see any problem yet but good to know. “FIX: SQL Server Scalability May Be Limited if AWE Is Enabled” http://support.microsoft.com/default.aspx?scid=kb;en-us;332004

     

    Multiprocessor computer

    *      A Windows Server 2003-based multiprocessor computer intermittently crashes http://support.microsoft.com/default.aspx?scid=kb;en-us;830098&Product=winsvr2003

     

    MS DTC for Linked Server

    *      http://support.microsoft.com/?id=329332 PRB: You Receive Error 7391 When You Run a Distributed Transaction Against a Linked Server

     

    Okay, now I’ve got the important stuff taken care of so going for Windows cluster installation for 3 active nodes.

     

    So, I followed this guide to install cluster.

    http://www.microsoft.com/technet/prodtechnol/windowsserver2003/technologies/clustering/confclus.mspx

     

    And it FAILED at the “Configuring the Cluster Service”->To Configure the First Node -> Step 7. Right after Step 6 and choose the Typical (full) configuration, failed while trying to Establishing node connections with “Access Denied” to connection the node itself. After hours of troubleshooting, we figured out it was problem with “Group Policy (GPO)” from AD. If you sure that you did everything right and still get Access Denied error, that is the security Group Policy problem. Either you or sys admin have to get the right GPO setup or take the server out of GPO while installing and put it back after all done.

     

    Once that is done, adding first and second node was no problem. Now the problem was the third node adding. When I tried to configure the third node, I got the access error saying that 3rd node can’t see the shared disk. However, I checked make sure all the nodes can see the shared disk so I figure it is something wrong with the installation problem so I choose “Advanced (minimum) configuration” so that I can by pass the checking and just set the shared disk and that did the trick.

     

    Now 3 node cluster is working J

     

    After that, since we have heavy load on Node1 and Node 2, what I configured was when Node1 has problem, it fails to Node 3 first and then Node 2. When Node 2 has problem, it fails over to Node 3 and then Node 1. Since Node 2 has less load then Node 1, when Node 3 fails, it fail over to Node 2 first and then Node 1.

     

    Here is short description of how does node fails over.

    • Node 1 -> Node 3 -> Node 2
    • Node 2 -> Node 3 -> Node 1
    • Node 3 -> Node 2 -> Node 1

     

    Okay, once that is done, I added the MSDTC by using below article.

    *      How to configure Microsoft Distributed Transaction Coordinator on a Windows Server 2003 cluster http://support.microsoft.com/?id=301600

     

    Now, starting SQL Cluster installation J SQL cluster installation on windows 2003 is the same as windows 2000 but there are some bugs with Windows 2003 and SP3a, there are few things to do.

     

    Since all the steps are in BOL and “Installation order for SQL Server 2000 Enterprise Edition on Microsoft Cluster Server” http://support.microsoft.com/default.aspx?scid=kb;EN-US;243218, I will focus on what you must remember, at least what I had to remember to make the installation working nicely.

     

    *      Make sure to disable all services except the ones that are necessary. All that services except storage service. Those must have. For example, if you are using EMC, PowerPath service.

    *      Change the Environment Variables for TEMP and TEP from local setting directory to other location such as C:\TEMP.

    *      Make sure you don’t have Terminal Service is installed. “SQL Server 2000 Cluster Does Not Install on Windows Server 2003-Based Computers Where Terminal Services Is Installed” http://support.microsoft.com/?id=301520

    *      And here is very important thing to do. “PRB: Installation of a Named Instance of SQL Server 2000 Virtual Server on a Windows 2003-Based Cluster Fails”  http://support.microsoft.com/?id=815431 I tested it and it fails. Here is funny thing. After installation is done and apply the patch, you have to change that Named Pipe to TCP/IP. I will tell you why later. Just keep in mind that.

     

    Once that is taken care of, I had no problem installing SQL Cluster under Windows 2003 cluster environment. And also without rebooting the box, I also installed the SP3a service pack nicely; however, if you are not tight with time, I recommend rebooting after installation for each node. That is also MSFT recommendation as well.

     

    Once SQL Server is ready, you have to do a few things.

    *      First, like I said above, change the named pipe to TCP/IP and here is the information. “FIX: General network error when you try to back up or restore a SQL Server database on a computer that is running Windows Server 2003” http://support.microsoft.com/default.aspx?scid=kb;en-us;827452

    *      And the run DBCC TRACEON(11187)  FIX: Concurrency Enhancements for the Tempdb Database http://support.microsoft.com/default.aspx?scid=kb;en-us;328551 We didn’t see much problem w/o it since I just applied it but recommend to at least add –T1118 trace flag or setup a job at start the agent to run that. Apply the hotfix when you see problem is not bad idea.

    *      Increase TempDB size to appropriate size. This will allow having smooth transition from old server to new server. I did 4G for data and 1G for log. Seems working nicely.

    *      Change SQL Configuration to enable AWE to take advantage of large ram. Here is the script that I ran first time. Since I have 16G RAM, I used 15G for SQL and 1G for O/S. http://support.microsoft.com/default.aspx?scid=kb;en-us;274750&Product=sql2k

     

    USE master

    EXEC sp_configure 'show advanced option', '1'

    GO

    EXEC sp_configure 'awe enabled', 1

    GO

    EXEC sp_configure 'max server memory', 15420

    GO

    RECONFIGURE WITH OVERRIDE

    GO

     

    Now here is one problem with that setting. When they are on right node, it is okay. However, when any node fail over to any other nodes. You are in trouble. Why? There is only 16GByte of ram and if two SQL Servers are set to use 15G each service, it will fail. So, I wrote this script to adjust Max Server Memory when it is fail over. Let me just give you the script for 2 node since 3 and 4 node are all the same by just adding another if and else statement. Make sure you already ran above script once.

     

    CREATE TABLE #GetServername

    (

    Value VARCHAR(100),

    Data VARCHAR(100))

     

    RECONFIGURE

    GO

     

    DECLARE @inst VARCHAR(200)

    DECLARE @inst1 VARCHAR(100)

    DECLARE @ServerName VARCHAR(100)

    DECLARE @Node1 VARCHAR(100)

    DECLARE @Node2 VARCHAR(100)

     

    SET @Node1 = 'SQL1'

    SET @Node2 = 'SQL2'

     

    SET @inst = 'SYSTEM\ControlSet001\Control\ComputerName\ComputerName'

    SET @inst1 = 'ComputerName'

     

    INSERT INTO #GetServername EXEC master..xp_regread 'HKEY_LOCAL_MACHINE', @inst, @inst1

     

    SELECT @ServerName = Data

    FROM #GetServername

     

     

    IF (@ServerName = @Node1)

    BEGIN

                EXEC sp_configure 'max server memory', 15420

    END

    ELSE

    BEGIN

                EXEC sp_configure 'max server memory', 10280

    END

     

    GO

    RECONFIGURE WITH OVERRIDE

    GO

    DROP TABLE #GetServername

     

     

    I added that script as a job and schedule to run when “Sql Server Starts” and since we have three node cluster and third node is not that active, I set the third node Max RAM size to be just 4G so when the SQL1 is on Node1, which is where it supposed to be, then SQL1 uses the full memory. If SQL1 fails over to Node 3(this case Node 2) then, only uses 10G ram and SQL2 uses 5G and rest of 1G is for O/S and AWE stuff to use.

     

    I am still testing to get the best way to optimize this script so it is not 100% good one but I see this is working fine for now.

     

    I just hope this is helpful for any of you folks installing SQL server.

     

    iLoveSQL

  • We've actually decided on holding off on installing SQL Server 2000 on Windows 2003 clusters. Unless MS has fixed this in recent days, SQL Server has to come up after MS DTC or it may not register properly. This means distributed queries will fail. If you aren't using distributed queries, this isn't a big deal at all. For us, though, it was a showstopper.

    K. Brian Kelley
    @kbriankelley

  • bkelley:

    There is a fix for that problem. I thought it was this article: (http://support.microsoft.com/default.aspx?scid=kb;en-us;821751) I know for sure later today, if it is other, then I will change this message.

    We ar running 4 instances on a cluster of Windows 2003, and using MSDTC thanks to this hotfix.

     

     

  • It may fix the issue with virtual instances, but the notes from the article are for instances that aren't in the cluster. We were getting a dispenser interface error, not the one given. I may have to follow back up with our TAM to see if it has ramifications beyond what is in the article.

    K. Brian Kelley
    @kbriankelley

  • The fix is part of the rollup fix. The guy from Microsoft who investigated the problem on-site confirmed this.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply