July 30, 2004 at 10:25 am
Blog of planning, installing and troubleshooting
SQL Server 2000 clustering with SP3a
&
Windows 2003 3 nodes (Active-Active-Active) Clustering.
This blog is for those who work on Microsoft SQL Server 2000
3 SQL Servers | - DELL PowerEdge 6650 - Four Intel® Xeon processors MP with Hyper Thread - 3.0GHz 400MHz front side bug, 4MB cache - 16GB DDR266 ECC SDRAM - Microsoft Windows 2003 Advanced Server |
SAN |
After talking to MSFT support and DELL support what we need to know before we get into large amount of RAM, Hyper Thread and Windows 2003
MSFT recommend not to use Affinity Mask with HyperThread
Please make sure to get the HotFix if you have enabled the PAE option since you have over 4GB ram. “Data is corrupted when PAE is enabled on a Windows Server 2003-based computer” (http://support.microsoft.com/?id=834628) Without that HotFix, small databases (< 30G) were fine but when it gets larger, especially Bulk insert from replication fails without reason.
This, I didn’t see any problem yet but good to know. “FIX: SQL Server Scalability May Be Limited if AWE Is Enabled” http://support.microsoft.com/default.aspx?scid=kb;en-us;332004
A Windows Server 2003-based multiprocessor computer intermittently crashes http://support.microsoft.com/default.aspx?scid=kb;en-us;830098&Product=winsvr2003
http://support.microsoft.com/?id=329332 PRB: You Receive Error 7391 When You Run a Distributed Transaction Against a Linked Server
Okay, now I’ve got the important stuff taken care of so going for Windows cluster installation for 3 active nodes.
So, I followed this guide to install cluster.
http://www.microsoft.com/technet/prodtechnol/windowsserver2003/technologies/clustering/confclus.mspx
And it FAILED at the “Configuring the Cluster Service”->To Configure the First Node -> Step 7. Right after Step 6 and choose the Typical (full) configuration, failed while trying to Establishing node connections with “Access Denied” to connection the node itself. After hours of troubleshooting, we figured out it was problem with “Group Policy (GPO)” from AD. If you sure that you did everything right and still get Access Denied error, that is the security Group Policy problem. Either you or sys admin have to get the right GPO setup or take the server out of GPO while installing and put it back after all done.
Once that is done, adding first and second node was no problem. Now the problem was the third node adding. When I tried to configure the third node, I got the access error saying that 3rd node can’t see the shared disk. However, I checked make sure all the nodes can see the shared disk so I figure it is something wrong with the installation problem so I choose “Advanced (minimum) configuration” so that I can by pass the checking and just set the shared disk and that did the trick.
Now 3 node cluster is working J
After that, since we have heavy load on Node1 and Node 2, what I configured was when Node1 has problem, it fails to Node 3 first and then Node 2. When Node 2 has problem, it fails over to Node 3 and then Node 1. Since Node 2 has less load then Node 1, when Node 3 fails, it fail over to Node 2 first and then Node 1.
Here is short description of how does node fails over.
Okay, once that is done, I added the MSDTC by using below article.
How to configure Microsoft Distributed Transaction Coordinator on a Windows Server 2003 cluster http://support.microsoft.com/?id=301600
Now, starting SQL Cluster installation J SQL cluster installation on windows 2003 is the same as windows 2000 but there are some bugs with Windows 2003 and SP3a, there are few things to do.
Since all the steps are in BOL and “Installation order for SQL Server 2000 Enterprise Edition on Microsoft Cluster Server” http://support.microsoft.com/default.aspx?scid=kb;EN-US;243218, I will focus on what you must remember, at least what I had to remember to make the installation working nicely.
Make sure to disable all services except the ones that are necessary. All that services except storage service. Those must have. For example, if you are using EMC, PowerPath service.
Change the Environment Variables for TEMP and TEP from local setting directory to other location such as C:\TEMP.
Make sure you don’t have Terminal Service is installed. “SQL Server 2000 Cluster Does Not Install on Windows Server 2003-Based Computers Where Terminal Services Is Installed” http://support.microsoft.com/?id=301520
And here is very important thing to do. “PRB: Installation of a Named Instance of SQL Server 2000 Virtual Server on a Windows 2003-Based Cluster Fails” http://support.microsoft.com/?id=815431 I tested it and it fails. Here is funny thing. After installation is done and apply the patch, you have to change that Named Pipe to TCP/IP. I will tell you why later. Just keep in mind that.
Once that is taken care of, I had no problem installing SQL Cluster under Windows 2003 cluster environment. And also without rebooting the box, I also installed the SP3a service pack nicely; however, if you are not tight with time, I recommend rebooting after installation for each node. That is also MSFT recommendation as well.
Once SQL Server is ready, you have to do a few things.
First, like I said above, change the named pipe to TCP/IP and here is the information. “FIX: General network error when you try to back up or restore a SQL Server database on a computer that is running Windows Server 2003” http://support.microsoft.com/default.aspx?scid=kb;en-us;827452
And the run DBCC TRACEON(11187) FIX: Concurrency Enhancements for the Tempdb Database http://support.microsoft.com/default.aspx?scid=kb;en-us;328551 We didn’t see much problem w/o it since I just applied it but recommend to at least add –T1118 trace flag or setup a job at start the agent to run that. Apply the hotfix when you see problem is not bad idea.
Increase TempDB size to appropriate size. This will allow having smooth transition from old server to new server. I did 4G for data and 1G for log. Seems working nicely.
Change SQL Configuration to enable AWE to take advantage of large ram. Here is the script that I ran first time. Since I have 16G RAM, I used 15G for SQL and 1G for O/S. http://support.microsoft.com/default.aspx?scid=kb;en-us;274750&Product=sql2k
USE master
EXEC sp_configure 'show advanced option', '1'
GO
EXEC sp_configure 'awe enabled', 1
GO
EXEC sp_configure 'max server memory', 15420
GO
RECONFIGURE WITH OVERRIDE
GO
Now here is one problem with that setting. When they are on right node, it is okay. However, when any node fail over to any other nodes. You are in trouble. Why? There is only 16GByte of ram and if two SQL Servers are set to use 15G each service, it will fail. So, I wrote this script to adjust Max Server Memory when it is fail over. Let me just give you the script for 2 node since 3 and 4 node are all the same by just adding another if and else statement. Make sure you already ran above script once.
CREATE TABLE #GetServername
(
Value VARCHAR(100),
Data VARCHAR(100))
RECONFIGURE
GO
DECLARE @inst VARCHAR(200)
DECLARE @inst1 VARCHAR(100)
DECLARE @ServerName VARCHAR(100)
DECLARE @Node1 VARCHAR(100)
DECLARE @Node2 VARCHAR(100)
SET @Node1 = 'SQL1'
SET @Node2 = 'SQL2'
SET @inst = 'SYSTEM\ControlSet001\Control\ComputerName\ComputerName'
SET @inst1 = 'ComputerName'
INSERT INTO #GetServername EXEC master..xp_regread 'HKEY_LOCAL_MACHINE', @inst, @inst1
SELECT @ServerName = Data
FROM #GetServername
IF (@ServerName = @Node1)
BEGIN
EXEC sp_configure 'max server memory', 15420
END
ELSE
BEGIN
EXEC sp_configure 'max server memory', 10280
END
GO
RECONFIGURE WITH OVERRIDE
GO
DROP TABLE #GetServername
I added that script as a job and schedule to run when “Sql Server Starts” and since we have three node cluster and third node is not that active, I set the third node Max RAM size to be just 4G so when the SQL1 is on Node1, which is where it supposed to be, then SQL1 uses the full memory. If SQL1 fails over to Node 3(this case Node 2) then, only uses 10G ram and SQL2 uses 5G and rest of 1G is for O/S and AWE stuff to use.
I am still testing to get the best way to optimize this script so it is not 100% good one but I see this is working fine for now.
I just hope this is helpful for any of you folks installing SQL server.
iLoveSQL
July 30, 2004 at 9:30 pm
We've actually decided on holding off on installing SQL Server 2000 on Windows 2003 clusters. Unless MS has fixed this in recent days, SQL Server has to come up after MS DTC or it may not register properly. This means distributed queries will fail. If you aren't using distributed queries, this isn't a big deal at all. For us, though, it was a showstopper.
K. Brian Kelley
@kbriankelley
August 2, 2004 at 12:02 am
bkelley:
There is a fix for that problem. I thought it was this article: (http://support.microsoft.com/default.aspx?scid=kb;en-us;821751) I know for sure later today, if it is other, then I will change this message.
We ar running 4 instances on a cluster of Windows 2003, and using MSDTC thanks to this hotfix.
August 2, 2004 at 12:32 am
It may fix the issue with virtual instances, but the notes from the article are for instances that aren't in the cluster. We were getting a dispenser interface error, not the one given. I may have to follow back up with our TAM to see if it has ramifications beyond what is in the article.
K. Brian Kelley
@kbriankelley
August 2, 2004 at 5:01 am
The fix is part of the rollup fix. The guy from Microsoft who investigated the problem on-site confirmed this.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply