Failover testing Clustered SQL2000 with storagewor

blanchas

Valued Member

Points: 59
October 21, 2002 at 5:04 am

#80246

Hi.
We are running clustered SQL2000 Enterprise on 2 Win2K Advanced Servers. The data is held on a Compaq StorageWorks SAN, and each server has 2 fiber connections to the SAN for resilience. Our cluster failover tests all worked well until we simulated a loss of connection to the StorageWorks disks by disconnecting the fiber connections to one of the servers. Instead of failing over to the other server, we had a series of hard disk write failure messages. When we manually moved the cluster group to the other server, the SQL database would not start and had a corrupted master database.
When questioned, our reseller theorised that when we pulled the fiber connections from the server there was still traffic in the fiber cable, in the form of photons of light, and that this caused the corruption.
I am not convinced by this answer, as I would have thought that the error-checking algorithms in the fiber comms would prevent this happening.
I would like to know:
Has anyone successfully failed over a SQL2000 cluster when the server’s connection to the storage is cut?
Has anyone a better theory on why the master database corrupted?
Thanks
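
On the error-checking point above: Fibre Channel frames carry a 32-bit CRC, so a frame damaged in flight should be rejected by the receiver rather than written to disk. Here is a minimal Python sketch of that idea. It assumes zlib.crc32 as a stand-in for the CRC done in hardware and a made-up frame layout (payload followed by a 4-byte CRC); the function names and the payload are hypothetical, not taken from any real driver.

```python
import struct
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload (illustrative layout, not the real FC frame format)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def frame_is_intact(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailing 4 bytes."""
    if len(frame) < 4:
        return False  # too short to even hold a CRC
    payload, trailer = frame[:-4], frame[-4:]
    (expected,) = struct.unpack(">I", trailer)
    return zlib.crc32(payload) == expected


good = make_frame(b"page data destined for the master database")

# Flip a single bit in the payload, as a glitch on the link might.
flipped = bytearray(good)
flipped[3] ^= 0x01

print(frame_is_intact(good))            # True
print(frame_is_intact(bytes(flipped)))  # False: a single-bit error always changes the CRC
```

This only illustrates why damage in transit is normally caught. It does not say what actually happened in the test.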

Andy Warren SSC Guru Points: 119902 · Answer 1

Gotta love the photons in the cable. Might be true I suppose...

Andy

http://www.sqlservercentral.com/columnists/awarren/

Wesley Brown SSChampion Points: 12045 · Answer 2

Well, I think it is a load of crap personally. I have disconnected fibre channel arrays from the server with no issues. That is a VERY poor answer. I don't know why it didn't fail over the way it was expected to. I do know you need to have your reseller contact Compaq or StorageWorks directly and get the scoop from them. I'm sure they would be shocked to hear that it failed in that manner. Heck, Compaq doesn't even support clustering on anything but fibre channel anymore.

Wes

blanchas Valued Member Points: 59 · Answer 3

Wes

The hard disks in the cluster did fail over to the other server. However, the MSSQLSERVER service associated with the cluster group would not come up again, as the master database was corrupted.

Thanks

Stephen Blanchard

Allen Cui-55137 SSC Guru Points: 51650 · Answer 4

Can you see the master database physical files from the second node with Windows Explorer?
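
One way to run that check from the surviving node, sketched in Python. The file names master.mdf and mastlog.ldf are SQL Server's defaults for the master database. The data directory is an assumption that differs per install, and the function name is made up for illustration.

```python
import os


def check_master_files(data_dir, names=("master.mdf", "mastlog.ldf")):
    """Return {file name: size in bytes, or None if the file is missing or unreadable}."""
    results = {}
    for name in names:
        path = os.path.join(data_dir, name)
        try:
            with open(path, "rb") as handle:
                handle.read(512)  # read the first 512 bytes to prove the disk answers
            results[name] = os.path.getsize(path)
        except OSError:
            results[name] = None  # missing, locked, or the disk did not respond
    return results
```

Call it with the shared data directory, for example check_master_files(r"S:\MSSQL\Data") (a hypothetical path). A running SQL Server service holds these files open, so a None result while the service is up does not mean the files are gone. A readable file only shows the disk is reachable, not that its contents are consistent.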

Brine SSC Rookie Points: 41 · Answer 5

Were you using multi-path software such as Compaq SecurePath to shield the OS from the dual paths to the storage array?

I am building a similar setup using a Compaq EVA, so the question is a little too close for my liking. I will let you know how I go in the coming weeks; failover testing is part of our test roll-out as well.

Regards, Brian

blanchas Valued Member Points: 59 · Answer 6

To answer Allen_Cui: yes, we could see the master database file from the other server, but it was corrupted. Database consistency checks would not run on it, and the rebuildm tool, which tries to rebuild the master database, also failed.

Brine's question about SecurePath was a good one. Yes, we do use SecurePath on all the SAN-attached servers. On a separate note, I heard a rumour this week that the very latest version of SecurePath (ver 4?) can bluescreen Win2K servers on Service Pack 3. We use version 3.1 without any problems.

Brian Sutherland SSC Enthusiast Points: 156 · Answer 7

We are running the same config into a Compaq EVA.

If I drop a fibre connection, the shared drives remain online. No errors noticed as yet in the SQL log, cluster log, application event log or system event logs.

What version of SecurePath are you using? It must be 4.x for EVA.

There is a known issue with Oracle 9i running with SecurePath.

Are you running a SQL virtual cluster or a standalone SQL on a single node?

Regards, Brian

contiguous1@Notmail.com
