BSOD, Idera, Cluster Failover.... HELP

  • Ok, I have 4 clusters running Windows 2003, SQL 2005 Enterprise. We are using Idera SQLsafe at all 4 locations. Servers are a mix of Dells (3 locations) and MBX (1 location). All servers are 8 core with either 32 or 64 GB RAM. Each server is connected to a local SAN with a LUN assigned for storage of backup files.

    One location (MBX hardware and Hitachi SAN, using fiber optic NIC) has been experiencing failovers during the backup. This has been going on for over a YEAR and I am no closer to identifying the cause. Dump files have been sent to Microsoft and MBX. Their responses have been to replace the RAM and update the drivers. We have replaced all the RAM on both servers and updated all drivers. Of the 20-plus failovers (it doesn't matter which node is online; they both experience the same conditions), all have occurred during the backup process, both with Idera and with SQL Server backup jobs. None of my other clusters are experiencing any issues (all Dell servers with EMC SANs).

    What do I check next? Has anyone experienced this type of issue?

    Raymond Laubert
    Exceptional DBA of 2009 Finalist
    MCT, MCDBA, MCITP:SQL 2005 Admin,
    MCSE, OCP:10g

  • Hello Raymond,

    I've asked TS to reopen the case you reported in January. If I'm reading the post correctly, the problem occurs during a SQLsafe backup as well as a native backup, is that correct? Either way, we're more than happy to take a look at it and see if we can help isolate it any although you've already worked with some pretty good folks. 🙂

    We'll be in touch via email.

    Heather Sullivan

    Director, SQL Server Products

    Idera

  • Yes, it appears to happen either way. It is getting pretty bad: we had two failovers this week. It used to go almost 30 days between failovers. I really do not think it is Idera, but I wanted to make sure all the information was available.

    Raymond Laubert

  • Heather/Ray,

    Thanks for the update and notes. Can someone post the results of your investigation back here, so that people who find this thread with Idera/Cluster in the title know what you have learned?

  • Hello Steve,

    Yes, we will. Thanks!

  • Been a little while since this was updated.

    To Date:

    We have updated the BIOS and all drivers (including the QLogic SAN drivers), replaced and upgraded the memory, reseated the CPUs, rebooted numerous times, and verified patches and OS-level stuff.

    Since the last set of driver updates (QLogic), we have not had any failovers or other issues. It has been about 3 weeks.

    I am going to let it go for another 4 weeks to see how things go. I will update after that.

    Raymond Laubert
