SQL Mail Hangs after Domain Controller Reboots - Why?

  • Let me start by saying I plan on moving to SMTP mail later this year. 

    For many months our SQL Servers (Standard Edition only) have periodically had problems with email hanging.  I've read numerous forum posts and Microsoft articles about email problems, but none have corrected our problem.  I decided to put up with it until I can convert to SMTP mail later in the year, but today I came in and found 9 servers that required a SQL Server restart to fix email.  (Restarting only the SQL Server Agent service does not correct the problem.)  I also found out that our Domain Controllers were restarted during the night.  I have read that loss of a network connection can cause MAPI problems, although I'm not sure why only 9 of 45+ servers were impacted.  The impacted servers do not all share the same DC.

    One of the servers could not authenticate with the network today, so we rebooted its domain controller.  After rebooting the DC, the network communication issue was resolved, but another SQL Server, which uses an entirely different DC, suddenly experienced email failure.  Has anyone encountered anything like this?  I'm starting to wonder if the email problems are more DC related than MAPI related, but my DC knowledge is very basic.  I've never experienced these problems with SQL Server 2000 EE, only SE.

    Dave

  • Compare how IP addresses are assigned on the servers that required a restart and on the ones that did not: fixed IP or DHCP reserved IP. It may or may not be the reason. Look in the network connection properties under TCP/IP properties.

    Regards,Yelena Varsha

  • SQL Server and SQL Server Agent use different email sessions. If I remember correctly, a TechNet article explained that the reason for hung email sessions is that the mail client displays a "Retry?" dialog that cannot be answered without user interaction. It is a basic design flaw in SQL Mail. There is no workaround, which is why most people move to SMTP mail.

    This perhaps explains the behavior you observed - only those SQL Mail services actually attempting email connectivity during a DC outage would hang.

  • (1) The IP addresses are static. 

    (2) After midnight all of our servers execute a scheduled job to send a test email to the DBA.  This was set up to help detect email problems.  Last night I received emails from just over 30 servers.  There has to be some type of pattern.  I'm just not seeing it.  I've checked the version of Outlook on each server thinking this may be the problem, but I've seen success and failure on versions ranging from MS Exchange 5 to Outlook 2003.
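
    One way to look for a pattern is to diff the servers that were expected to report against the ones that actually sent the test email, then group the silent ones by a property such as their DC or mail client version. The following is only a minimal sketch of that idea; the server names, the `inventory` structure, and both helper functions are hypothetical and not taken from our environment.

```python
from collections import defaultdict


def silent_servers(expected, reported):
    """Return the servers that were expected to send a test email but did not."""
    return sorted(set(expected) - set(reported))


def group_by_attribute(servers, inventory, attribute):
    """Group server names by one inventory attribute (for example, the DC they use)."""
    groups = defaultdict(list)
    for name in servers:
        # Servers missing from the inventory are grouped under "unknown".
        value = inventory.get(name, {}).get(attribute, "unknown")
        groups[value].append(name)
    return {key: sorted(names) for key, names in groups.items()}


# Hypothetical inventory: every name and value below is made up for illustration.
inventory = {
    "SQL01": {"dc": "DC1", "mail_client": "Outlook 2003"},
    "SQL02": {"dc": "DC2", "mail_client": "Exchange 5"},
    "SQL03": {"dc": "DC1", "mail_client": "Outlook 2003"},
    "SQL04": {"dc": "DC3", "mail_client": "Outlook 2000"},
}

# Servers whose nightly test email actually arrived (hypothetical).
reported = ["SQL01", "SQL04"]

missing = silent_servers(inventory.keys(), reported)
by_dc = group_by_attribute(missing, inventory, "dc")
by_client = group_by_attribute(missing, inventory, "mail_client")

print("Silent servers:", missing)
print("Grouped by DC:", by_dc)
print("Grouped by mail client:", by_client)
```

    If the silent servers cluster under one value of some attribute, that attribute is worth a closer look; if they spread evenly, it is probably not the cause.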

    You could be correct about the Retry message.  I've read about that problem but never got around to starting SQL Server in an interactive session instead of as a service to see if this is the problem.

    Thanks,   Dave

  • Was the rebooted server a global catalog? Was it the only one in the site?

    K. Brian Kelley
    @kbriankelley

  • It's probably not the Domain Controller reboot that's killing you, it's the Exchange Server.  If the DC and Exchange Server are one and the same, that's your "why".

    You'll see this behavior EVERY time your Exchange Server is restarted.  The only way to clear or prevent the error is to issue an "xp_stopmail" and then an "xp_startmail" before attempting to send email via "xp_sendmail".  This needs to be done on every SQL Server that uses that particular Exchange Server.

    I've seen repeated attempts to send email via "xp_sendmail" after losing connectivity to the Exchange Server actually cause SQL Server to hang.
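
    The stop, start, send order described above could be sketched as a T-SQL batch. The small Python helper below only builds the batch text; it does not connect to anything. The helper names, the recipient address, and the message text are hypothetical, and the sequence is untested against a session that is already hung.

```python
def quote_tsql(value):
    """Quote a Python string as a T-SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_mail_reset_batch(recipients, subject, message):
    """Return T-SQL that restarts the SQL Mail session and then sends one message."""
    send = (
        "EXEC master.dbo.xp_sendmail"
        " @recipients = {0}, @subject = {1}, @message = {2}"
    ).format(quote_tsql(recipients), quote_tsql(subject), quote_tsql(message))
    return "\n".join([
        "EXEC master.dbo.xp_stopmail",   # drop the existing SQL Mail session
        "EXEC master.dbo.xp_startmail",  # open a fresh session with the default profile
        send,                            # only now attempt the send
    ])


# Hypothetical recipient and text, for illustration only.
batch = build_mail_reset_batch(
    "dba@example.com",
    "SQL Mail test",
    "Nightly test from the DBA's job",
)
print(batch)
```

    The printed batch can be pasted into Query Analyzer or a job step on each SQL Server that uses the restarted Exchange Server.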

  • We have four domain controllers and Exchange is not a DC.  xp_stopmail / xp_startmail fails when this problem occurs.  Nine times out of ten the only solution is to restart SQL Server.  On occasion restarting only SQL Server Agent resolves the problem, but most of the time that does not work.

    I'm not sure if the DC that was rebooted is a global catalog.  I'll have to ask.

    Thanks,  Dave

  • If the DC being rebooted is a global catalog, Exchange uses it heavily. Even if you have multiple GCs in a site, you can see a slight delay while Exchange switches over. After all, it's expecting a server to be available and suddenly it isn't, so it now has to go through the whole process of figuring out which new server to talk to. As a result, you can sometimes get a disconnect on the Outlook client side. At least, that's the type of behavior we've seen.

    K. Brian Kelley
    @kbriankelley

  • Three of our four domain controllers are global catalogs.  The DC that was rebooted yesterday is one of the three with a global catalog.  Please help me understand this a bit more.  How does SQL Server determine which of the three global catalogs/DCs to reference?

    Also, I've never encountered this problem with SQL Server EE, only SE.  Are you aware of any differences with the way MAPI is handled between EE and SE?

    Thanks,   Dave

  • It doesn't. Exchange does. And if Exchange is having problems talking to a GC, it then has to go find a new one. In cases like this we've sometimes seen Outlook clients fail to get a connection to Exchange. Usually it's very brief, but it pops up an error... which of course can't be seen with SQL Server.

    K. Brian Kelley
    @kbriankelley
