DTC fails to work on SQL Server after changing to cluster

  • After reducing MSDTC security to "No Authentication Required", DTC mostly seems to work correctly, except that after a brief period of inactivity (a few minutes) the first attempt to use DTC fails.  Subsequent attempts succeed until the next period of inactivity.

    The original environment was a single machine with SQL Server installed, plus a service on another machine that connected to that SQL Server over several connections used concurrently within the same transaction (using TransactionScope).  This configuration worked fine.

    The single machine with SQL Server installed has been changed to a cluster of two machines.  Otherwise the configuration is the same.  The "Distributed Transaction Coordinator" service is running as the Network Service account in all cases.  Now the following error is generated when the second connection is made in the transaction (presumably promoting the transaction to DTC):

    System.Transactions.TransactionManagerCommunicationException

    Communication with the underlying transaction manager has failed.

    All of these machines are running Win2003 SP1.  I've also tried using my local machine (XP Pro SP2) as the service machine, with the same results.  The current workaround is to change the setting on the MSDTC tab in Component Services from "Mutual Authentication Required" to "No Authentication Required".  This isn't an ideal solution, but it seems to work...somewhat.  After about 15 minutes of inactivity, the first attempt to connect to SQL Server in the transaction generates the same error as above; subsequent connections seem to work fine until another period of inactivity.  In addition, the following entry now appears about 5 times in a row in the Security event log:

    Logon Failure:

      Reason:  Unknown user name or bad password

      User Name: SQLMACHINE$

      Domain:  MYDOMAIN

      Logon Type: 3

      Logon Process: NtLmSsp

      Authentication Package: NTLM

      Workstation Name: SQLMACHINE

      Caller User Name: -

      Caller Domain: -

      Caller Logon ID: -

      Caller Process ID: -

      Transited Services: -

      Source Network Address: xxx.xxx.xxx.xxx

      Source Port: 1623

    A Google search yielded few results about NtLmSsp and the error.  One workaround I've found is to run ipconfig /flushdns prior to the first call, which seems to prevent the initial failure.  However, after another 15-minute period of inactivity, the failure returns.
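
    The symptom described here (first call after idle fails, the next one succeeds) can be masked at the application level by retrying the whole transaction once.  The following is only a minimal, language-neutral sketch in Python, not the service's actual .NET code: TransientCommunicationError, run_with_one_retry and FlakyAfterIdle are hypothetical names, and the "operation" stands in for an entire TransactionScope block.  It hides the failure rather than fixing the underlying authentication problem.

```python
import time


class TransientCommunicationError(Exception):
    """Hypothetical stand-in for TransactionManagerCommunicationException."""


def run_with_one_retry(operation, delay_seconds=0.0):
    """Run operation(); on the transient error, wait briefly and try once more.

    The operation must represent the whole transaction, so that the retry
    starts a fresh transaction instead of reusing the failed one.
    """
    try:
        return operation()
    except TransientCommunicationError:
        time.sleep(delay_seconds)
        return operation()  # a second failure propagates to the caller


class FlakyAfterIdle:
    """Simulates the reported symptom: the first call after idle fails,
    later calls succeed. Purely illustrative; no real DTC is involved."""

    def __init__(self):
        self.idle = True
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.idle:
            self.idle = False
            raise TransientCommunicationError(
                "Communication with the underlying transaction manager has failed."
            )
        return "committed"


if __name__ == "__main__":
    op = FlakyAfterIdle()
    print(run_with_one_retry(op))  # first attempt fails, retry succeeds
    print(op.calls)
```

    A retry like this only makes sense if the transaction body is safe to run again from the start, and it leaves the logon failures in the Security event log untouched.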

    I've also tried the recommendations from the following links with no success.

    http://blogs.msdn.com/florinlazar/archive/2004/06/18/159127.aspx

    http://blogs.msdn.com/florinlazar/archive/2004/03/02/82916.aspx

    http://technet.microsoft.com/en-us/library/bb457156.aspx#EGAA

    The ideal solution would be to use the "Incoming Caller Authentication Required" or stronger and to prevent the initial connection failure.  Any suggestions on the cause of this or what might be done to fix it?

    Thanks

  • If it is Win2003, try using the Kerberos authentication method instead of NTLM.

    Cheers,
    Sugeshkumar Rajendran
    SQL Server MVP
    http://sugeshkr.blogspot.com

  • I've been told that I cannot use Kerberos.  Are there any other options?  Is it definitely due to NTLM?
