Why does everyone's SQL DB connection time out at the same time every day?

  • Last Thursday night we used Symantec Backup Exec System Recovery to move an entire server to new hardware, from an HP G3 to a G5. We installed the latest PSP and everything seems to work fine. All devices are detected, and there have been no errors in ANY of the event logs. However, every day around 3:15pm all our users get a database timeout from an order system application they use, and they have to exit and re-enter the system. We have users across the country, and it times out everyone in SQL. What could be causing this? It is SQL 2000. One problem we did have is that the move configured the server for DHCP, and we had to change it to the correct IP address when it was first moved to the new hardware. We checked WINS and DNS and made sure everything is up to date and accurate. For the most part it is business as usual, but for the last few days everyone has gotten a SQL connection timeout at about 3:15pm and has had to re-enter the order system application.

    Any help or suggestions on where to look would be appreciated!

  • Check the SQL Server logs to see if there are any relevant messages. Also check the OS Event Log. Do any scheduled jobs run at this time that might interfere? (Not likely, but definitely worth checking.)

    Francis

  • fhanlon (10/9/2007)


    Check the SQL Server logs to see if there are any relevant messages. Also check the OS Event Log. Do any scheduled jobs run at this time that might interfere? (Not likely, but definitely worth checking.)

    I have checked all of the system event logs and there are no errors of any kind. I checked the SQL logs, and the example below shows the only messages; they are constant all day, every day.

    EXAMPLE:

    Replication-Replication Distribution Subsystem: agent MIL-SERV-2-OCH_PDB-LA-SERV-1-86 succeeded. No replicated transactions are available..

    Error: 14150, Severity: 10, State: 1

    There are no scheduled jobs to run at the time that the problem occurs.

  • If everyone everywhere is losing their connection, could it be an application issue? Could the app be running some backup / default / ??? process that forces the users off?

    Could it be a network blip that occurs at that time of day? Do you have logs on the switches that you could look at?

    I've seen stranger things ....

  • Execution time of the same query may vary by a factor of 100 or more: for example, 1 second when the data is in cache (a "hot" execution) and 100 seconds in "cold" mode, when SQL Server has just started or after DBCC DROPCLEANBUFFERS.

    Activity such as backups, defragmenting, database verification, and other 'massive' operations usually changes the buffer cache contents completely, so after the nightly maintenance all executions are 'cold'.

    It is also possible that at that time you have some jobs running (backups for example).

    So begin by checking the schedules of all your jobs. Then leave SQL Profiler running at that time and analyze the trace.
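
    One way to tell a scheduled cause from a random one is to bucket the disconnect times by time of day and count how many distinct days each bucket was hit on. The following is only a rough sketch: the function name `days_per_time_bucket`, the 15-minute window, and the sample timestamps are all made up for illustration and are not taken from the poster's logs.

```python
from collections import defaultdict
from datetime import datetime, time


def days_per_time_bucket(timestamps, window_minutes=15):
    """Map each time-of-day bucket to the number of distinct dates with an event in it."""
    dates_by_bucket = defaultdict(set)
    for ts in timestamps:
        minutes = ts.hour * 60 + ts.minute
        # Round down to the start of the window this event falls into
        start = minutes - minutes % window_minutes
        bucket = time(start // 60, start % 60)
        dates_by_bucket[bucket].add(ts.date())
    return {bucket: len(dates) for bucket, dates in sorted(dates_by_bucket.items())}


# Hypothetical sample data, for illustration only
events = [
    datetime(2007, 10, 8, 15, 16),
    datetime(2007, 10, 9, 15, 17),
    datetime(2007, 10, 9, 15, 20),
    datetime(2007, 10, 10, 10, 45),
]
buckets = days_per_time_bucket(events)
for bucket, day_count in buckets.items():
    print(bucket.strftime("%H:%M"), day_count)
```

    A bucket that is hit on most days points toward something scheduled (a job, a backup, a maintenance task); buckets that are each hit on only one day suggest a random cause such as a network problem.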

  • UPDATE:

    The problem now seems to be random. We had a weird blip that disconnected a couple users at 10:45am this morning and again at 11:15am. It appears that it is just happening at random times throughout the day.

    OK, I have spent the last few hours scouring logs and jobs and found nothing. However, I am finding that since I rebooted last night there have been 48 TCP receive errors, and under TOE statistics, 748 TCP errors received. The number is climbing slowly but surely. I have ProLiant Support Pack 7.8 installed and it is a new G5. I think tonight I am going to install PSP 7.9A and possibly install the newer Broadcom drivers unless anyone else has a suggestion. I have a feeling, though, that this is the problem. No other servers in our infrastructure have TCP errors.

    Any other ideas on what causes TCP receive errors and how to fix them?
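
    To judge whether a driver or PSP update actually helps, it may be useful to record the cumulative error counter at a few points in time and compare the rate before and after the change. This is a minimal sketch, assuming the counter values are noted down by hand; the function name `error_rates_per_hour` and the sample readings are hypothetical.

```python
from datetime import datetime


def error_rates_per_hour(samples):
    """Return (end_time, errors_per_hour) for each pair of consecutive samples.

    samples: list of (datetime, cumulative_error_count), sorted by time.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        if hours <= 0:
            continue  # skip duplicate or out-of-order readings
        # A lower count than before means the counter was reset (e.g. by a reboot),
        # so everything in the new reading accumulated since the reset
        delta = c1 - c0 if c1 >= c0 else c1
        rates.append((t1, delta / hours))
    return rates


# Hypothetical readings, for illustration only
samples = [
    (datetime(2007, 10, 10, 9, 0), 10),
    (datetime(2007, 10, 10, 11, 0), 30),
    (datetime(2007, 10, 10, 12, 0), 5),  # counter reset between readings
]
rates = error_rates_per_hour(samples)
for when, rate in rates:
    print(when.strftime("%H:%M"), rate)
```

    If the rate drops to zero after the update and stays there, that supports the theory that the NIC driver or offload settings were behind the disconnects.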
