Max Worker Thread

  • We have been getting this timeout error randomly. It occurs 2 or 3 times a week. I always use the DMV to see if there is any blocking or locking but don't see anything. So I Googled search the error and this is what I found out. "A time-out occurred during a retrieval operation because the server or database was busy performing other operations. Causes include low system resources, such as CPU time, or server resources, such as thread congestion." We have enough resources and memory so that's not the issue. So I stated reading about the thread congestion and this is what I found "SQL Server would add the number of the threads when the workload is increasing. Since there are only limited number of the work threads, if all work threads has been used up, the SQL server would basically hang and cannot accept new user connection."

    Environment details:

    SQL Server 2008R2

    3 instances are running and each instance has only 1 DB.

    Size of the DBs

    28GB

    12GB

    13GB

    Timeout occurs for only 2 instances and not the 3rd 1.

    I don't know if should increase the value of MAX worker thread or not. Any suggestion?

    "He who learns for the sake of haughtiness, dies ignorant. He who learns only to talk, rather than to act, dies a hyprocite. He who learns for the mere sake of debating, dies irreligious. He who learns only to accumulate wealth, dies an atheist. And he who learns for the sake of action, dies a mystic."[/i]

  • Try run the following code on the instances with the timeout issues. You'll get a better description of the errors.

    query from http://thinknook.com/sql-server-connection-reset-for-connection-pooling-2012-02-26/%5B/url%5D

    [code="sql"]

    ;WITH connectivity_ring_buffer as

    (SELECT

    record.value('(Record/@id)[1]', 'int') as id,

    record.value('(Record/@type)[1]', 'varchar(50)') as type,

    record.value('(Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(50)') as RecordType,

    record.value('(Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(50)') as RecordSource,

    record.value('(Record/ConnectivityTraceRecord/Spid)[1]', 'int') as Spid,

    record.value('(Record/ConnectivityTraceRecord/SniConnectionId)[1]', 'uniqueidentifier') as SniConnectionId,

    record.value('(Record/ConnectivityTraceRecord/SniProvider)[1]', 'int') as SniProvider,

    record.value('(Record/ConnectivityTraceRecord/OSError)[1]', 'int') as OSError,

    record.value('(Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') as SniConsumerError,

    record.value('(Record/ConnectivityTraceRecord/State)[1]', 'int') as State,

    record.value('(Record/ConnectivityTraceRecord/RemoteHost)[1]', 'varchar(50)') as RemoteHost,

    record.value('(Record/ConnectivityTraceRecord/RemotePort)[1]', 'varchar(50)') as RemotePort,

    record.value('(Record/ConnectivityTraceRecord/LocalHost)[1]', 'varchar(50)') as LocalHost,

    record.value('(Record/ConnectivityTraceRecord/LocalPort)[1]', 'varchar(50)') as LocalPort,

    record.value('(Record/ConnectivityTraceRecord/RecordTime)[1]', 'datetime') as RecordTime,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/TotalLoginTimeInMilliseconds)[1]', 'bigint') as TotalLoginTimeInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/LoginTaskEnqueuedInMilliseconds)[1]', 'bigint') as LoginTaskEnqueuedInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/NetworkWritesInMilliseconds)[1]', 'bigint') as NetworkWritesInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/NetworkReadsInMilliseconds)[1]', 'bigint') as NetworkReadsInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/SslProcessingInMilliseconds)[1]', 'bigint') as SslProcessingInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/SspiProcessingInMilliseconds)[1]', 'bigint') as SspiProcessingInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/LoginTriggerAndResourceGovernorProcessingInMilliseconds)[1]', 'bigint') as LoginTriggerAndResourceGovernorProcessingInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') as TdsInputBufferError,

    record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') as TdsOutputBufferError,

    record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') as TdsInputBufferBytes,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/PhysicalConnectionIsKilled)[1]', 'int') as PhysicalConnectionIsKilled,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/DisconnectDueToReadError)[1]', 'int') as DisconnectDueToReadError,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NetworkErrorFoundInInputStream)[1]', 'int') as NetworkErrorFoundInInputStream,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/ErrorFoundBeforeLogin)[1]', 'int') as ErrorFoundBeforeLogin,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/SessionIsKilled)[1]', 'int') as SessionIsKilled,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalDisconnect)[1]', 'int') as NormalDisconnect

    --record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalLogout)[1]', 'int') as NormalLogout

    FROM

    ( SELECT CAST(record as xml) as record

    FROM sys.dm_os_ring_buffers

    WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') as tab

    )

    SELECT c.RecordTime,m.[text],*

    FROM connectivity_ring_buffer c

    LEFT JOIN sys.messages m ON c.SniConsumerError = m.message_id AND m.language_id = 1033

    ORDER BY c.RecordTime DESC[/code]

    You should also check your wait stats - to determine if you have tasks waiting on a worker (worker thread starvation). You'll need to record the results from the following query in a temp table, then wait for some connectivity errors (use the ring buffer query above to be sure) then compare the wait stats to your temp table results. You should see high THREADPOOL waits if worker threads are running low.

    The following query is from [url]https://www.simple-talk.com/sql/performance/a-performance-troubleshooting-methodology-for-sql-server/">from http://thinknook.com/sql-server-connection-reset-for-connection-pooling-2012-02-26/%5B/url%5D

    ;WITH connectivity_ring_buffer as

    (SELECT

    record.value('(Record/@id)[1]', 'int') as id,

    record.value('(Record/@type)[1]', 'varchar(50)') as type,

    record.value('(Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(50)') as RecordType,

    record.value('(Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(50)') as RecordSource,

    record.value('(Record/ConnectivityTraceRecord/Spid)[1]', 'int') as Spid,

    record.value('(Record/ConnectivityTraceRecord/SniConnectionId)[1]', 'uniqueidentifier') as SniConnectionId,

    record.value('(Record/ConnectivityTraceRecord/SniProvider)[1]', 'int') as SniProvider,

    record.value('(Record/ConnectivityTraceRecord/OSError)[1]', 'int') as OSError,

    record.value('(Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') as SniConsumerError,

    record.value('(Record/ConnectivityTraceRecord/State)[1]', 'int') as State,

    record.value('(Record/ConnectivityTraceRecord/RemoteHost)[1]', 'varchar(50)') as RemoteHost,

    record.value('(Record/ConnectivityTraceRecord/RemotePort)[1]', 'varchar(50)') as RemotePort,

    record.value('(Record/ConnectivityTraceRecord/LocalHost)[1]', 'varchar(50)') as LocalHost,

    record.value('(Record/ConnectivityTraceRecord/LocalPort)[1]', 'varchar(50)') as LocalPort,

    record.value('(Record/ConnectivityTraceRecord/RecordTime)[1]', 'datetime') as RecordTime,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/TotalLoginTimeInMilliseconds)[1]', 'bigint') as TotalLoginTimeInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/LoginTaskEnqueuedInMilliseconds)[1]', 'bigint') as LoginTaskEnqueuedInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/NetworkWritesInMilliseconds)[1]', 'bigint') as NetworkWritesInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/NetworkReadsInMilliseconds)[1]', 'bigint') as NetworkReadsInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/SslProcessingInMilliseconds)[1]', 'bigint') as SslProcessingInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/SspiProcessingInMilliseconds)[1]', 'bigint') as SspiProcessingInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/LoginTimers/LoginTriggerAndResourceGovernorProcessingInMilliseconds)[1]', 'bigint') as LoginTriggerAndResourceGovernorProcessingInMilliseconds,

    record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') as TdsInputBufferError,

    record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') as TdsOutputBufferError,

    record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') as TdsInputBufferBytes,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/PhysicalConnectionIsKilled)[1]', 'int') as PhysicalConnectionIsKilled,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/DisconnectDueToReadError)[1]', 'int') as DisconnectDueToReadError,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NetworkErrorFoundInInputStream)[1]', 'int') as NetworkErrorFoundInInputStream,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/ErrorFoundBeforeLogin)[1]', 'int') as ErrorFoundBeforeLogin,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/SessionIsKilled)[1]', 'int') as SessionIsKilled,

    record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalDisconnect)[1]', 'int') as NormalDisconnect

    --record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalLogout)[1]', 'int') as NormalLogout

    FROM

    ( SELECT CAST(record as xml) as record

    FROM sys.dm_os_ring_buffers

    WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') as tab

    )

    SELECT c.RecordTime,m.[text],*

    FROM connectivity_ring_buffer c

    LEFT JOIN sys.messages m ON c.SniConsumerError = m.message_id AND m.language_id = 1033

    ORDER BY c.RecordTime DESC

    You should also check your wait stats - to determine if you have tasks waiting on a worker (worker thread starvation). You'll need to record the results from the following query in a temp table, then wait for some connectivity errors (use the ring buffer query above to be sure) then compare the wait stats to your temp table results. You should see high THREADPOOL waits if worker threads are running low.

    The following query is from https://www.simple-talk.com/sql/performance/a-performance-troubleshooting-methodology-for-sql-server/

    SELECT TOP 10

    wait_type ,

    max_wait_time_ms wait_time_ms ,

    signal_wait_time_ms ,

    wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms ,

    100.0 * wait_time_ms / SUM(wait_time_ms) OVER ( )

    AS percent_total_waits ,

    100.0 * signal_wait_time_ms / SUM(signal_wait_time_ms) OVER ( )

    AS percent_total_signal_waits ,

    100.0 * ( wait_time_ms - signal_wait_time_ms )

    / SUM(wait_time_ms) OVER ( ) AS percent_total_resource_waits

    FROM sys.dm_os_wait_stats

    WHERE wait_time_ms > 0 -- remove zero wait_time

    AND wait_type NOT IN -- filter out additional irrelevant waits

    ( 'SLEEP_TASK', 'BROKER_TASK_STOP', 'BROKER_TO_FLUSH',

    'SQLTRACE_BUFFER_FLUSH','CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT',

    'LAZYWRITER_SLEEP', 'SLEEP_SYSTEMTASK', 'SLEEP_BPOOL_FLUSH',

    'BROKER_EVENTHANDLER', 'XE_DISPATCHER_WAIT', 'FT_IFTSHC_MUTEX',

    'CHECKPOINT_QUEUE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',

    'BROKER_TRANSMITTER', 'FT_IFTSHC_MUTEX', 'KSOURCE_WAKEUP',

    'LAZYWRITER_SLEEP', 'LOGMGR_QUEUE', 'ONDEMAND_TASK_QUEUE',

    'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BAD_PAGE_PROCESS',

    'DBMIRROR_EVENTS_QUEUE', 'BROKER_RECEIVE_WAITFOR',

    'PREEMPTIVE_OS_GETPROCADDRESS', 'PREEMPTIVE_OS_AUTHENTICATIONOPS',

    'WAITFOR', 'DISPATCHER_QUEUE_SEMAPHORE', 'XE_DISPATCHER_JOIN',

    'RESOURCE_QUEUE' )

    ORDER BY wait_time_ms DESC

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply