March 25, 2014 at 8:43 pm
Hello,
we had this problem when sql server did backup of database and third(production) database showed this error and restart sql service.
It was one times and another backups are ok without problem from this time.
I found in our monitoring system Latches wait times in this time.
Next I checked this link http://support.microsoft.com/kb/2137408
When I was thinking about this, it is possible that somebody executed very poor query and during this backup the disk fall down ?
or it is possible that Database Collection caused this ? I have set for DiskUsage, Query statistics and Server Activity. For example that this job for collection was executed in same time like database backup and wrote to disk in same time...( I have MDW Database on same disk )
Because we have externist consultant and he said that probably caused one job which is execute one per minute, but I am not sure, because this job is almost without logical reads and scan very small table ( tracking table, without add rows = just update ) .
Next is that we have more worst jobs, which have more logical reads and executed every 30sec. So I dont think that it caused this jobs, but for sure we turned off..
And why sql service was restarted ? I found this problem because I wanted select statistics for report and it was restarted.
in error log i found just this
2014-03-21 18:07:43.41 spid8s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [G:\USERDB\MCS_MESDB_1.ldf] in database [MCS_MESDB] (8). The OS file handle is 0x0000000000000D08. The offset of the latest long I/O is: 0x000002e70cb000
Thanks a lot,
March 25, 2014 at 11:02 pm
check the code of job and study execution plan which will help you. Also ask your storage team to find out the details of disk issue at specific point of time or any other issue related to I/O.
HTH
---------------------------------------------------
"Thare are only 10 types of people in the world:
Those who understand binary, and those who don't."
March 25, 2014 at 11:14 pm
I'm not an expert in this area but the first thing I would do is suspect there was something wrong with the IO subsystem. I'd run SQLIOSIM to apply a load to see if it fails.
March 26, 2014 at 12:43 am
thanks for response, I created this job so I really know that it couldnt be cause...
Next point
- I checked that was full RAM - it was full about 3days before I think
is just my opinion, that during backup was some IO operation and somebody or program executed poor query and it had to use DISK instead RAM, and there can be problem, that DISK was not response>> lot of IO and disk couldnt handle ...what do you think ?
But still I dont know why sql service was restarted,, it is possible that it was not responded during this situation and it was automatically restart? Any experience with this?
March 26, 2014 at 10:51 am
That error shouldn't cause SQL Server restart. Could be a problem with the I\O Subsystem. Did you check for the SQL Server dump file and analyze it ? You need to dig deeper to find the issue. Also check the Application and System logs too.
--
SQLBuddy
March 26, 2014 at 3:32 pm
sqlbuddy123 (3/26/2014)
That error shouldn't cause SQL Server restart. Could be a problem with the I\O Subsystem. Did you check for the SQL Server dump file and analyze it ? You need to dig deeper to find the issue. Also check the Application and System logs too.--
SQLBuddy
I checked all SQL logs and I found just this error. But nothing why the sql service was restarted..
March 26, 2014 at 3:46 pm
Check the Windows Application and System logs in the Event Veiwer ( Control Panel --> Administrative Tools ).
Also check if SQL Server dump file has been created during the SQL Server restart. It's a .mdmp file found in SQL Server Log folder like
C:\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log
--
SQLBuddy
March 26, 2014 at 5:12 pm
sqlbuddy123 (3/26/2014)
Check the Windows Application and System logs in the Event Veiwer ( Control Panel --> Administrative Tools ).--
SQLBuddy
From this query below I found create date of tempdb - last restart service - 2014-03-21 18:09:43.863
SELECT create_date FROM sys.databases WHERE name = 'tempdb'
Event Viewer I checked also
this is from Application
21. 3. 2014 18:07:21 Log was backed up. Database: DBNAME, creation date(time): 2014/01/03(18:23:26), first LSN: 48073:10596:1, last LSN: 48079:25339:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'TDPSQL-00000C78-0000'}). This is an informational message only. No user action is required.
21. 3. 2014 18:07:43 SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [G:\USERDB\MCS_MESDB_1.ldf] in database [MCS_MESDB] (8). The OS file handle is 0x0000000000000D08. The offset of the latest long I/O is: 0x000002e70cb000
21. 3. 2014 18:08:31 SQLVDI: Loc=SignalAbort. Desc=Client initiates abort. ErrorCode=(0). Process=3192. Thread=10908. Client. Instance=. VD=Global\TDPSQL-00000C78-0000_SQLVDIMemoryName_0.
21. 3. 2014 18:08:41 Configuration option 'Agent XPs' changed from 1 to 0. Run the RECONFIGURE statement to install.
And Now I am seeing that I forgot System log, but time is after the problem, but maybe was problem with failoverclustering and it caused restart service,,, is it possible?
21. 3. 2014 18:08:39 Cluster resource 'SQL Server' of type 'SQL Server' in clustered role 'SQL Server (MSSQLSERVER)' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
21. 3. 2014 18:08:54 Cluster resource 'MES-SQL-CL' of type 'SQL Server Availability Group' in clustered role 'MES-SQL-CL' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
21. 3. 2014 18:09:22 A timeout (30000 milliseconds) was reached while waiting for a transaction response from the MSSQLSERVER service.
21. 3. 2014 18:09:26 Reset to device, \Device\RaidPort2, was issued.
21. 3. 2014 18:09:26 Cluster resource 'MES-SQL-CL' of type 'SQL Server Availability Group' in clustered role 'MES-SQL-CL' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
21. 3. 2014 18:09:26 The Cluster service failed to bring clustered service or application 'MES-SQL-CL' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
21. 3. 2014 18:09:26 Clustered role 'MES-SQL-CL' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.
21. 3. 2014 18:09:37 The SQL Server (MSSQLSERVER) service terminated unexpectedly. It has done this 1 time(s).
21. 3. 2014 18:09:39 The SQL Server (MSSQLSERVER) service entered the running state.
March 26, 2014 at 5:25 pm
sqlbuddy123 (3/26/2014)
Also check if SQL Server dump file has been created during the SQL Server restart. It's a .mdmp file found in SQL Server Log folder like
C:\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log
--
SQLBuddy
no dump file was created, just error log, AlwaysOn_health and FDLAUNCHERRORLOG
March 26, 2014 at 5:40 pm
tony28 (3/26/2014)
And Now I am seeing that I forgot System log, but time is after the problem, but maybe was problem with failoverclustering and it caused restart service,,, is it possible?
Good that you didn't have a SQL Server dump file. And basically SQL Error Log entries are logged in Application log too.
Are you using Clustering with Always ON or just Always ON ?
First find the exact stop time of the SQL services from the error log. Note that stop time and check a bunch of log entries before that time.
Looks like we can find some clue from the system log. The system log you posted is not enough. It's better to go back a little bit. Can you post all the system log entries on that day before the SQL Server failure ? You can attach it as a text file.
Also check the same on the SQL\Application log file too i.e all the events before the SQL Server stop time (on that day).
--
SQLBuddy
--
SQLBuddy
March 26, 2014 at 6:32 pm
sqlbuddy123 (3/26/2014)
tony28 (3/26/2014)
And Now I am seeing that I forgot System log, but time is after the problem, but maybe was problem with failoverclustering and it caused restart service,,, is it possible?
Good that you didn't have a SQL Server dump file. And basically SQL Error Log entries are logged in Application log too.
Are you using Clustering with Always ON or just Always ON ?
First find the exact stop time of the SQL services from the error log. Note that stop time and check a bunch of log entries before that time.
Looks like we can find some clue from the system log. The system log you posted is not enough. It's better to go back a little bit. Can you post all the system log entries on that day before the SQL Server failure ? You can attach it as a text file.
Also check the same on the SQL\Application log file too i.e all the events before the SQL Server stop time (on that day).
--
SQLBuddy
--
SQLBuddy
1. I am not sure, because it was set up by external company, I can just handle it. But I think with Failover Cluster
We have 3nodes. First two are in SQL Server Failover Cluster and then this group is with node 3 in AlwaysOn Availability Group. This is according to presentation what I have.
2.
Date3/21/2014 6:08:41 PM
LogSQL Server (Archive #1 - 3/21/2014 6:08:00 PM)
Sourcespid198
Message
Configuration option 'Agent XPs' changed from 1 to 0. Run the RECONFIGURE statement to install.
Date3/21/2014 6:08:39 PM
LogSQL Server Agent (Archive #2 - 3/21/2014 6:08:00 PM)
Message
[131] SQLSERVERAGENT service stopping due to a stop request from a user, process, or the OS...
3. There is nothing more,
Just sometimes shows this error
Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0x80070005, Access is denied.
. This is often caused by incorrect security settings in either the writer or requestor process.
Operation:
Gathering Writer Data
Context:
Writer Class Id: {e8132975-6f93-4464-a53e-1050253ae220}
Writer Name: System Writer
Writer Instance ID: {fe75e400-b924-45d3-81d3-bb0ee1096cb3}
and this
Log Name: Application
Source: MSSQLSERVER
Date: 21. 3. 2014 18:08:41
Event ID: 15457
Task Category: Server
Level: Information
Keywords: Classic
User: User
Computer: NameOfServer
Description:
Configuration option 'Agent XPs' changed from 1 to 0. Run the RECONFIGURE statement to install.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="MSSQLSERVER" />
<EventID Qualifiers="16384">15457</EventID>
<Level>4</Level>
<Task>2</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2014-03-21T17:08:41.000000000Z" />
<EventRecordID>30331</EventRecordID>
<Channel>Application</Channel>
<Computer>NameOfServer</Computer>
<Security UserID="S-1-5-21-3074418465-2115644625-3052645785-1108" />
</System>
<EventData>
<Data>Agent XPs</Data>
<Data>1</Data>
<Data>0</Data>
<Binary>613C00000A0000000F0000004D00450053002D0048004100440052002D00560053005200560031000000050000006D007300640062000000</Binary>
</EventData>
</Event>
Log Name: Application
Source: SQLVDI
Date: 21. 3. 2014 18:08:31
Event ID: 1
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: NameOfServer
Description:
SQLVDI: Loc=SignalAbort. Desc=Client initiates abort. ErrorCode=(0). Process=3192. Thread=10908. Client. Instance=. VD=Global\TDPSQL-00000C78-0000_SQLVDIMemoryName_0.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="SQLVDI" />
<EventID Qualifiers="0">1</EventID>
<Level>2</Level>
<Task>0</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2014-03-21T17:08:31.000000000Z" />
<EventRecordID>30330</EventRecordID>
<Channel>Application</Channel>
<Computer>NameOfServer</Computer>
<Security />
</System>
<EventData>
<Data>SignalAbort</Data>
<Data>Client initiates abort</Data>
<Data>0</Data>
<Data>
</Data>
<Data>3192</Data>
<Data>10908</Data>
<Data>Client</Data>
<Data>
</Data>
<Data>Global\TDPSQL-00000C78-0000_SQLVDIMemoryName_0</Data>
</EventData>
</Event>
Log Name: Application
Source: MSSQLSERVER
Date: 21. 3. 2014 18:07:43
Event ID: 833
Task Category: Server
Level: Information
Keywords: Classic
User: N/A
Computer: NameOfServer
Description:
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [G:\USERDB\MCS_MESDB_1.ldf] in database [MCS_MESDB] (8). The OS file handle is 0x0000000000000D08. The offset of the latest long I/O is: 0x000002e70cb000
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="MSSQLSERVER" />
<EventID Qualifiers="16384">833</EventID>
<Level>4</Level>
<Task>2</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2014-03-21T17:07:43.000000000Z" />
<EventRecordID>30329</EventRecordID>
<Channel>Application</Channel>
<Computer>NameOfServer</Computer>
<Security />
</System>
<EventData>
<Data>1</Data>
<Data>15</Data>
<Data>G:\USERDB\MCS_MESDB_1.ldf</Data>
<Data>MCS_MESDB</Data>
<Data>8</Data>
<Data>0000000000000D08</Data>
<Data>0x000002e70cb000</Data>
<Binary>410300000A0000000F0000004D00450053002D0048004100440052002D0056005300520056003100000000000000</Binary>
</EventData>
</Event>
Then I find in FDLAUNCHERRORLOG
2014-03-21 18:09:39.877MSSQLFDLauncher service received control message.
2014-03-21 18:09:39.979MSSQLFDLauncher service successfully launched FDHost.exe Process(process id = 9124).
then in ERRORLOG
2014-03-21 09:17:43.97 Server The Service Broker endpoint is in disabled or stopped state
then just Log was backed up, and last backup wasnt, but backup in 22:00 was ok .. it was first time, but you know we work 24/5 and it could be problem...
2014-03-21 18:07:43.41 spid8s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [G:\USERDB\MCS_MESDB_1.ldf] in database [MCS_MESDB] (8). The OS file handle is 0x0000000000000D08. The offset of the latest long I/O is: 0x000002e70cb000
2014-03-21 18:08:41.91 spid198 Configuration option 'Agent XPs' changed from 1 to 0. Run the RECONFIGURE statement to install.
.
March 26, 2014 at 10:56 pm
This is what happened ..
SQL Server stopped with following error message :
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 21. 3. 2014 18:08:39
Event ID: 1069
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: NameOfServer
Description:
Cluster resource 'SQL Server' of type 'SQL Server' in clustered role 'SQL Server (MSSQLSERVER)' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
SQL Server may stop unexpectedly due to different reasons :
Memory leaks from a process within the SQL Server memory space, such as from an extended stored procedure or from OLE automation.
Ending the sqlservr.exe process directly, such as from the Process tab on the Task Manager or with a program such as kill.exe from the Windows Resource Kit.
Some types of unhandled access violations.
Then SQL Server Agent got stopped ..
Date 3/21/2014 6:08:39 PM
Log SQL Server Agent (Archive #2 - 3/21/2014 6:08:00 PM)
Message
[131] SQLSERVERAGENT service stopping due to a stop request from a user, process, or the OS...
And later the message was ...
Log Name: System
Source: Service Control Manager
Date: 21. 3. 2014 18:09:37
Event ID: 7034
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: NameOfServer
Description:
The SQL Server (MSSQLSERVER) service terminated unexpectedly. It has done this 1 time(s).
This error message will be logged if SQL Server has been restarted by SQL Server Agent.
SQL Server Agent will only auto restart SQL Server if SQL Server stopped for some unexpected reason other than an explicit stop command from a user or application, and if the Auto restart SQL Server if it stops unexpectedly option is selected in SQL Server Agent properties. During the restart of SQL Server, SQL Server Agent will write this message to the application event log on the computer hosting SQL Server.
You need to check in SQL Server Agent properties in SSMS whether the following option has been enabled. Also check if SQL Server Agent has local admin rights on the box.
Auto restart SQL Server if it stops unexpectedly
--
SQLBuddy
March 27, 2014 at 12:43 am
Thank you, good job, very helpful and thanks for experience. I know that this option is in SQL server, but i wouldn't remember in that time
So Service state is both on ... so now i understand ...
and what do you think that it can be cause?
and I found one log yet, i couldnt open before, i dont like metro on servers..
2014-03-21 18:08:39 - ? [131] SQLSERVERAGENT service stopping due to a stop request from a user, process, or the OS...
2014-03-21 18:08:41 - ! [240] 1 engine thread(s) failed to stop after 2 seconds of waiting
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - CmdExec step' (ID 5240) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobInvocationEngine' (ID 5340) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - CmdExec step' (ID 5712) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - TSQL step' (ID 6716) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - TSQL step' (ID 9092) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - TSQL step' (ID 9844) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - TSQL step' (ID 10668) is still running
2014-03-21 18:08:41 - ? [000] Configuration option 'Agent XPs' changed from 1 to 0. Run the RECONFIGURE statement to install. [SQLSTATE 01000] (Message 15457)
2014-03-21 18:08:41 - + [098] SQLServerAgent terminated (forcefully)
March 27, 2014 at 8:33 am
tony28 (3/27/2014)
Thank you, good job, very helpful and thanks for experience. I know that this option is in SQL server, but i wouldn't remember in that timeSo Service state is both on ... so now i understand ...
and what do you think that it can be cause?
and I found one log yet, i couldnt open before, i dont like metro on servers..
2014-03-21 18:08:39 - ? [131] SQLSERVERAGENT service stopping due to a stop request from a user, process, or the OS...
2014-03-21 18:08:41 - ! [240] 1 engine thread(s) failed to stop after 2 seconds of waiting
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - CmdExec step' (ID 5240) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobInvocationEngine' (ID 5340) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - CmdExec step' (ID 5712) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - TSQL step' (ID 6716) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - TSQL step' (ID 9092) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - TSQL step' (ID 9844) is still running
2014-03-21 18:08:41 - ! [311] Thread 'JobManager - TSQL step' (ID 10668) is still running
2014-03-21 18:08:41 - ? [000] Configuration option 'Agent XPs' changed from 1 to 0. Run the RECONFIGURE statement to install. [SQLSTATE 01000] (Message 15457)
2014-03-21 18:08:41 - + [098] SQLServerAgent terminated (forcefully)
You are welcome, Tony. You need to look for the log entries that have occured before 2014-03-21 18:08:39.
To find the reason for SQL Server failure, try to check the cluster log for errors. Check this for the cluster log ..
http://www.sqlserver-expert.com/2012/05/how-to-generate-clusterlog-for-windows.html
--
SQLBuddy
March 27, 2014 at 3:13 pm
Ok Now I will do log, after finish I will send..
do you think that is good this setting from your experience ? Auto restart SQL Server if it stops unexpectedly
I am thinking about this and maybe it is better that SQL server will be restart than frozen or something.
And please can you explain me something ?
SQL Server may stop unexpectedly due to different reasons :
Memory leaks from a process within the SQL Server memory space, such as from an extended stored procedure or from OLE automation.
and so is not missing the problem with Failover Clustering here ? Because you said 3 reasons, but from this log I see that problem was with Cluster.
Thank you.
Viewing 15 posts - 1 through 15 (of 22 total)
You must be logged in to reply to this topic. Login to reply