July 31, 2016 at 12:50 pm
Hi All,
I am trying to find the root cause of a SQL Server crash.
A timeout (30000 milliseconds) was reached while waiting for a transaction response from the MSSQLSERVER service.
The SQL Server (MSSQLSERVER) service terminated unexpectedly. It has done this 5 time(s).
AutoRestart: Unable to restart the MSSQLSERVER service (reason: An instance of the service is already running)
This file is generated by Microsoft SQL Server
version 12.0.4213.0
upon detection of fatal unexpected error. Please return this file,
the query or program that produced the bugcheck, the database and
the error log, and any other pertinent information with a Service Request.
Computer type is Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz.
Bios Version is Xen - 0
Revision: 1.221
32 X64 level 8664, 10 Mhz processor (s).
Windows NT 6.2 Build 9200 CSD .
Memory
MemoryLoad = 94%
Total Physical = 249999 MB
Available Physical = 14127 MB
Total Page File = 267338 MB
Available Page File = 28793 MB
Total Virtual = 134217727 MB
Available Virtual = 133598094 MB
***Stack Dump being sent to D:\tempdataroot\MSSQL12.MSSQLSERVER\MSSQL\LOG\SQLDump0039.txt
SqlDumpExceptionHandler: Process 159 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is
terminating this process.
* *******************************************************************************
*
* BEGIN STACK DUMP:
* 07/29/16 16:21:52 spid 159
*
*
* Exception Address = 000000007A93E615 Module(cwbodbc+000000000005E615)
* Exception Code = c0000005 EXCEPTION_ACCESS_VIOLATION
* Access Violation occurred reading address 000000006029E2CA
* Input Buffer 48 bytes -
* _GetTransfers
*
This dump is generated frequently. The SP name changes.
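Because the dump recurs with a different procedure name each time, it may help to check whether the faulting module stays the same across dumps. Below is a minimal Python sketch that pulls the module name out of the `Exception Address` line and tallies it. It assumes every dump text file keeps the header layout shown above; the function names `faulting_module` and `tally_modules` are hypothetical helpers, not part of any SQL Server tooling.

```python
import re
from collections import Counter

# Matches a header line such as:
# "* Exception Address = 000000007A93E615 Module(cwbodbc+000000000005E615)"
MODULE_RE = re.compile(
    r"Exception Address\s*=\s*[0-9A-Fa-f]+\s+Module\(([^+)]+)\+"
)

def faulting_module(dump_text):
    """Return the module named on the first Exception Address line, or None."""
    match = MODULE_RE.search(dump_text)
    return match.group(1) if match else None

def tally_modules(dump_texts):
    """Count how often each module shows up as the faulting module."""
    modules = (faulting_module(text) for text in dump_texts)
    return Counter(m for m in modules if m is not None)

# Header lines copied from the dump quoted in this post.
sample = """\
* BEGIN STACK DUMP:
* 07/29/16 16:21:52 spid 159
*
* Exception Address = 000000007A93E615 Module(cwbodbc+000000000005E615)
* Exception Code = c0000005 EXCEPTION_ACCESS_VIOLATION
"""

print(faulting_module(sample))  # prints "cwbodbc"
```

To run it over real dumps, read each `SQLDump*.txt` file from the LOG directory into a string and pass the list to `tally_modules`. If one module accounts for every dump, that narrows the search considerably, although it still does not prove which component corrupted memory.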
We have a linked server that pulls data from an AS400 using the cwbodbc (IBM iSeries) ODBC driver.
This is the primary node of an AG setup. After the primary node failed, automatic failover kicked in but didn't succeed.
The following errors were reported:
The lease between availability group 'P1AG' and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster
AlwaysOn Availability Groups connection with secondary database terminated for primary database 'a' on the availability replica 'P2' with Replica ID: {}. This is an informational message only. No user action is required.
The availability group database "a" is changing roles from "PRIMARY" to "RESOLVING" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.
Unable to access availability database 'a' because the database replica is not in the PRIMARY or SECONDARY role. Connections to an availability database is permitted only when the database replica is in the PRIMARY or SECONDARY role. Try the operation again later.
Please help me troubleshoot this.
July 31, 2016 at 1:10 pm
Quick questions: what else is running on the server? What are the SQL Server memory configurations? Any relevant entries in the Windows Event Log? What is the storage configuration? Any relevant log entries from the IO subsystem? Any errors in the BIOS log?
😎
SQL Server is entirely dependent on the host OS for IO, including disk, memory and network, so there are likely indications in the logs of those sub-systems.
July 31, 2016 at 1:27 pm
SQL Server Memory : 220 GB
Total memory: 244 GB
Apart from SQL Server, there is a C# service that runs a number of stored procedures and pulls data from the AS400.
There is nothing much in Windows Event viewer.
The lease between availability group '' and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.
Cluster resource '' of type 'SQL Server Availability Group' in clustered role '' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
No matching network interface found for resource '' IP address '' (return code was '5035'). If your cluster nodes span different subnets, this may be normal.
July 31, 2016 at 3:31 pm
With Access Violations causing SQL to terminate, you may be best off opening a case with Microsoft's Customer Support if it's a repeated crash. CSS have tools to read through stack dumps and identify the root cause.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 31, 2016 at 3:35 pm
Sure thanks. I will do that.
July 31, 2016 at 3:39 pm
I also noticed OLEDB waits just before the crash.
wait_info
(596ms)OLEDB
(769ms)OLEDB
(926ms)OLEDB
(183ms)OLEDB
(848ms)OLEDB
(1977ms)OLEDB
(83ms)OLEDB
(1324ms)OLEDB
(274ms)OLEDB
(623ms)OLEDB
I am guessing it's the cwbodbc driver that caused the crash.
July 31, 2016 at 3:44 pm
Could well be, but many DMVs internally use OLEDB, so not definitive.
See if there's an updated driver and, if there is, try that?
Gail Shaw
July 31, 2016 at 3:49 pm
This is the output from sp_WhoIsActive for all the sessions with an OLEDB wait:
sql_text: INSERT INTO @P
EXEC (@SQL) AT linkedservername
I will try getting the driver updated.
August 2, 2016 at 6:45 am
I had an issue in the past where a SQL Server 2012 AlwaysOn availability group would fail and cause SQL Server to crash with the error message "access violation".
This happened when adding a new database to the availability group, so I am not sure it is the same as your issue. In my case, someone had created databases from scripts and somehow ended up with two databases sharing the same Service Broker GUID. Once we dropped the database and recreated it, a new GUID was issued and the problem went away.