August 15, 2006 at 1:36 am
So we have a cluster for SQL Server 2000. Several times now SQL Server has spontaneously restarted; sometimes a dump file is produced. Searching through the logs has not told me anything so far. I have a dump file produced by SQL Server, but I can't begin to understand what it tells me.
I am told by the WinAdmins (or Infrastructure employees) that MS Support has been contacted about another, possibly related, issue. That issue concerns the configuration of the cluster setup. I don't know if that will gain anything, nor do I know how long it will take for MS to resolve it, or whether it will solve the problem of SQL Server spontaneously restarting.
The restarts occur regularly, about 3 times a week, but not at similar times. There are no jobs starting at or shortly before the restart. There are several jobs which run every minute or every 5 minutes. The latest SPs are installed; the operating system is W2000. There are plans to upgrade to W2003 and SQL2005, but I don't know when this will happen.
So back to the dump file: is there any meaningful info to be had for someone who only speaks English?! At the end of the file(s) there are 1 or more lines saying:
***Unable to get thread context for spid nnn
Could this mean anything? I could rig some logging script to insert info on SPIDs into a table, but would it mean anything? Also, for every dump file a *.mdmp file exists; I can open this with Visual Studio, but it shows nothing at all.
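As a starting point for that kind of logging, here is a minimal sketch in Python that collects the spid numbers from the "Unable to get thread context" lines across a folder of dump files, so they can be compared between restarts. The file name pattern `SQLDump*.txt`, the exact spacing of the message, and the function names are assumptions for illustration, not something confirmed by the dump files themselves.

```python
import re
from pathlib import Path
from typing import Dict, List

# Matches the trailer line quoted above; leading asterisks and spacing are assumed.
SPID_LINE = re.compile(
    r"\*+\s*Unable to get thread context for spid\s+(\d+)", re.IGNORECASE
)


def spids_without_context(dump_text: str) -> List[int]:
    """Return the spids named in 'Unable to get thread context' lines, in file order."""
    return [int(match.group(1)) for match in SPID_LINE.finditer(dump_text)]


def summarize_dumps(folder: str, pattern: str = "SQLDump*.txt") -> Dict[str, List[int]]:
    """Map each dump file name in `folder` (hypothetical naming) to the spids it reports."""
    summary = {}
    for path in sorted(Path(folder).glob(pattern)):
        # Undecodable bytes are replaced rather than aborting the scan.
        text = path.read_text(errors="replace")
        summary[path.name] = spids_without_context(text)
    return summary
```

If the same spid shows up in several dumps, that would at least give a concrete session to log and watch; if the numbers differ every time, the lines are probably not a useful lead on their own.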
Any hints appreciated.
Greetz,
Hans Brouwer
August 16, 2006 at 6:35 am
What's the STOP hex code (0x0000ABCDE)? Those are the most easily translatable parts of the dump. These sites should help you figure out the root of the problem.
http://www.aumha.org/win5/kbestop.php
-or-
Google for "Windows stop error codes"
It may not even be an MS problem; some stop codes come from other applications.
August 16, 2006 at 8:08 am
Or search in the Windows SDK; most are in there.
August 16, 2006 at 12:22 pm
I would not bother with the SQL dump files (or mini-dumps). If your Windows Admins are correct, then a cluster configuration issue is pretty easy to fix in an active/active or active/passive 2 node cluster. They should be looking at the cluster.log file immediately after a failure. I say immediately because this file is a fixed size and is overwritten as needed like the Windows event logs.
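Because the file gets overwritten, it may help to keep a timestamped copy as soon as a failure is noticed. A minimal sketch in Python, assuming the log sits in its usual place under the Windows directory (for example `C:\WINNT\Cluster\cluster.log`, unless the ClusterLog environment variable points elsewhere); the function name and the archive folder are hypothetical:

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_log(source: str, archive_dir: str) -> Path:
    """Copy `source` into `archive_dir` under a timestamped name; return the new path."""
    src = Path(source)
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Example result: cluster_20060816_122200.log
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = dest_dir / f"{src.stem}_{stamp}{src.suffix}"

    # copy2 copies the contents and also preserves the modification time.
    shutil.copy2(src, dest)
    return dest
```

It could be called as `snapshot_log(r"C:\WINNT\Cluster\cluster.log", r"D:\cluster_log_archive")` on each node right after a restart; both paths here are placeholders to adjust for the actual servers.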
Regards
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."
August 17, 2006 at 1:38 am
Tnx for the responses, all.
I'll leave it in the 'capable' hands of the WinAdmins for the time being.
Greetz,
Hans Brouwer
August 17, 2006 at 2:52 am
When I've had servers restarting for no apparent reason in the past, it has often been traced to the network card. Are your NICs teamed, by any chance?
John
August 17, 2006 at 6:36 am
The problem seems to be a cluster problem. A fix will be implemented this weekend; we will see if this is really it.
My question was, however: is it useful to try to read a dump file?
Greetz,
Hans Brouwer