How to read a dump...

  • So we have a cluster for SQL Server 2000. Several times now SQL Server has spontaneously restarted, and sometimes a dump file is produced. Searching through the logs has not told me anything so far. I have a dump file produced by SQL Server, but I can't begin to understand what it tells me.

    I am told by the WinAdmins (or Infrastructure employees) that MS Support has been contacted about another, possibly related, issue. That issue is about the configuration of the cluster setup. I don't know if that will gain anything, nor do I know how long it will take MS to resolve it, or whether it will solve the problem of SQL Server spontaneously restarting.

    The restarts occur regularly, about 3 times a week, but not at similar times. No jobs start at or shortly before a restart, although there are several jobs which run every minute or every 5 minutes. The latest service packs are installed, and the operating system is W2000. There are plans to upgrade to W2003 and SQL2005, but I don't know when this will happen.

    So back to the dump file: is there any meaningful info to be had for someone who only speaks English?! At the end of the file(s) there are 1 or more lines saying:

    ***Unable to get thread context for spid nnn

    Could this mean anything? I could rig some logging script to insert info on SPIDs in a table, but would it mean anything? Also, for every dump file a *.mdmp file exists; I can open this with Visual Studio, but it shows nothing at all.

    Any hints appreciated.

    Greetz,
    Hans Brouwer
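
The `***Unable to get thread context for spid nnn` lines quoted above can at least be collected mechanically, so the SPID numbers can be compared across dump files or against whatever a logging script records. A minimal sketch, assuming the dump is a plain-text file and that each such line ends in a numeric SPID; the function name and the sample text are hypothetical, not taken from a real dump:

```python
import re

# Pattern for the line quoted in the post; the SPID is assumed to be numeric.
SPID_LINE = re.compile(r"Unable to get thread context for spid (\d+)")


def spids_without_context(dump_text):
    """Return the SPIDs named in 'Unable to get thread context' lines, in order."""
    return [int(m.group(1)) for m in SPID_LINE.finditer(dump_text)]


# Made-up sample text standing in for the tail of a dump file.
sample = """(other dump output)
***Unable to get thread context for spid 51
***Unable to get thread context for spid 73
"""
print(spids_without_context(sample))  # [51, 73]
```

This only lists which SPIDs the dump could not describe; it says nothing about why the server restarted.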

  • What's the STOP hex code? (0x0000ABCDE). Those are the most easily translatable parts of the dump. This site should help you figure out the root of the problem.

    http://www.aumha.org/win5/kbestop.php

    -or-

    Google for "Windows stop error codes"

    It may not even be an MS problem; some stop codes come from other applications.

     

  • Or search in the Windows SDK; most are in there.

  • I would not bother with the SQL dump files (or mini-dumps). If your Windows Admins are correct, then a cluster configuration issue is pretty easy to fix in an active/active or active/passive 2-node cluster. They should be looking at the cluster.log file immediately after a failure. I say immediately because this file has a fixed size and is overwritten as needed, like the Windows event logs.

    Regards
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."
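
Because cluster.log is overwritten as it fills, one way to act on the advice above is to copy it aside right after each failure. A minimal sketch of such a copy step; the function name `snapshot_log` and the archive layout are hypothetical, and the real location of cluster.log (typically under `%SystemRoot%\Cluster`) is an assumption to verify on the nodes in question:

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def snapshot_log(log_path, archive_dir, now=None):
    """Copy log_path into archive_dir under a timestamped name; return the copy's path."""
    log_path = Path(log_path)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = archive_dir / f"{log_path.stem}-{stamp}{log_path.suffix}"
    shutil.copy2(log_path, target)  # copy2 also preserves the file's timestamps
    return target


if __name__ == "__main__":
    # Demonstration against a throwaway file rather than a real cluster.log.
    with tempfile.TemporaryDirectory() as tmp:
        fake_log = Path(tmp) / "cluster.log"
        fake_log.write_text("example entry\n")
        print(snapshot_log(fake_log, Path(tmp) / "archive"))
```

Run once per node after a restart, this keeps a copy of the log from around the failure even if the live file has since wrapped.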

  • Tnx for the responses, all.

    I'll leave it in the 'capable' hands of the WinAdmins for the time being.

    Greetz,
    Hans Brouwer

  • When I've had servers restarting for no apparent reason in the past, it has often been traced to the network card.  Are your NICs teamed, by any chance?

    John

  • The problem seems to be a cluster problem. This weekend a fix will be implemented; we will see if that is really it.

    My question was, however: is it useful to try and read a dump file?

    Greetz,
    Hans Brouwer
