Diagnosing “dead” SQL Server instances
Several times in the past, I’ve had to deal with massive outages where tens or even hundreds of SQL Server instances have gone down and only a portion of those came back online. When trying to find the causes for why the SQL Server service doesn’t start, one of the best places to start is the SQL Server Error log.
So, what is the problem?
There is a drawback when the SQL Server Service is not running: the normal methods of displaying and analysing the Error Logs are not available. The option then is to use a manual method that means that one must locate the Error Log directory, open all the log files in a text editor, and go through each line. That can be quite a task for a single instance, let alone having to do this for many servers. To mitigate this, I have written a simple batch file and compiled a collection of search pattern files. The batch file searches for the error log directory and then parses each file, searching for error entries for each severity level by matching the search pattern files.
Having done this work and knowing that others might face the same problem, I decided to share my solution with this quick write-up.
Compatibility
An SQL Server is like a box of chocolates; you’ll never know what version or architecture you’re going to get. Hence, for this task, the best option is to use DOS/CMD commands as that does not require any additional tools or libraries being installed on the server.
Of those commands, the most helpful for this purpose are DIR, FINDSTR, and the control flow, IF. The FINDSTR is particularly useful, as one can compile the search patterns in a file and then pass the file name as a parameter.
FINDSTR has one drawback, which is that if the file being search is a Unicode file, then no match will be found unless one appends every character with the period, “.”, character in the search pattern file. Depending on the SQL Server version, the Error Log files can either be Unicode or ASCII text files, therefore in the effort of creating one-size-fits-all, two pattern files are used for each error level, one in ASCII format, here the file name of SEVERITY_LEVEL[NN].txt being prefixed with WINNT and the Unicode files simply called SEVERITY_LEVEL[NN].txt.
How it works
When invoked with the drive letter of the drive to search, the batch file simply recursively iterates through all directories and sub-directories of the drive until it locates the ERRORLOG directory, then it does a pattern matching on all the log files in that directory, very simple. Since this is not done in SQL, I’m not going to go any further into the actual code.
Usage
The prime requirement is to have an Administrative access to the system being diagnosed.
The first step is to upload/copy the zip archive and expand it on the target system. Since there are some older systems which do not have a command line unzip programmes installed, I normally do this from either a USB key with the directory uncompressed or from a network share.
The next step is to open a command prompt, navigate to the directory containing the TSSQLEX.bat file and then run it with the with the desired parameter / switches. To get a list of applicable switches, run the batch file without any parameters or switches.
A nifty little thing in the batch file is the /O switch, which writes the output to a file in the batch file directory. This means that one does not have to scrape the command prompt output to collect the results.
Syntax
TSSQLELX.bat [Drive letter] [/S] [/P] [/V] [/H] [/O]
Parameters:
The first parameter must either be a single character drive letter of the drive to search or the /H switch, if no arguments are passed, the syntax is printed.
Switches:
S Skip the functionality tests
The functionality test asserts the search functionality by looking for a pattern that appears in all error log files.
P Print the individual commands before executing
Useful for repeating searches for individual error level by copying the command and re-run it from the command prompt.
V Verbose, print additional information for each error level.
Prints the error level description text from BOL.
H prints this full help message.
The message can be found in the README.txt file in the batch file's directory.
O Writes the output to a file called COMPUTERNAME.USERDNSDOMAIN_ TSSQLEX.txt.
The file is created or overwritten in the batch file's directory.