Checking the Integrity of Your Notification and Alerting System

Question

tew

Ten Centuries

Points: 1249
May 26, 2009 at 1:00 pm

#131941

My shop, probably like most, uses email as the alerting system for database servers. But...how do you efficiently determine if your alerting system is functioning as expected?
While performing some ad hoc investigations, I noticed that our Exchange server wasn't allowing connections from one of our database servers. It was an easy fix, but it'd sure be a lot nicer if I could somehow set up a routine to show which servers are being denied access to Exchange. It seems I can't assume that just because DBMail was set up successfully, it'll work forever...
My initial thought is to have each db server send an email with a pre-defined subject line such as "AM Server Check from " and have an SSIS package check for these emails and compare the ServerName in the subject line to a centrally stored list. Any missing emails indicate that the notification system for that server isn't functioning properly. Can this be done? Can SSIS parse emails?
As always, thanks for your help.
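The comparison described above can be sketched in a few lines. This is a minimal Python illustration rather than an SSIS package, and it assumes the subject lines have already been fetched from the mailbox by some other means. The function names and the exact subject format (the fixed prefix followed by a server name with no spaces) are assumptions made for the example.

```python
import re

# Assumed subject format: the fixed prefix from the post, then the server name.
SUBJECT_PATTERN = re.compile(
    r"^AM Server Check from\s+(?P<server>\S+)\s*$", re.IGNORECASE
)


def servers_seen(subjects):
    """Return the upper-cased server names found in matching subject lines."""
    seen = set()
    for subject in subjects:
        match = SUBJECT_PATTERN.match(subject.strip())
        if match:
            seen.add(match.group("server").upper())
    return seen


def missing_servers(expected, subjects):
    """Return the expected servers that sent no check-in email, sorted by name."""
    seen = servers_seen(subjects)
    return sorted(name for name in expected if name.upper() not in seen)
```

Any name returned by `missing_servers` is a server whose check-in mail never arrived, which is the signal that its notification path needs a look.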

Florian Reischl SSC-Dedicated Points: 37301 · Answer 1

We use "Hello" messages. The server monitoring generates mails regardless of whether there is an error. If there is no error, the server just sends a mail with a subject like "Hello from SERVERNAME". In case of problems, the subject is different and log files or more detailed information are attached.

tew Ten Centuries Points: 1249 · Answer 2

Do you compare the "hello" emails to a list of servers that you expect responses from, or handle it manually? We have quite a few servers, so an automated check against a list of registered servers, with any deltas highlighted, would be preferable.

Florian Reischl SSC-Dedicated Points: 37301 · Answer 3

We have tables that contain the names of all the servers. A central process runs every five minutes and checks whether new mails are available. Each new mail is scanned.

If the mail is a "Hello" message, the table is updated with the date/time of the last hello message.

The table has an additional column that specifies the maximum number of minutes a "Hello" may be overdue. If a hello message is overdue for a server, an error message is sent to the responsible DBA and the support team, and (outside business hours) the standby supporter gets a phone call.
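As a rough illustration of the overdue check, here is a minimal Python sketch. It assumes the table has been loaded into two dictionaries, one holding the last hello time per server and one holding the allowed gap in minutes. The function name and data layout are hypothetical and are not taken from the monitoring described here.

```python
from datetime import datetime, timedelta


def overdue_servers(last_hello, max_minutes, now):
    """Return the sorted names of servers whose hello is overdue.

    last_hello:  server name -> datetime of the last hello (absent if never seen)
    max_minutes: server name -> maximum allowed minutes between hellos
    now:         the time the check runs
    """
    overdue = []
    for server, limit in max_minutes.items():
        seen = last_hello.get(server)
        # A server that never said hello counts as overdue as well.
        if seen is None or now - seen > timedelta(minutes=limit):
            overdue.append(server)
    return sorted(overdue)
```

Each name returned would then go through whatever escalation applies: mail to the DBA and support team, and a call to the standby supporter outside business hours.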

If the message is an "Error", the process tries to determine the reason. If the reason is index fragmentation or some other error that can wait, it only creates the emails. If the reason is either undefined or a defined "fatal" one, it starts the same procedure as for a missing "Hello".
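The routing rule for incoming mails might look something like the following Python sketch. The action names and the contents of the `DEFERRABLE` set are assumptions for illustration only. The one behaviour taken from the description is that any reason not known to be able to wait, whether fatal or undefined, is escalated like a missing hello.

```python
# Assumed list of reasons that can wait; only index fragmentation is named in the post.
DEFERRABLE = {"index fragmentation"}


def triage(kind, reason=None):
    """Map an incoming mail to an action name (the action names are hypothetical)."""
    if kind == "hello":
        return "update_last_hello"
    if kind == "error":
        if reason in DEFERRABLE:
            return "email_only"
        # Fatal and undefined reasons get the same treatment as a missing hello.
        return "escalate"
    raise ValueError(f"unknown mail kind: {kind!r}")
```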

tew Ten Centuries Points: 1249 · Answer 4

Hi Florian,

Thanks for your patience.

Can you explain how you scan the emails? What technology do you use?

thanks again for your help,

Tim

Florian Reischl SSC-Dedicated Points: 37301 · Answer 5

Hi Tim

I'm sorry but I think I cannot help any more at this point. I'm a developer, not a DBA.

The server monitoring was written by our DBAs. They use Perl and text-based mails. Since I know nothing about Perl, I can't tell you how it works.

If you have any developers in your company, it should be no problem for them to provide a little tool that takes a mail from the server and parses it, or at least creates a text file with the mail subject as the file name and its body as the content. This should be quite simple to handle in any scripting language.
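To give an idea of how small such a tool can be, here is a minimal sketch in Python (not the Perl the DBAs use) built on the standard library `email` package. It assumes the raw text of one mail has already been retrieved. The function name, the file-name sanitising rule, and the `.txt` extension are choices made for this example.

```python
import re
from email import message_from_string, policy
from pathlib import Path


def save_mail_as_text(raw_mail, out_dir):
    """Write the plain-text body of a mail to <out_dir>/<subject>.txt and return the path."""
    msg = message_from_string(raw_mail, policy=policy.default)
    subject = msg["Subject"] or "no-subject"
    # Replace characters that are awkward in file names with underscores.
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", subject).strip("_") or "no-subject"
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    path = Path(out_dir) / f"{safe_name}.txt"
    path.write_text(body, encoding="utf-8")
    return path
```

A scheduled job could call this for each new mail and leave the resulting files for whatever process compares the subjects against the server list.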