June 19, 2019 at 12:00 am
Comments posted to this topic are about the item An Alert Philosophy
June 19, 2019 at 12:32 pm
This article reminds me of the alert fatigue that hospital staff are subjected to (and fast food staff too; it tends to be just as much of a cacophony behind the counter these days). If everything is beeping at you all the time, the real alerts don't stand out.
I especially like the idea of refusing to set an alert for something you don't have to do anything about. That's a clear way to improve the signal/noise ratio.
June 19, 2019 at 3:08 pm
I developed a monitoring service on Windows that periodically checks a web application by verifying that it returns a valid web login page and that it can establish a database connection. When an error is detected, it notifies either the server hosting team, the DBA team, or the NOC team by email and then switches to a longer polling period; once errors are no longer detected, it switches back to the regular polling period. It also uses a longer polling period after hours and on weekends and holidays.
I've since used the monitoring service as a template to create other monitoring services for web applications that I support.
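For anyone curious what one pass of a check like that could look like, here is a rough Python sketch. It is not the actual service: `run_checks`, the two check callables, and the team labels are made up for illustration, and which failure goes to which team is my assumption, since the post does not say.

```python
def run_checks(check_login_page, check_database):
    """Run both health checks and return a list of (team, message) pairs.

    check_login_page and check_database are hypothetical callables that
    return True when the check passes. The failure-to-team routing is an
    assumption for illustration only.
    """
    notifications = []
    if not check_login_page():
        # Assumed routing: a bad login page goes to the server hosting team.
        notifications.append(("server-hosting", "Login page check failed"))
    if not check_database():
        # Assumed routing: a failed database connection goes to the DBA team.
        notifications.append(("dba", "Database connection check failed"))
    return notifications
```

An empty list means both checks passed, so the caller can use that to decide whether to go back to the regular polling period.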
June 19, 2019 at 6:03 pm
Changing monitoring intervals, or suppressing additional alerts, is something that needs to happen as well. Too easy to keep sending the same thing over and over. However, that's a complex task. What I'd like is a "this is being worked" that limits alerts until someone notes it's fixed or they want alerts again.
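A bare-bones version of that "this is being worked" idea might look like the Python sketch below. The `AlertGate` class and its method names are hypothetical, and it simply suppresses alerts for an acknowledged problem rather than reducing their frequency.

```python
class AlertGate:
    """Tracks which problems someone has marked as being worked."""

    def __init__(self):
        self._acknowledged = set()

    def acknowledge(self, problem_id):
        # Someone has said "this is being worked"; hold further alerts.
        self._acknowledged.add(problem_id)

    def resolve(self, problem_id):
        # The problem is fixed, or alerts are wanted again.
        self._acknowledged.discard(problem_id)

    def should_alert(self, problem_id):
        # Alert only for problems that nobody has acknowledged.
        return problem_id not in self._acknowledged
```

The monitor would call `should_alert` before sending anything, so the hard part the post mentions (getting people to acknowledge and resolve reliably) stays outside the code.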
June 20, 2019 at 12:26 pm
I like that idea. Having the option to suppress the alerts, or significantly reduce their frequency, until the anticipated correction is made reduces the tendency to build up a reflex of ignoring alerts.
June 20, 2019 at 1:12 pm
I want to keep sending the alerts out as long as the problem exists; otherwise, it may be viewed as a transient problem. Shifting to a longer polling interval helps prevent overloading the responding team with alerts. One web application is used to register volunteers for disaster response and is also used during disasters; the state agency where I work is a first responder for disasters, setting up shelters and taking care of health needs. For that service, I added a "Hurricane Mode" that uses the normal working hours polling schedule on weekends and after hours for as long as the state's Emergency Management Division is activated.
Polling intervals: during normal hours, 10 minutes (60 minutes once an error is detected); after hours and on weekends and holidays, 4 hours (8 hours once an error is detected).
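As a rough illustration of that schedule, here is a Python sketch that picks the next polling interval. The four intervals come from the post; the function names, the `holidays` parameter, and the definition of working hours (Monday to Friday, 08:00 to 17:00) are assumptions, since the post does not spell them out.

```python
from datetime import datetime, timedelta

# Intervals as stated in the post.
NORMAL = timedelta(minutes=10)
NORMAL_ERROR = timedelta(minutes=60)
OFF_HOURS = timedelta(hours=4)
OFF_HOURS_ERROR = timedelta(hours=8)


def is_working_hours(now, holidays=frozenset()):
    """Assumed definition: Monday-Friday, 08:00-17:00, and not a holiday."""
    return now.weekday() < 5 and 8 <= now.hour < 17 and now.date() not in holidays


def polling_interval(now, error_detected, hurricane_mode=False, holidays=frozenset()):
    """Return how long to wait before the next check."""
    # Hurricane Mode applies the working-hours schedule around the clock.
    if hurricane_mode or is_working_hours(now, holidays):
        return NORMAL_ERROR if error_detected else NORMAL
    return OFF_HOURS_ERROR if error_detected else OFF_HOURS
```

For example, `polling_interval(datetime(2019, 6, 22, 10, 0), True)` falls on a Saturday with an error outstanding, so it returns the 8 hour interval; with `hurricane_mode=True` the same call returns 60 minutes.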