June 19, 2019 at 12:00 am
Comments posted to this topic are about the item An Alert Philosophy
June 19, 2019 at 12:32 pm
This article reminds me of the alert fatigue that hospital staff are subjected to (and fast food staff too; it tends to be as much of a cacophony behind the counter these days). If everything is beeping at you all the time, the real alerts don't stand out.
I especially like the idea of refusing to set an alert for something you don't have to do anything about. That's a clear way to improve the signal/noise ratio.
June 19, 2019 at 3:08 pm
I developed a monitoring service on Windows that periodically checks a web application by confirming that it returns a valid login page and that it can establish a database connection. When an error is detected, it notifies the server hosting team, the DBA team, or the NOC team by email and then switches to a longer polling period; once errors are no longer detected, it switches back to the regular polling period. It also uses a longer polling period after hours, on weekends, and on holidays.
I've since used the monitoring service as a template to create other monitoring services for web applications that I support.
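For anyone who wants to try the same pattern, here is a rough sketch of one polling pass. It is not the actual service: the check names, team names, and the mapping of checks to teams are all assumptions for illustration, and the 10 and 60 minute values are the normal-hours figures quoted later in the thread.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical routing table: which team hears about which failed check.
ROUTES = {"login_page": "hosting-team", "database": "dba-team"}
FALLBACK_TEAM = "noc-team"  # assumed catch-all for anything unrouted

REGULAR_INTERVAL_MIN = 10  # polling period while everything is healthy
ERROR_INTERVAL_MIN = 60    # longer period while an error persists


@dataclass
class Monitor:
    checks: dict                        # check name -> callable, True when healthy
    notify: Callable[[str, str], None]  # (team, message) -> None, e.g. send email

    def poll_once(self) -> int:
        """Run every check, notify on failure, return minutes until the next poll."""
        failed = False
        for name, check in self.checks.items():
            try:
                ok = check()
            except Exception:
                ok = False  # a check that blows up counts as a failure
            if not ok:
                failed = True
                team = ROUTES.get(name, FALLBACK_TEAM)
                self.notify(team, f"{name} check failed")
        return ERROR_INTERVAL_MIN if failed else REGULAR_INTERVAL_MIN
```

The caller would sleep for the returned number of minutes and call `poll_once()` again, so the switch back to the regular period happens on its own once the checks pass.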
June 19, 2019 at 6:03 pm
Changing monitoring intervals, or suppressing additional alerts, is something that needs to happen as well. Too easy to keep sending the same thing over and over. However, that's a complex task. What I'd like is a "this is being worked" that limits alerts until someone notes it's fixed or they want alerts again.
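One possible shape for that "this is being worked" flag, as a minimal sketch only. The class name `AlertGate`, the reminder count, and the choice to clear the flag automatically once the check passes are all assumptions, not features of any particular monitoring product.

```python
class AlertGate:
    """Decides whether a repeat alert for an ongoing problem should go out."""

    def __init__(self, reminder_every: int = 6):
        self.acknowledged = False
        self.reminder_every = reminder_every  # assumed: still remind occasionally
        self._suppressed = 0

    def acknowledge(self) -> None:
        """Someone marks the problem as being worked."""
        self.acknowledged = True
        self._suppressed = 0

    def resume(self) -> None:
        """Someone notes it is fixed, or wants full alerts again."""
        self.acknowledged = False
        self._suppressed = 0

    def should_send(self, problem_present: bool) -> bool:
        if not problem_present:
            # Assumption: a passing check clears the flag so that the next
            # incident alerts immediately.
            self.resume()
            return False
        if not self.acknowledged:
            return True
        # Acknowledged: limit alerts to one reminder every N detections.
        self._suppressed += 1
        if self._suppressed >= self.reminder_every:
            self._suppressed = 0
            return True
        return False
```

Keeping an occasional reminder, instead of going fully silent, is a deliberate choice here so an acknowledged problem cannot be forgotten.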
June 20, 2019 at 12:26 pm
I like that idea. Having the option to suppress alerts, or significantly reduce their frequency, until the anticipated correction is in place reduces the tendency to build up a reflex of ignoring alerts.
June 20, 2019 at 1:12 pm
I want to keep sending the alerts out as long as the problem exists; otherwise, it may be viewed as a transient problem. Shifting to a longer polling interval helps prevent overloading the responding team with alerts. One web application is used to register volunteers for disaster response, and it is also used during disasters; the state agency where I work is a first responder for disasters, setting up shelters and taking care of health needs. For that service, I added a "Hurricane Mode" in which it uses the normal working hours polling schedule on weekends and after hours for as long as the state's Emergency Management Division is activated.
Polling intervals: during normal hours, 10 minutes (60 minutes once an error is detected); after hours, on weekends, and on holidays, 4 hours (8 hours once an error is detected).
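The schedule above could be expressed roughly like this. The interval values come from the post; the 8:00 to 17:00 Monday to Friday working day, the caller-supplied holiday set, and letting Hurricane Mode override holidays as well are assumptions made for the sketch.

```python
from datetime import date, datetime

# Intervals in minutes, from the post: (healthy, error detected).
NORMAL_HOURS = (10, 60)
OFF_HOURS = (4 * 60, 8 * 60)

# Assumed working day; the post does not state the actual hours.
WORK_START_HOUR, WORK_END_HOUR = 8, 17


def is_working_time(now: datetime, holidays: frozenset = frozenset()) -> bool:
    """True on a non-holiday weekday between the assumed working hours."""
    if now.date() in holidays or now.weekday() >= 5:  # 5, 6 = Sat, Sun
        return False
    return WORK_START_HOUR <= now.hour < WORK_END_HOUR


def polling_interval(
    now: datetime,
    error_detected: bool,
    hurricane_mode: bool = False,
    holidays: frozenset = frozenset(),
) -> int:
    """Minutes to wait before the next poll."""
    # Hurricane Mode forces the normal-hours schedule around the clock.
    use_normal = hurricane_mode or is_working_time(now, holidays)
    healthy, erroring = NORMAL_HOURS if use_normal else OFF_HOURS
    return erroring if error_detected else healthy
```

For example, `polling_interval(datetime(2019, 6, 22, 10, 0), error_detected=True)` falls on a Saturday and gives 480 minutes, while the same call with `hurricane_mode=True` gives 60.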