April 7, 2018 at 4:07 pm
Comments posted to this topic are about the item "What's Downtime?"
April 9, 2018 at 10:28 am
Steve Yegge's (in)famous Google Platforms Rant from 2011 is a fascinating read, and I came away from it with a number of important observations, one of which was:
monitoring and QA are the same thing. You'd never think so until you try doing a big SOA. But when your service says "oh yes, I'm fine", it may well be the case that the only thing still functioning in the server is the little component that knows how to say "I'm fine, roger roger, over and out" in a cheery droid voice. In order to tell whether the service is actually responding, you have to make individual calls. The problem continues recursively until your monitoring is doing comprehensive semantics checking of your entire range of services and data, at which point it's indistinguishable from automated QA. So they're a continuum.
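To make Yegge's point concrete, here is a minimal sketch (everything in it is hypothetical: `FakeService`, `status()`, `get_order()` and the test order 42 are invented for illustration). A shallow check only asks the service whether it is alive. A deep check makes a real call against known data and verifies the answer, which is the point where monitoring starts to look like automated QA.

```python
from dataclasses import dataclass, field


@dataclass
class FakeService:
    """Hypothetical service: the status endpoint keeps answering even
    after the component that does the real work has failed."""
    orders: dict = field(default_factory=lambda: {42: "shipped"})
    storage_ok: bool = True

    def status(self) -> str:
        # The "I'm fine, roger roger" component: it says nothing about storage.
        return "I'm fine"

    def get_order(self, order_id: int) -> str:
        if not self.storage_ok:
            raise RuntimeError("storage unavailable")
        return self.orders[order_id]


def shallow_check(svc: FakeService) -> bool:
    # Only proves that the status component itself is alive.
    return svc.status() == "I'm fine"


def deep_check(svc: FakeService) -> bool:
    # Make a real call with known test data and verify the result semantically.
    try:
        return svc.get_order(42) == "shipped"
    except Exception:
        return False


svc = FakeService()
svc.storage_ok = False  # simulate the real work failing behind the scenes
print(shallow_check(svc), deep_check(svc))  # → True False
```

The shallow check still reports healthy while the deep check catches the failure; extending deep checks to every service and data set is the continuum the quote describes.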
April 10, 2018 at 7:28 am
With a scale-out architecture that allows individual servers to fail over or be taken offline for maintenance, the question becomes: at what point of degradation is the application or database "down"? I guess one definition would be when an end user or process needs access to a resource but can't get it within the terms of an agreed-upon SLA. But thanks to technologies like Always On Availability Groups (AOG) and Azure, that scenario is (or should be) rarer.
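Under that SLA-based definition, downtime is measured per request rather than per server: a node being offline doesn't count unless requests actually miss the SLA. A minimal sketch, assuming a hypothetical 1000 ms response-time SLA and a hypothetical 99.9% availability target (neither comes from the post):

```python
from typing import Optional, Sequence


def sla_availability(latencies_ms: Sequence[Optional[float]], sla_ms: float) -> float:
    """Fraction of requests answered within sla_ms.

    None marks a request that failed outright; an answer slower than the
    SLA counts the same as a failure from the user's point of view.
    """
    if not latencies_ms:
        return 1.0  # no requests, so none missed the SLA
    met = sum(1 for t in latencies_ms if t is not None and t <= sla_ms)
    return met / len(latencies_ms)


# Hypothetical sample: four requests answered (one of them in 2.5 s),
# one failed outright.
sample = [120, 95, None, 110, 2500]
availability = sla_availability(sample, sla_ms=1000)
print(availability)           # → 0.6
print(availability >= 0.999)  # → False against the hypothetical 99.9% target
```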
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
April 22, 2018 at 6:17 pm
What if a system or service is very sloooooow but still returns the correct result in the end? Is that up or down? A customer may have given up, assuming it was down when, had they waited longer, it would eventually have worked.
April 24, 2018 at 8:35 am
robinwilson - Sunday, April 22, 2018 6:17 PM
What about if a system or service is very sloooooow, but still returns the correct result in the end? Is that up or down? A customer may have given up, assuming it to be down when, had they waited longer, it would have eventually worked.
I'd count this as downtime. If the database can't complete a query within some multiple of its expected duration, it's effectively down.
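The "some multiple of what's expected" rule could be sketched like this. The choices here are assumptions, not something the post specifies: the median of past run times is used as the "expected" duration, and the multiple defaults to a hypothetical 5x.

```python
from statistics import median
from typing import Sequence


def is_effectively_down(duration_s: float, history_s: Sequence[float],
                        multiple: float = 5.0) -> bool:
    """Treat a query as 'down' if it runs longer than `multiple` times
    its usual (median) duration."""
    baseline = median(history_s)
    return duration_s > multiple * baseline


history = [0.8, 1.0, 1.2, 0.9, 1.1]  # hypothetical past runs, in seconds
print(is_effectively_down(4.0, history))  # → False
print(is_effectively_down(6.0, history))  # → True
```

Using the median rather than the mean keeps one unusually slow historical run from inflating the baseline.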