November 15, 2011 at 3:44 pm
We have configured performance alerts and are receiving them, so we can resolve problems proactively.
However, I'm worried that I've missed additional alerts that I would probably want to add to the list of monitored items. Googling only gives me "how-to" articles, not "here is a good starter list to experiment with".
I realize that there's going to be no single template that fits everyone. I'd be happy with some information about the considerations that go into building new alerts, which would hopefully cut down on the trial-and-error process of either getting too many alerts or missing a problem entirely because I didn't have the right alert set up.
Thanks for any pointers.
February 15, 2012 at 9:26 am
Bumping in hopes that a kind soul may know of such a list, or at least a good article that will help me figure out which alerts are appropriate for my purposes.
My main problem is that it's not necessarily obvious to me which alerts I actually want. For example, should I choose Process Blocked over Lock Wait if I want to be sure there's no excessive deadlocking, or would I end up missing more nuanced information? Those kinds of answers would help enormously.
Thanks!
February 15, 2012 at 10:38 am
The types of performance alerts you are mentioning are really not necessary until a problem occurs. If you monitor the error log regularly, you can pretty much see what is going on. If you are concerned about CPU and memory, then go that route. However, regularly monitoring waits and locks seems unnecessary unless you have problems. Others may have things that they regularly monitor, but that is probably because of the design of their databases and the applications accessing them.
Jared
CE - Microsoft
February 15, 2012 at 10:44 am
Hmm. Interesting.
When I read up on alerts, it sounded like one of the use cases was being proactive when the database starts to push the server (and its underlying hardware) to the limit, and that was what I was interested in. I want to be sure we know the server is having a bad day before our users call us.
It almost sounds like the only alerts we really need are for CPU and memory utilization. I would have thought we should also be concerned about locking dynamics and other factors.
February 15, 2012 at 10:52 am
I didn't say that the only alerts you need are CPU and memory; what I said was that it will depend on your entire architecture. Hopefully your Ops team is monitoring server activity and disk space and notifying you in the process. We have replication, so we proactively monitor replication latency. We also have reports of login errors, page outs, and other things we need to be aware of. Waits and locks do not concern us until a query is performing poorly or a server begins to show "signs" of problems. You can drive yourself nuts monitoring every little thing, and monitoring costs resources, both on the machine and in your time. What metrics are important to you? That's what you have to ask yourself. If you don't know, then ask yourself what your company's database is used for. What is typically going on? Lots of users? Lots of inserts/updates? Lots of reporting queries? What would the users complain about IF there were something to complain about?
Jared
CE - Microsoft
February 15, 2012 at 7:53 pm
I've not had to set up such alerts for years, but I used to set them up for 95% CPU (each CPU), 95% memory, blocks that lasted longer than a minute, and disk I/O, although I can't remember for the life of me what I used for the I/O settings.
On systems that I knew had problems, I'd set the CPU and memory thresholds to about 80% during the day so I'd get the alert in time to make sure I was watching closely.
Nowadays, I just look for "crap code" that takes longer than a second to execute (for starters), high row counts, and things that make TempDB grow beyond what I think it should. The boys in Ops monitor the other things because they own the machines and the SANs.
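To make the threshold scheme above concrete, here is a minimal Python sketch. It is not a real monitoring API: the Sample type, its field names, and the 08:00-18:00 "daytime" window are hypothetical assumptions for illustration. It flags any single CPU at or above the limit (the thresholds are per CPU), memory at or above the limit, and any block lasting more than 60 seconds. It drops the CPU/memory limit to 80% during the day for systems marked as known problems. Disk I/O is left out because the post gives no thresholds for it.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Sample:
    """Hypothetical snapshot of server counters (not a real API)."""
    cpu_percent_per_core: list[float]  # utilization of each CPU, 0-100
    memory_percent: float              # memory utilization, 0-100
    longest_block_seconds: float       # duration of the longest current block
    taken_at: time                     # local time the sample was taken


DEFAULT_LIMIT = 95.0         # 95% CPU (each CPU) and 95% memory
DAYTIME_LIMIT = 80.0         # tighter limit for known-problem systems by day
BLOCK_LIMIT_SECONDS = 60.0   # blocks lasting longer than a minute
# Assumed definition of "during the day"; adjust to your own business hours.
DAY_START, DAY_END = time(8, 0), time(18, 0)


def alerts_for(sample: Sample, known_problem_system: bool = False) -> list[str]:
    """Return one message per threshold the sample crosses."""
    daytime = DAY_START <= sample.taken_at < DAY_END
    limit = DAYTIME_LIMIT if (known_problem_system and daytime) else DEFAULT_LIMIT
    messages = []
    # Check every CPU individually so one pegged CPU is not averaged away.
    for index, pct in enumerate(sample.cpu_percent_per_core):
        if pct >= limit:
            messages.append(f"CPU {index} at {pct:.0f}% (limit {limit:.0f}%)")
    if sample.memory_percent >= limit:
        messages.append(f"Memory at {sample.memory_percent:.0f}% (limit {limit:.0f}%)")
    if sample.longest_block_seconds > BLOCK_LIMIT_SECONDS:
        messages.append(f"Blocking for {sample.longest_block_seconds:.0f}s")
    return messages
```

For example, a 10:30 sample with one CPU at 97% and a 90-second block produces two messages under the default 95% limit. In practice the counter values would come from whatever collector you already use.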
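A similar hedged sketch for the "look for bad code" approach. The QueryStat record is hypothetical; it stands in for whatever per-query duration and row-count data you collect. The one-second limit comes from the post. The 100,000-row limit and the TempDB size budget are placeholder assumptions, since the post gives no numbers for them.

```python
from dataclasses import dataclass


@dataclass
class QueryStat:
    """Hypothetical per-query summary from your own trace or stats collection."""
    query_text: str
    duration_ms: float
    rows: int


DURATION_LIMIT_MS = 1000.0  # "takes longer than a second to execute"
ROW_LIMIT = 100_000         # assumed placeholder; the post gives no number


def suspicious_queries(stats: list[QueryStat]) -> list[tuple[str, list[str]]]:
    """Return (query, reasons) for every query that crosses a limit."""
    flagged = []
    for stat in stats:
        reasons = []
        if stat.duration_ms > DURATION_LIMIT_MS:
            reasons.append("slow")
        if stat.rows > ROW_LIMIT:
            reasons.append("high row count")
        if reasons:
            flagged.append((stat.query_text, reasons))
    return flagged


def tempdb_over_budget(current_size_mb: float, expected_size_mb: float) -> bool:
    """True when TempDB has grown past the size you think it should be."""
    return current_size_mb > expected_size_mb
```

Keeping the reasons per query lets a slow query and a high-row-count query be triaged separately.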
--Jeff Moden
Change is inevitable... Change for the better is not.