Critical Events to Capture in MOM 2005

Dexter Plaras

February 9, 2005 at 7:45 am

Hi Everyone,
I'm about to implement a monitoring solution in our environment for 200+ SQL Servers using MOM 2005. I've configured everything, and it seems like far too many events are being captured; I'm afraid the monitoring server will be bogged down when I turn on monitoring for all 200+ servers. For those of you who have already implemented MOM for 100+ servers, would you please send me a list of event and performance rules that are not necessary to monitor? I've read a couple of posts on this site about monitoring 100+ servers and how it bogs down the monitoring server.
Thanks,
Dex
P.S. If you prefer, please email me at dplaras@hotmail.com. Thanks again!

Dexter Plaras

February 9, 2005 at 1:52 pm

No one has responded to this yet. Please help....

Rudyx - the Doctor

Good day. We have had MOM 2005 implemented for around 2-3 months now (we're still working out the finer points of events and such). Our setup consists of a separate application server, a database cluster and a web server. We are monitoring 250+ servers at present, and that number is growing as the datacenter expands. We've implemented all of the default monitoring and added the SQL and Exchange management packs, and we have not had any performance issues thus far.

Granted, the SQL pack monitors a lot, but the monitoring is performed by local services. We are also in the process of fine-tuning this monitoring, since the defaults are a 'one size fits all' generalization. Many of the thresholds are not quite what we want for our site. One example is CPU utilization at 90% for 15 minutes or longer; we've scaled that back to 80% for SQL. Also, the inheritance of actions on events is kind of iffy.
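To make the sustained-threshold idea concrete, here is a minimal Python sketch of how a rule like "CPU at or above N% for 15 minutes" can be evaluated. It assumes one CPU sample per minute, and the `SustainedThresholdRule` class is hypothetical; this is an illustration of the logic, not MOM 2005's own implementation or API.

```python
from collections import deque


class SustainedThresholdRule:
    """Hypothetical rule: fire only when every sample in the window is at or
    above the threshold (e.g. CPU >= 80% for 15 consecutive one-minute samples).

    This illustrates the idea; it is not MOM 2005's own implementation.
    """

    def __init__(self, threshold_pct: float, window_samples: int) -> None:
        self.threshold_pct = threshold_pct
        # Keeps only the most recent `window_samples` readings.
        self.window = deque(maxlen=window_samples)

    def add_sample(self, cpu_pct: float) -> bool:
        """Record one reading; return True if the sustained condition holds."""
        self.window.append(cpu_pct)
        window_full = len(self.window) == self.window.maxlen
        return window_full and all(s >= self.threshold_pct for s in self.window)


# Default-style rule (90%) versus the lowered SQL rule (80%),
# both over a 15-minute window of one-minute samples.
default_rule = SustainedThresholdRule(threshold_pct=90.0, window_samples=15)
sql_rule = SustainedThresholdRule(threshold_pct=80.0, window_samples=15)

readings = [85.0] * 15  # fifteen minutes at 85% CPU
default_fired = [default_rule.add_sample(r) for r in readings]
sql_fired = [sql_rule.add_sample(r) for r in readings]
print(any(default_fired), sql_fired[-1])  # False True
```

Requiring the whole window to be at or above the threshold means a single spike does not raise an alert, and one dip below it effectively restarts the 15-minute count.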

Additionally, you will find that the management pack by default does not generate events (and therefore does not email you) for things a DBA would be interested in, like failed backups and tasks. So you have a considerable amount of fine-tuning ahead of you in order to make optimal use of the monitoring.

If you need more information, contact me directly at: rudy.komacsar@porterhealth.org

Regards,
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."

Dexter Plaras

Thanks! It's good to know that there's no performance degradation when monitoring 200+ SQL Servers. I implemented all of the default monitoring on two test servers and am comfortable that it should capture about 80% of what's going on on those servers. I'll do the same with CPU utilization and set the threshold to 80% instead of 90%. I've created alert rules so that it'll email me if there are any critical errors. Oh yeah, I have another question for you: is your alert rule set to email you when severity is at least WARNING, or only at CRITICAL?

Thank you so much for your input! I'll wait for your response on the alert rule question.

Thanks again. Have a good day!

Rudyx - the Doctor

We initially instituted the defaults across the board. Now, on SQL, we've set overrides to initiate notifications at the 'Warning' level for starters. It seems that what MOM considers an 'error' or 'critical' (by default) is geared more to an NT admin than to the DBA 'bread and butter'. By lowering the level to 'Warning' (via overrides), I now get notifications for all of the SQL things that matter (failed jobs, failed backups, excessive table scans, etc.).

Beware of the database space monitoring: it's one size fits all. That's not really a good thing, since it uses the same criteria whether a database is 2 MB or 20 GB in size, and it does not care about the recovery model either! At present we've disabled it until we can rewrite the function, and we are using our previous database space monitoring solution. Let me know how things work out.
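The database space complaint can be illustrated with a minimal Python sketch of a free-space check that scales with database size and treats the log differently depending on recovery model, paired with a "notify at Warning and above" filter. Everything here is hypothetical: the `Severity` levels are a simplified ordering (not MOM 2005's full severity list), and the `DatabaseSpace` fields, the 15%/20% percentages and the 1024 MB/512 MB caps are illustrative values, not MOM settings. A real check would read file sizes from SQL Server instead of hard-coded numbers.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Simplified, hypothetical ordering; not MOM 2005's full severity list.
    INFORMATION = 1
    WARNING = 2
    ERROR = 3
    CRITICAL = 4


NOTIFY_AT = Severity.WARNING  # notify the DBA at Warning and above


def should_notify(severity: Severity) -> bool:
    return severity >= NOTIFY_AT


@dataclass
class DatabaseSpace:
    name: str
    data_size_mb: float
    data_free_mb: float
    log_size_mb: float
    log_free_mb: float
    recovery_model: str  # "SIMPLE", "BULK_LOGGED" or "FULL"


def required_free_mb(size_mb: float, pct: float, cap_mb: float) -> float:
    # A percentage of the file size, capped so a 20 GB database is not
    # flagged while it still has plenty of absolute free space.
    return min(size_mb * pct, cap_mb)


def space_severity(db: DatabaseSpace) -> Severity:
    """Hypothetical size- and recovery-model-aware free-space check."""
    severity = Severity.INFORMATION
    if db.data_free_mb < required_free_mb(db.data_size_mb, 0.15, 1024.0):
        severity = Severity.WARNING
    if db.log_free_mb < required_free_mb(db.log_size_mb, 0.20, 512.0):
        # Under FULL or BULK_LOGGED recovery the log is truncated only after
        # log backups, so a nearly full log is treated as more urgent.
        urgent = db.recovery_model in ("FULL", "BULK_LOGGED")
        severity = max(severity, Severity.ERROR if urgent else Severity.WARNING)
    return severity


small = DatabaseSpace("Tiny", 2.0, 0.1, 1.0, 0.5, "SIMPLE")
large = DatabaseSpace("Big", 20480.0, 1500.0, 4096.0, 300.0, "FULL")
medium = DatabaseSpace("Medium", 5000.0, 2000.0, 1000.0, 600.0, "FULL")
for db in (small, large, medium):
    sev = space_severity(db)
    print(db.name, sev.name, should_notify(sev))
# Tiny WARNING True
# Big ERROR True
# Medium INFORMATION False
```

Capping the percentage-based requirement is one way to avoid holding a 2 MB and a 20 GB database to the same criteria; other schemes (per-database thresholds, growth-rate tracking) would fit the same structure.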
