August 29, 2014 at 2:11 pm
I think logging to a central repository in a meaningful way, or having a central process that scans logs, is a good thing.
I have had very bad experiences with Microsoft MOM/SCOM: outages, and it simply missing a bunch of stuff (e.g. not reporting on blocking for over 15 minutes).
I have had very good experiences with Nagios and have heard good things about Splunk.
Still, there are two issues that no one has brought up yet.
One is: what is monitoring the monitoring systems?
The second is: how do we analytically tie together disparate errors in a meaningful way to indicate some causal connection? This is particularly a challenge in a large enterprise. How do we correlate things that are actually related, and how do we avoid spurious correlation?
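As a starting point for the correlation question, here is a minimal sketch in Python that clusters events from different sources when they occur close together in time. The event tuples, source names, and the 5-minute window are all assumptions made up for illustration, not anything from a real monitoring product.

```python
from datetime import datetime, timedelta

def group_by_window(events, window=timedelta(minutes=5)):
    """Cluster (timestamp, source, message) events that occur within
    `window` of the previous event, keeping only multi-source clusters."""
    groups = []
    for event in sorted(events, key=lambda e: e[0]):
        # Extend the current cluster if this event is close to its last event
        if groups and event[0] - groups[-1][-1][0] <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    # A cluster from a single source says nothing about cross-system links
    return [g for g in groups if len({src for _, src, _ in g}) > 1]

# Hypothetical sample events, for illustration only
events = [
    (datetime(2014, 8, 29, 14, 0), "db", "blocking detected"),
    (datetime(2014, 8, 29, 14, 2), "app", "request timeout"),
    (datetime(2014, 8, 29, 15, 0), "web", "disk warning"),
]
clusters = group_by_window(events)
```

Closeness in time only produces candidates. It does nothing to rule out spurious correlation, so each cluster would still need to be checked against known dependencies between the systems.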
August 29, 2014 at 6:05 pm
But if you are going to log to the Windows Event Viewer, please make your own log and don't just use the Application log.
We had an app service that had deprecated SQL Authentication in favor of Windows Authentication. They hadn't completely removed all the SQL Authentication code, so we had 10K+ errors of "user xxxx can't log in" in the app log. It made the app log practically useless for anything else. :crying:
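Separately from where the entries are written, an application can also limit how often it repeats the same error. The following is a minimal sketch using a filter from Python's standard `logging` module; the class name `RepeatLimiter` and the limit of 3 are assumptions for illustration, and the counter dictionary grows without bound, so it is not production-ready.

```python
import logging

class RepeatLimiter(logging.Filter):
    """Let each distinct (level, message) pair through at most max_repeats times."""

    def __init__(self, max_repeats=3):
        super().__init__()
        self.max_repeats = max_repeats
        self.counts = {}

    def filter(self, record):
        # Key on the fully formatted message so identical errors collapse
        key = (record.levelno, record.getMessage())
        count = self.counts.get(key, 0) + 1
        self.counts[key] = count
        # Returning False drops the record
        return count <= self.max_repeats

# Attach to a logger so repeated records are dropped before any handler sees them
logger = logging.getLogger("app.auth")  # hypothetical logger name
logger.addFilter(RepeatLimiter(max_repeats=3))
```

With this attached, the 10K+ identical login failures would show up a few times instead of drowning out everything else in the log.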
----------------
Jim P.
A little bit of this and a little byte of that can cause bloatware.
August 29, 2014 at 7:36 pm
After years of working in software development and being the mouthpiece for the customers using the software, I have learned to log everything. I can't count the number of times something has broken and it has become a pain for any team, operations or not, to dig in and find the issue.
I put hooks and logs in everything relevant. I also ensure they are clear and understandable even if documentation is lost. I want someone to be able to read them even if it's their first day on the job.
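To illustrate what a log line that stands on its own might look like, here is a minimal sketch with Python's standard `logging` module. The logger name `orders.export`, the function, and the message wording are all hypothetical; the idea is only that each message says what happened, to which item, and what to check next.

```python
import logging

log = logging.getLogger("orders.export")  # hypothetical component name

def export_order(order_id, rows):
    """Pretend export step that logs its start, outcome, and any failure reason."""
    log.info("Export started: order_id=%s rows=%d", order_id, len(rows))
    if not rows:
        # State what failed, for which order, and what to look at next
        log.error(
            "Export aborted: order_id=%s has no rows; "
            "check that the upstream load job finished before retrying",
            order_id,
        )
        return False
    log.info("Export finished: order_id=%s rows=%d", order_id, len(rows))
    return True
```

A message like this can be acted on without the documentation, which is the point of writing it for someone on their first day.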