April 14, 2008 at 5:59 pm
This is definitely a good starting point for what should be monitored. The majority of this list can be automated relatively easily, which generally makes more sense than requiring an actual signature.
---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
April 14, 2008 at 11:50 pm
Nice article. A good list to carry around. 🙂
April 15, 2008 at 2:52 am
April 15, 2008 at 3:14 am
Thanks for a good article. In my previous company, I used to handle about 300 db servers with data totaling 13+ TB. I was assigned to build the production dba team. While the team was being built the projects kept growing, so everything was difficult initially.
We used a 3 tier monitoring system, with a third-party tool monitoring the application response, round trip time, uptime and numerous other things. Any drop in response time triggered emails and pages to us. The next tier was a tool that checked the server side: cpu, disk space, port checks etc. The next level was our custom-written WMI and SQL-DMO scripts. DMO really helped in doing almost all types of audits. I would recommend using SMO (DMO being deprecated in 2005) and writing your own custom scripts to check virtually everything.
April 15, 2008 at 7:30 am
Dear Richard,
Thanks, I will try the remedy you gave.
I shall contact you in case of any query.
Regards,
Ritesh Mehta
September 5, 2008 at 12:00 am
Hi, sorry I rated this with one star by mistake. I meant to give it 5 as it is a very useful article, but I clicked on the first star to drag to five, which isn't how it works....doh
September 5, 2008 at 3:47 am
rubes (4/14/2008)
Nice article. I would just like to point out that for those of us that have numerous servers, automation of the checklist is critical. If you're dealing with only one server, manually checking these things does not take a lot of time. But imagine checking job failures or drive space on 50 sql servers. We get paid too much to perform these menial tasks by hand. There are many 3rd party tools out there that do this for us. It's also pretty easy to write your own scripts and sql jobs... many starter scripts could probably be found on this forum.
One benefit of automating your checklist is time. The other benefit is proactive in nature. If a drive is out of space because tempdb exploded in size over night, it's better to get notified via email at 4 am. Sure, the cell phone disturbs your precious sleep, but you now have 4 hours to fix the situation before business opens at 8 am and people start screaming.
Also, if there are numerous DBAs on your team, automating these checks helps greatly with standardization.
A good article and I totally agree with Rubes regarding the number of servers in your environment. My daily check list is very similar to the one in the article but this has become a weekly check list due to the time it takes & the number of sql servers. I now have time allocated to automating the process and will be hunting SQLServerCentral for automation topics. Automation is crucial in larger environments.
September 5, 2008 at 4:10 am
A great article. I have a handful of servers that are monitored daily and in addition to being proactive to failed jobs, we also monitor for those instances when jobs haven't run. (When another db admin has disabled a job in error or sql service stopped and the lan group didn't act/notify the dba..hmmmm)
I am lucky enough that we only have about 10 servers to monitor but on 2 servers there are several internal (non-maintenance) daily/weekly/monthly jobs that also need to be monitored and are highly critical for the production world.
I have this process down to approx. 10min/day max to check jobs (which includes the "paper trail" we need to keep our auditors satisfied).. This is as simple as knowing how many jobs should run and running a sql stmt which tells me the # of rows (representing jobs/steps) and their outcome.
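As a rough illustration of that kind of check (not the poster's actual statement), here is a minimal Python sketch. It assumes you already have the job-history rows as (job name, outcome) pairs from whatever query you run; the function name, the job names and the "Succeeded" outcome text are all hypothetical.

```python
from collections import Counter


def summarize_job_runs(expected_jobs, rows):
    """Compare the jobs that should have run against (job_name, outcome) rows.

    expected_jobs: names of jobs that should appear in today's history.
    rows: (job_name, outcome) pairs; "Succeeded" is an assumed outcome label.
    """
    outcomes = Counter(outcome for _, outcome in rows)
    ran = {name for name, _ in rows}
    # Jobs with no history row at all (disabled, agent stopped, and so on).
    missing = sorted(set(expected_jobs) - ran)
    # Jobs that ran but reported anything other than success.
    failed = sorted({name for name, outcome in rows if outcome != "Succeeded"})
    return {"outcomes": dict(outcomes), "missing": missing, "failed": failed}


if __name__ == "__main__":
    expected = ["backup_full", "index_rebuild", "load_orders"]  # hypothetical
    rows = [("backup_full", "Succeeded"), ("load_orders", "Failed")]
    print(summarize_job_runs(expected, rows))
```

The "missing" list is what catches a job that never ran at all, which a plain failure alert would not report.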
Granted, this is quite a menial process of checking but when an issue arises, it generally requires more knowledge/skills to resolve than the norm.. plus I prefer early mornings for this as it also allows that quiet time to actually multi-task and get some productive work accomplished before the real world awakens..
September 5, 2008 at 7:35 am
On the Oracle side of life, there is Oracle Enterprise Manager (OEM), which can detect and act (e-mail, page) upon found problems. I'll stay tuned to see what SQL server has (since our group now supports both).
A good general tool is BigBrother, which is/was free open source software and is also available in commercial form from Quest Software (who also make Toad and other database tools). BB is a great tool that our group has set up to monitor all sorts of things. We have a small group that manages many systems worldwide in a manufacturing setting (as in 24X7). The screaming in our world doesn't wait for 8AM. All of our critical BB messages go to a custom e-mail account which is synced to the group on-call phone (bat phone 😉) that we pass around (weekly schedule). We also use Sharepoint to host all our system log files where we keep notes of every change made. And to cap it all off, we have a running forum list of every incident worth mentioning. We probably monitor about 20 different systems, each with their own custom scripts of applications and stuff they're monitoring.
It is a bit of a pain to set up and make all this work well, but the payoff - we don't have a morning check list at all, all our notes, dates, times are online for anyone in the group to review via the web. Paper? Binder? That's so last century 😉
September 5, 2008 at 8:41 am
I agree with almost everything said in this thread. I just want to add, as an old school DBA (20 years Oracle and SQL Server from the early Sybase days) that the one thing I would add is that you should run your check list at least 3 times a day. If you are good at it, it should not take more than 10 minutes (in a smaller environment). In a big environment you will HAVE to use tools and scripts.
I am currently in a very big environment (over 800 Production databases and over 3000 Test and Dev databases - 2/3 Oracle, 1/3 SQL Server) and you cannot manage this without a mixture of tools and scripts. I cannot agree more with the person that mentioned the 3 tiered monitoring environment. For those of you that do not know how to handle scripting in Windows, use Perl.
In our current environment, I have a mailbox set up for Oracle and one for SQL Server. It gets 1000s of emails per day from these scripts and tools. These emails are in 3 basic buckets i.e. Informational (Size of objects/disks, Status of databases/objects i.e. SS Recovery Model), Warning (Disk has 10% free space) and Error (Cannot connect to DB). Errors automatically log a ticket. These scripts run on specific timed intervals i.e. Information is once a day or once a week. The others will fire every 30 minutes if the first one was not successful etc, etc, etc.
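The three buckets described above could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the Check record, the classify and route names, and the open_ticket callback are hypothetical, and the 10% threshold is taken from the Warning example in the post.

```python
from dataclasses import dataclass
from typing import Optional

INFO, WARNING, ERROR = "Informational", "Warning", "Error"


@dataclass
class Check:
    """Result of one monitoring probe (hypothetical shape)."""
    server: str
    reachable: bool
    disk_free_pct: Optional[float] = None


def classify(check, warn_free_pct=10.0):
    """Map a probe result onto one of the three buckets."""
    if not check.reachable:
        return ERROR  # e.g. "Cannot connect to DB"
    if check.disk_free_pct is not None and check.disk_free_pct <= warn_free_pct:
        return WARNING  # e.g. "Disk has 10% free space"
    return INFO


def route(checks, open_ticket):
    """Bucket every check and call open_ticket(server) for each Error."""
    buckets = {INFO: [], WARNING: [], ERROR: []}
    for check in checks:
        level = classify(check)
        buckets[level].append(check.server)
        if level == ERROR:
            open_ticket(check.server)
    return buckets
```

In practice open_ticket would call whatever ticketing system you use; passing it in as a parameter keeps the bucketing logic easy to test on its own.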
So, to stop the rambling - Run through your check list at least 3 times a day - OR - script your monitoring to continuously monitor and alert you.
BTW - I fully endorse the new Quest Spotlight for SQL Server Enterprise. It even monitors your OS and keeps a history that you can play back later. My only con for it is that it does not use SNMP so I cannot plug it into the monitoring software the rest of the company uses.
September 5, 2008 at 8:48 am
dawidjordaan - Only 3?
We have two DBAs, a backup DBA and two others who have rights to check as needed. In one of my roles, Data Security, I have access to all of the development environments and read access in all of production. I am also checking on metadata and the status of some of the data in certain databases, beyond disk space etc.
It is critical that we use a checklist. I need to update mine, thanks for the excellent article and the reminder.
Miles...
Not all gray hairs are Dinosaurs!
September 5, 2008 at 10:54 am
Hi,
If you are REALLY a DBA you automate all this stuff. Use alerts to tell you when your backups fail or when a server hits a system fault. I send mine to the mainframe.
Use free tools that you can automate with batch or command files to get info on the disk space of the servers. I use srvinfo.exe to scan the entire network for SQL Servers and get the free space on each. I have this info in my inbox every morning.
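For anyone without srvinfo.exe to hand, a free-space report can be sketched in Python as below. This is not a replacement for the network scan described above: it only covers local or mounted paths passed in by the caller, and the function name and report format are my own assumptions.

```python
import shutil


def free_space_report(paths, usage=shutil.disk_usage):
    """Return one line per path with free space in GiB and as a percentage.

    usage is injectable so the formatting can be tested without real disks.
    """
    lines = []
    for path in paths:
        u = usage(path)  # has .total, .used and .free, all in bytes
        pct = 100.0 * u.free / u.total if u.total else 0.0
        lines.append(f"{path}: {u.free / 2**30:.1f} GiB free ({pct:.0f}%)")
    return "\n".join(lines)


if __name__ == "__main__":
    # "." is a placeholder; list the drives or mount points you care about.
    print(free_space_report(["."]))
```

Scheduled from a batch file or a job, the output can be mailed each morning in the same spirit as the srvinfo.exe report.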
That gives me plenty of time to catch up on the news while drinking my first cup of tea.
September 6, 2008 at 1:03 pm
All, I understand that checking backups, space and errors can be tedious if you have to do that for a lot of servers. I designed simple tools using DTS packages in SQL 2000 to monitor 3 things out of this checklist. Below are the articles I wrote about how you can set up these tools in your environment.
I am in the process of rewriting them using SQL 2005 SSIS and will post when it's ready.
Backup SLA Report
http://www.sql-server-performance.com/articles/dba/monitor_with_dts_p1.aspx
Errorlog Monitoring of SQL Servers
http://www.sql-server-performance.com/articles/dba/monitor_with_dts2_p1.aspx
Space Monitoring of SQL Servers
http://www.sql-server-performance.com/articles/dba/monitor_with_dts3_p1.aspx
September 8, 2008 at 6:39 am
Thank you for taking time to read my article. I appreciate the insight of each poster. I posted my initial comments on 4/14/2008. However, let me reiterate that I wrote this article to challenge those who do not yet have a checklist to develop one. To the shops that do have a checklist, I desired to offer my ideas to improve their shops even more. I also appreciate your comments, so that I may improve my process.
Thanks,
Bill Richards, MCSE, MCDBA
Senior Database Analyst
September 8, 2008 at 3:44 pm
Automate this stuff.... why manually look... but do not have your database server watch itself. That is like an internal investigation. Not many checks and balances there.
You need an external computer that your helpdesk or console operators know is up and running.
It should be able to watch the disk space, jobs, performance, etc... and most importantly contact you via email, page, etc...
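A very small version of such an external watcher, run from a different machine than the database server, might look like this Python sketch. It only checks that a TCP port accepts connections; the function name is hypothetical, and the host and port in the usage note are placeholders.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, is_reachable("dbserver01", 1433) would probe a hypothetical server on the default SQL Server port. A successful connect only shows the listener is up, so disk space, jobs and performance still need their own checks.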
I learned a long long long time ago, that a computer does a whole lot better job of watching your systems than you do manually.
Some examples of monitors that I have used: Argent, kshost (simple, but it worked), and my own developed in house. There are a lot of applications out there.
SQL Silvey