Monitoring Uptime/Availability

  • Hello,

    I'm looking for some advice on setting up monitoring solution to track up-time of a medium sized SQL estate (60-70 SQL Servers, various editions & HW Specs). Not looking to track performance, simply up-time/availability. Mostly stand alone instances, some clusters and some AGs. Basically this is to satisfy a monthly reporting requirement.

    Thanks in advance!

  • If that's all you want, set up an Agent job that polls the servers to see if they're online or not. It won't be much work. You can store the output in a table and off you go.

    Most people need monitoring and alerting for everything. Performance, sure, but more importantly, errors, state, drive space, backups, deadlocks, blocking, etc. When they finally realize that all that is in fact needed, I'd recommend looking at Redgate SQL Monitor.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • To add to Grant's post, Idera and Quest both have good monitoring tools too.

    But if all you care about is uptime, how frequently are you needing to check that for a monthly reporting requirement?  Do you need to know the second the instance goes down or the minute or the hour or the day or the week and so on...  Different tools and methods will offer different levels of granularity.  A SQL Agent job that fires every second to check if an instance is online is going to get messy quickly (table will grow very fast and in the event the 60+ instances take too long to reply, you may have the job firing more than once at a time).

    If you are looking for free tools, cacti has modules that allow you to monitor SQL Server, not specifically uptime, but if it stops monitoring one of the metrics (or something that must be non-zero comes back with zero such as Memory), then your instance is offline. https://docs.cacti.net/usertemplate:host:microsoft:sqlserver

    Cacti allows you to monitor a LOT of things and is highly customizable and "free".  I put that in quotes because if you DO run into a snag with it (I have where it suddenly told me I had 2 TB of memory in a server with 128 GB of memory so all alerts configured for memory being over 95% never got triggered), you are relying on community support.  Idera, Quest, and RedGate all have official support and they are really good.  I have worked briefly with all 3 tools (demo versions of 2 and paid version of 1) and working with their support teams has been a breeze for all 3 companies.

     

    My opinion is that you should be looking at getting a monitoring tool.  RedGate has built in graphs in their reports section for telling you uptime on your instances and it is all baked in.  No customizing or tweaking required.  Offhand, I do not remember if Idera or Quest offer that and Cacti does not.  But it depends on your budget too.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Simplest, make a DBA database, on startup, have a proc that writes the time to a table. use a job that writes to a second column of the latest entry in the table, the current time every hour. When you stop getting the updates to col2, the instance is down. Or agent is down, but you're not looking for exactness here. Set a restart and alert on the Agent service.

    A query can give you the uptime for each month by lagging 1 between the rows for downtime.

    As above, it's not worth building much more than this for anything beyond uptime. Once you try to do anything with performance, tracking versions, etc., buy a tool. I work for Redgate, and we have a good one, but buy anything. Watching people do MRTG or some custom solution is usually a time sink not worth the ROI.

  • Thanks Grant, there's already some homegrown monitoring in place for basic metrics like agent job failures and disk space etc. Unfortunately, and as usual, there is no budget for purchasing SW.

    Love the SQL Server Execution Plans book(s) too, really helped me up a few jobs back when I actually had performance issues to work on, unfortunately there are none interesting problems in this environment. Might be time to look for a new role.

  • Thanks Steve, simple and effective. Makes me wonder why I didn't think of it first. I was originally thinking of going with DBATools polling services and logging to tables etc. Yours is a much better idea. Love the site too, it's helped me a lot over the years. Long live SSC!

  • Good luck on it.

    One point, document the time you spend on building & maintaining monitoring. Time absolutely is money. It's time you could be spending elsewhere on much more important things.

    Happy the book was helpful. Hopefully you're using the 3rd edition. It fixes a lot of problems with earlier editions and adds tons of material.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply