All Things Monitoring

  • Hi,

    I'm looking to put together a custom SQL Server monitoring solution using a combination of SQL Server scripts which would run as part of SQL Server scheduled jobs and Performance Monitor counters which would store the perfmon data directly in SQL Server.

    A "Master" server (Windows 2008 R2) hosting 1 SQL Server 2008 R2 instance would be used to pull and store all the information from the servers in the Production domain.

    The items listed under the headings "MEMORY", "CPU", "DISK" and "MISC" will be collected via Performance Monitor. The items listed under the heading "ADDITIONAL OS/SQL MONITORING" would be developed using custom SQL scripts.

    My questions are:

    1. Are there any additional items you think I should be collecting?

    2. Logging the perform data directly in SQL Server requires a "System DSN", therefore the "Master" server would have 1 "System DSN" entry for each server it was collecting information from. To begin with there are 50 servers to monitor, could so many "System DSN" entries cause issues?

    3. If you don't agree with the use of Performance Monitor/"System DSN", is there a better way?

    We want the monitoring to be as light weight as possible so the counters are limited. Here is the information which will be collected...

    ######

    MEMORY

    ######

    Available Mbytes

    Pages/sec

    Paging File - % Usage

    SQLServer:Memory Manager - Memory Grants Pending

    SQLServer:Memory Manager: Total Server Memory (KB)

    SQLServer:Memory Manager: Target Server Memory (KB)

    Server Buffer Manager Object: Cache Size (pages)

    SQL Server Buffer Mgr: Page Life Expectancy

    SQL Server Buffer Mgr: Lazy Writes/Sec

    SQL Server Buffer Mgr: Checkpoint Pages/Sec

    ########

    DISK

    ########

    Avg.Disk sec/Read

    Avg.Disk sec/Write

    Disk Reads/sec

    Disk Writes/sec

    Avg.Disk Queue Length

    ###

    CPU

    ###

    Processor - % Processor Time

    System - Processor Queue Length

    ####

    MISC

    ####

    SQLServer:Batch Statistics - Compilations/sec and Recompilations/sec

    SQLServer:General Statistics - User Connections

    Page Splits (Access Methods object: Page Splits/sec)

    SQLServer: SQL Statistics: Batch Requests/Sec

    SQLServer: Databases: Log Flushes/sec counter

    SQL Batch/sec

    System Context Switches/sec

    SQL Server Locks Object: Number of Deadlocks/sec

    SQL Server Locks Object: Average Wait Time (ms)

    SQL Server Access Methods Object: Full Scans/sec

    SQL Server Backup Device Object: Device Throughput Bytes/sec

    Latch Waits/sec

    ############################################################

    ADDITIONAL OS/SQL MONITORING

    ############################################################

    Database backup information (Success/Failed)

    Block processes

    Lock processes

    Cluster information (where required)

    Host server configuration information

    SQL Server instance configuration information

    Database configuration information

    Table information

    Index information

    Host drive information

    Host drive fragmentation

    SQL Server security information

    Scheduled job information

    Success/Failed login information

    Warning/Critial Windows Event Viewer information raised in the last 1/24hrs

    SQL Server Mirroring monitoring

    Purchasing a 3rd party product is not an option.

    Thanks in advance.

    www.sqlAssociates.co.uk

  • It looks like you've got the basics well covered. One thing I would add for sure is the ability to capture wait states on the servers. You're going to want to know when and how things are slowing down.

    Another thing to consider is capturing trace data so that you can see how individual queries are performing and so you can aggregate performance over time. That's a lot tougher than capturing PerfMon & DMV stats. You'll have to plan for the amount of data you'll be collecting and plan for a mechanism for aggregating that data AND for what you want to do with it long term. It's a lot of work, but it also has a lot of payoff.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hi Grant,

    Thanks for your comments, much appreciated.

    With regards to looking at what code the users are executing and SQL Server query plans I was planning on using something like:

    select qs.execution_count

    , qs.total_elapsed_time, qs.last_elapsed_time

    , qs.min_elapsed_time, qs.max_elapsed_time

    , substring(st.text, (qs.statement_start_offset/2)+1

    , ((case qs.statement_end_offset

    when -1 then datalength(st.text)

    else qs.statement_end_offset

    end - qs.statement_start_offset)/2) + 1) as statement_text

    , qp.query_plan

    from sys.dm_exec_query_stats as qs

    cross apply sys.dm_exec_sql_text(qs.sql_handle) as st

    cross apply sys.dm_exec_query_plan (qs.plan_handle) as qp

    Given the complexities you mentioned around the trace flag configurations, do you think this is a valid alternative?

    Also, do you have any thought/experiences around the use of a "Master" server with many "System DSN" connections to the servers it monitors? I've ran several tests and cannot see any issues from a network or host perspective.

    Many thanks,

    Chris

    www.sqlAssociates.co.uk

  • Using the DMVs in that manner is fine. You just need to know, and be aware, that they only reflect what is in cache when you run them. If stuff has left the cache, it won't be in the query.

    Most monitoring processes work from a central server that connects to multiple other servers, so it must be a good model. I've had a lot of experience using the Central Management Server in SQL Server to manage Agent jobs and Policy Based Management. It works well. I don' see why it wouldn't work with monitoring too.

    BTW, have you looked at Microsofts Data Collector and Management Data Warehouse? They have some limitations (2008 and above being one of the biggest), but it might get you where you need to go faster.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks again for your help, it's greatly appreciated. I'm just having a play around with the "Data Collection" tool now and from what I have seen I will definitely be including it in my documentation as an option to consider.

    www.sqlAssociates.co.uk

  • I've recently been spending a lot of time studying and scripting repsonses to Brent Ozars SQL Blitz! script.

    It identifies a great set of things to look for, and has been a lot of fun for me to create my own responses to any items identified in his script.

    This is the way i interpret his script 's testing points:

    --http://www.BrentOzar.com/go/blitz

    1.Identifies any database with no backups at all.

    2.Identifies any backups that are on the same disk, maybe should be copied to another server.

    3.Identifies any databases that are in FULL recovery, but no Transaction Log backups.

    4.Identifies the oldest MSDB entry; if it's old, the MSDB backup history might need to be cleaned up.

    5.Identifies the Last Time DBCC CHECKDB (EachDatabaseName) was executed(if ever).

    6.Identifies any and all jobs on the server and their statuses.

    7.Identifies any other sysadmins so you can decide if they should have their privileges or not.

    8.Identifies the servers advanced options for reference.

    9.identifies any procedures which are set to start when the server starts

    A.Identifies if SQL 2008+ auditing_status info.

    B.Identifies various endpoints, resource governor settings,server triggers and linked servers.

    C.Tests id database mail is setup (assuming a default profile exists)

    D.sets up a default set of notifications for problems.

    E.Identifies any user tables that should exist like spt_values,

    F.Identifies all databases for reference.

    G.Identifies any encrypted databases for reference, so you can better plan disaster recovery.

    H.Identifies any databases that are using features that are Enterprise Edition only

    I.Identifies location of Data Files, so you can decide if they need to be moved off of the OS's C:\ drive or not.

    J. identifies database triggers for troubleshooting purposes.

    K. Identifies a baseline for wait time performance.

    L. Identifies duplicate indexes

    M. Identifies Index fragmentation.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks Lowell, I found the list extremely useful.

    www.sqlAssociates.co.uk

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply