All Things Monitoring

Question

All Things Monitoring

Chris Kitchen

Hall of Fame

Points: 3090
More actions
March 14, 2011 at 4:41 am

#233837

Hi,
I'm looking to put together a custom SQL Server monitoring solution using a combination of SQL Server scripts which would run as part of SQL Server scheduled jobs and Performance Monitor counters which would store the perfmon data directly in SQL Server.
A "Master" server (Windows 2008 R2) hosting 1 SQL Server 2008 R2 instance would be used to pull and store all the information from the servers in the Production domain.
The items listed under the headings "MEMORY", "CPU", "DISK" and "MISC" will be collected via Performance Monitor. The items listed under the heading "ADDITIONAL OS/SQL MONITORING" would be developed using custom SQL scripts.
My questions are:
1. Are there any additional items you think I should be collecting?
2. Logging the perform data directly in SQL Server requires a "System DSN", therefore the "Master" server would have 1 "System DSN" entry for each server it was collecting information from. To begin with there are 50 servers to monitor, could so many "System DSN" entries cause issues?
3. If you don't agree with the use of Performance Monitor/"System DSN", is there a better way?
We want the monitoring to be as light weight as possible so the counters are limited. Here is the information which will be collected...
######
MEMORY
######
Available Mbytes
Pages/sec
Paging File - % Usage
SQLServer:Memory Manager - Memory Grants Pending
SQLServer:Memory Manager: Total Server Memory (KB)
SQLServer:Memory Manager: Target Server Memory (KB)
Server Buffer Manager Object: Cache Size (pages)
SQL Server Buffer Mgr: Page Life Expectancy
SQL Server Buffer Mgr: Lazy Writes/Sec
SQL Server Buffer Mgr: Checkpoint Pages/Sec
########
DISK
########
Avg.Disk sec/Read
Avg.Disk sec/Write
Disk Reads/sec
Disk Writes/sec
Avg.Disk Queue Length
###
CPU
###
Processor - % Processor Time
System - Processor Queue Length
####
MISC
####
SQLServer:Batch Statistics - Compilations/sec and Recompilations/sec
SQLServer:General Statistics - User Connections
Page Splits (Access Methods object: Page Splits/sec)
SQLServer: SQL Statistics: Batch Requests/Sec
SQLServer: Databases: Log Flushes/sec counter
SQL Batch/sec
System Context Switches/sec
SQL Server Locks Object: Number of Deadlocks/sec
SQL Server Locks Object: Average Wait Time (ms)
SQL Server Access Methods Object: Full Scans/sec
SQL Server Backup Device Object: Device Throughput Bytes/sec
Latch Waits/sec
############################################################
ADDITIONAL OS/SQL MONITORING
############################################################
Database backup information (Success/Failed)
Block processes
Lock processes
Cluster information (where required)
Host server configuration information
SQL Server instance configuration information
Database configuration information
Table information
Index information
Host drive information
Host drive fragmentation
SQL Server security information
Scheduled job information
Success/Failed login information
Warning/Critial Windows Event Viewer information raised in the last 1/24hrs
SQL Server Mirroring monitoring
Purchasing a 3rd party product is not an option.
Thanks in advance.
www.sqlAssociates.co.uk

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 1

It looks like you've got the basics well covered. One thing I would add for sure is the ability to capture wait states on the servers. You're going to want to know when and how things are slowing down.

Another thing to consider is capturing trace data so that you can see how individual queries are performing and so you can aggregate performance over time. That's a lot tougher than capturing PerfMon & DMV stats. You'll have to plan for the amount of data you'll be collecting and plan for a mechanism for aggregating that data AND for what you want to do with it long term. It's a lot of work, but it also has a lot of payoff.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Chris Kitchen Hall of Fame Points: 3090 More actions · Answer 2

Hi Grant,

Thanks for your comments, much appreciated.

With regards to looking at what code the users are executing and SQL Server query plans I was planning on using something like:

select qs.execution_count

, qs.total_elapsed_time, qs.last_elapsed_time

, qs.min_elapsed_time, qs.max_elapsed_time

, substring(st.text, (qs.statement_start_offset/2)+1

, ((case qs.statement_end_offset

when -1 then datalength(st.text)

else qs.statement_end_offset

end - qs.statement_start_offset)/2) + 1) as statement_text

, qp.query_plan

from sys.dm_exec_query_stats as qs

cross apply sys.dm_exec_sql_text(qs.sql_handle) as st

cross apply sys.dm_exec_query_plan (qs.plan_handle) as qp

Given the complexities you mentioned around the trace flag configurations, do you think this is a valid alternative?

Also, do you have any thought/experiences around the use of a "Master" server with many "System DSN" connections to the servers it monitors? I've ran several tests and cannot see any issues from a network or host perspective.

Many thanks,

Chris

www.sqlAssociates.co.uk

Grant Fritchey SSC Guru Points: 398909 More actions · Answer 3

Using the DMVs in that manner is fine. You just need to know, and be aware, that they only reflect what is in cache when you run them. If stuff has left the cache, it won't be in the query.

Most monitoring processes work from a central server that connects to multiple other servers, so it must be a good model. I've had a lot of experience using the Central Management Server in SQL Server to manage Agent jobs and Policy Based Management. It works well. I don' see why it wouldn't work with monitoring too.

BTW, have you looked at Microsofts Data Collector and Management Data Warehouse? They have some limitations (2008 and above being one of the biggest), but it might get you where you need to go faster.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Chris Kitchen Hall of Fame Points: 3090 More actions · Answer 4

Thanks again for your help, it's greatly appreciated. I'm just having a play around with the "Data Collection" tool now and from what I have seen I will definitely be including it in my documentation as an option to consider.

www.sqlAssociates.co.uk

Lowell SSC Guru Points: 323518 More actions · Answer 5

I've recently been spending a lot of time studying and scripting repsonses to Brent Ozars SQL Blitz! script.

It identifies a great set of things to look for, and has been a lot of fun for me to create my own responses to any items identified in his script.

This is the way i interpret his script 's testing points:

--http://www.BrentOzar.com/go/blitz
1.Identifies any database with no backups at all.
2.Identifies any backups that are on the same disk, maybe should be copied to another server.
3.Identifies any databases that are in FULL recovery, but no Transaction Log backups.
4.Identifies the oldest MSDB entry; if it's old, the MSDB backup history might need to be cleaned up.
5.Identifies the Last Time DBCC CHECKDB (EachDatabaseName) was executed(if ever).
6.Identifies any and all jobs on the server and their statuses.
7.Identifies any other sysadmins so you can decide if they should have their privileges or not.
8.Identifies the servers advanced options for reference.
9.identifies any procedures which are set to start when the server starts
A.Identifies if SQL 2008+ auditing_status info.
B.Identifies various endpoints, resource governor settings,server triggers and linked servers.
C.Tests id database mail is setup (assuming a default profile exists)
D.sets up a default set of notifications for problems.
E.Identifies any user tables that should exist like spt_values,
F.Identifies all databases for reference.
G.Identifies any encrypted databases for reference, so you can better plan disaster recovery.
H.Identifies any databases that are using features that are Enterprise Edition only
I.Identifies location of Data Files, so you can decide if they need to be moved off of the OS's C:\ drive or not.
J. identifies database triggers for troubleshooting purposes.
K. Identifies a baseline for wait time performance.
L. Identifies duplicate indexes
M. Identifies Index fragmentation.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Chris Kitchen Hall of Fame Points: 3090 More actions · Answer 6

Thanks Lowell, I found the list extremely useful.

www.sqlAssociates.co.uk