February 8, 2010 at 2:40 pm
Hello all,
I ran into an issue today where my SQL service was still "running", but was unresponsive...I ended up recycling the box, and whatever was causing the hang up, was resolved...our preliminary research is leading towards failing hardware...
anyway, this lead to an idea that we should proactively check our databases for responsiveness...
I would like to create a job that will alert us if a database becomes unresponsive, then I can be alerted prior to finding out after a user asked us what the problem was....several hours after the initial problem occured.
The code below is my preliminary thought....
I am wondering of any of you have a better solution...or can confirm that DBCC would accurately test a databases responsiveness....not just query system tables for data.
Also, I'm not sure the best way to "try again" if the initial attempt to connect is unsuccessfull. Aside from initiating a counter and entering a loop to test connectivity until it's successful, or reaches n attempts.
it's hard for me to test now....since everything is responding at this moment...
Any ideas/feedback?
Thank you all!
/*
Need to Create a job that will connect to all the databases on the server.
If unable to connect, try again.
If still unable to connect, raise an error.
*/
EXEC master..sp_MSForeachdb
'
USE [?]
SET NOCOUNT ON
DECLARE @msg VARCHAR(100)
DECLARE @i INT
DECLARE @n INT
SET @i = 1 -- count of current connection attempts
SET @n = 2 -- number of attempts to connect before RAISERROR
/*** this excludes system dbs ***/
IF DB_ID(''?'') > 4
BEGIN
WHILE @i < @n
BEGIN
/*** this is only used as a way to connect to the db***/
/*** could be replaced with any other SQL that is deemed better.***/
DBCC CHECKDB(''?'') WITH NO_INFOMSGS, ESTIMATEONLY, PHYSICAL_ONLY
IF @@error <> 0
BEGIN
SET @msg = ''Unable to Connect :: '' + @@SERVERNAME + ''/'' + ''?''
RAISERROR(@msg, 10, 1)
END
ELSE
BREAK
SET @i = @i + 1
END
END
'
February 9, 2010 at 2:42 pm
It might be easier to just query sys.databases state and state_desc columns to see the current state of the database. If you're dead set on touching the db you could use db_name()
to return the name of the current database which you've set with the USE [?]
in your example.
A potential problem I see is that, at least in my experience, if SQL Server is unresponsive, it isn't reliable for self-testing or alerting. However, if a db goes offline for some reason and SQL Server is still otherwise functioning normally it should be fine.
February 9, 2010 at 3:33 pm
looks like a good idea to me.
I agree on the 'who polices the police' point made by SQLPirate.
This shouldn't run on the agent of each system because the job depends on system availability: it can't alert you of its own failure.
We used to have a fairly elaborate system of agent jobs pushing to a central DB and updating a table.
a second job would poll that table and make sure everything was up and working.
It worked surprisingly well!
It could also run in a hub and spoke manner with data pulls.
Two small things: I think checkdb fails on read only db's and I think it would be good to include your system databases.
you should follow up and let us know what you end up doing.
I'd be curious to see how it turns out.
Craig Outcalt
February 9, 2010 at 3:52 pm
Thank you for taking the time to respond! I really do appreciate it.
I like using the db_name() instead of dbcc...runs so much faster when I print the results..instead of selecting the results of dbcc.
Problem/question.
1. I need to trap an error if a db is not responding...will db_name() throw an error in this situation?
i'm not sure how to replicate this situation to test it....that's why i'm throwing it out to this community...just in case someone out there happens to know...
When I do find a solution,I do plan on deploying on a separate server...which will reach out to the other instances in our environment...and poll for any db's that may be hung.
An I will absolutely share anything I learn with the community! That is how it continues to thrive and maintain it's value!
February 9, 2010 at 3:59 pm
You know...
I was just thinking....that I am way overthinking this solution...
I am hung up on ensuring I have a way to touch each db, and trap an error if it is hung/not responding.
I believe my "USE [?]" that is exexuted in the sp_MSForeachdb, will effectively test a connection to each db that is online
What I am unable to test...is how to trap an error on a particular db connection, and continue to the next.
Here is the latest version of SQL...
/*
Need to Create a job that will connect to all the databases on the server.
If unable to connect, try again.
If still unable to connect, raise an error.
*/
EXEC master..sp_MSForeachdb
'
SET NOCOUNT ON
DECLARE @msg VARCHAR(100)
DECLARE @i INT
DECLARE @n INT
SET @i = 1 -- count of current connection attempts
SET @n = 2 -- number of attempts to connect before RAISERROR
/*** this excludes system dbs ***/
IF DB_ID(''?'') > 4
BEGIN
WHILE @i < @n
BEGIN
USE [?]
IF @@error = 0
BEGIN
BREAK
END
ELSE
BEGIN
SET @msg = ''Unable to Connect :: '' + @@SERVERNAME + ''/'' + ''?''
RAISERROR(@msg, 10, 1)
END
SET @i = @i + 1
END
END
'
February 9, 2010 at 4:01 pm
I believe my "USE [?]" that is exexuted in the sp_MSForeachdb, will effectively test a connection to each db that is online
I was just about to respond that the trapping would probably go with the "USE [?]". Not sure why I didn't think of that earlier :hehe:
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply