April 15, 2013 at 12:25 pm
The CEO of our company wants to know what all the different departments within IT (server operations, network operations, DBAs, etc.) do during a "production incident" to help diagnose the problem. As one of the DBAs, I've been asked to work on the database part of this request.
Realizing that this is a very broad subject: what tools do you use and what actions do you perform when "production incidents" occur in your environments? Some things I've identified, in no particular order, include:
Check disk space and database file space
Execute sp_who2 or a derivative thereof to see what other SQL processes / jobs may be running, causing blocking, etc.
Check Task Manager to see what other applications, services, etc. are running on the server, consumption of CPU and memory resources
Check SQL and Windows error logs
My plan is to develop / plagiarize a script or series of scripts that I can "pull the trigger on" at the beginning of an incident to gather all of the above information and anything else that I'm overlooking, giving me a one-stop place to evaluate the results and decide how to proceed.
The majority of the SQL Server instances in our environment are SQL Server 2008 R2, of various editions, with a few SQL Server 2005 instances hanging on, and we are just getting started with SQL Server 2012. Do any of you have suggestions on additional areas to focus on? Scripts that you use? Blogs you have read or written that I can refer to?
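A minimal sketch of what such a "pull the trigger" collector could look like, written in Python purely for illustration. It runs a set of named checks, records each result or error, and prints everything in one place. The names `run_checks`, `format_report`, and `disk_free_percent` are hypothetical, and only the disk-space check is actually implemented; the SQL-side checks (sp_who2, error logs) are assumed to be added as further functions that call out through whatever database driver is available.

```python
import shutil
from datetime import datetime, timezone
from typing import Callable, Dict


def disk_free_percent(path: str = ".") -> float:
    """Return the percentage of free space on the volume holding `path`."""
    usage = shutil.disk_usage(path)
    return round(100.0 * usage.free / usage.total, 1)


def run_checks(checks: Dict[str, Callable[[], object]]) -> Dict[str, dict]:
    """Run every check and record its result or its error.

    A failing check is captured rather than raised, so one broken
    probe cannot hide the output of the others during an incident.
    """
    report = {}
    for name, check in checks.items():
        started = datetime.now(timezone.utc).isoformat()
        try:
            report[name] = {"ok": True, "value": check(), "at": started}
        except Exception as exc:  # deliberately broad: keep collecting
            report[name] = {"ok": False, "error": repr(exc), "at": started}
    return report


def format_report(report: Dict[str, dict]) -> str:
    """Render the report as one line per check for quick reading."""
    lines = []
    for name, entry in report.items():
        status = "OK" if entry["ok"] else "FAIL"
        detail = entry["value"] if entry["ok"] else entry["error"]
        lines.append(f"{status:<4} {name}: {detail}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical check set: extend with SQL and log checks as needed.
    checks = {"disk_free_percent": disk_free_percent}
    print(format_report(run_checks(checks)))
```

Capturing errors per check is a deliberate choice here: during an incident, a probe that cannot connect is itself useful information and should not stop the rest of the snapshot from being gathered.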
Thanks in advance for any suggestions, advice, etc.
Mike
April 15, 2013 at 12:30 pm
In reply to Mike Stuart (4/15/2013):
This is like asking your CEO, "What do you do when your child misbehaves?" It is so open-ended that there is no answer. All that question does is generate more questions so you can understand what the actual question is. The answer to the above will be VASTLY different based on a number of factors (child's age, what the behavior was, etc.).
What defines a "production incident"? I would try to explain that the question is entirely too vague. That doesn't mean you can't have a baseline standard operating procedure in place.
This is usually something like:
1) Figure out the details of the issue.
2) Adjust the next X steps based on the answer to #1.
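The two steps above can be sketched as a simple lookup: classify the incident first, then pick the follow-up steps for that class. This is only an illustration in Python; the categories, the step lists for "network" and "power", and the names `PLAYBOOKS`, `FALLBACK`, and `next_steps` are all hypothetical and would need to be replaced with whatever your shop actually agrees on.

```python
from typing import Dict, List

# Hypothetical categories and steps, for illustration only.
PLAYBOOKS: Dict[str, List[str]] = {
    "database": [
        "Check disk space and database file space",
        "Run sp_who2 (or a derivative) to look for blocking",
        "Check SQL and Windows error logs",
    ],
    "network": [
        "Hand off to network operations",
        "Confirm the instance is reachable once connectivity returns",
    ],
    "power": [
        "Hand off to server operations",
        "Verify databases came back online after restart",
    ],
}

# Used when step 1 has not yet produced a recognizable category.
FALLBACK: List[str] = ["Gather more details about the issue before acting"]


def next_steps(category: str) -> List[str]:
    """Return the follow-up steps for an incident category (step 2)."""
    return PLAYBOOKS.get(category.strip().lower(), FALLBACK)


if __name__ == "__main__":
    for step in next_steps("database"):
        print("-", step)
```

Anything that does not match a known category falls back to "gather more details", which mirrors the point that the response cannot be decided until the incident itself is understood.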
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter: http://www.sqlservercentral.com/articles/Tally+Table/72993/
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 15, 2013 at 12:38 pm
Yep - you're preaching to the choir.
April 15, 2013 at 12:52 pm
Step 1: Determine the severity of the issue.
Step 2a: If severity is small, fix the issue.
Step 2b: If severity is major, lock the door, take my phone off the hook and order pizza.
I know I am preaching to the choir but this is asking for details to a question that has no definition. Your idea of running a script isn't going to do any good if the production incident is a result of the network going down. It is pointless if the issue is because of a power failure. You could write reams and reams of a response and never come close to a complete answer.
I think about all you can do is offer the details yourself and explain how you will respond to those incidents. Then explain that anything not mentioned in your discussion will have to be evaluated at the time of the emergency. I know that the CEO just wants to make sure that there is some planning in place for these types of things. Honestly, most CEOs can't understand what you tell them about this level of technical stuff. I also have never met one who can't see the absurdity of this type of question when it is discussed with them rationally. At some level there has to be some trust that the team can best determine the appropriate response. The team are the experts, and that is why they were hired, right?
Good luck with this one. It is so hard to come up with a plan for a never-ending question.
April 15, 2013 at 12:54 pm
I don't know how big your shop is, but the first action should be to appoint an incident manager if the incident is serious enough (you would need criteria by which that is judged).
Then it's down to getting decent information on the problem and using that to guide you as to where to look first for error information.
April 15, 2013 at 1:01 pm
ugh; sounds more like a "Create a Blame-Thrower Tracking Device" to me;
I'd try to turn this into a list of the proactive things you've put into place to avoid disasters instead, if it can be turned before it gets any steam rolling.
[insert your kill it with fire meme of your choice]
Lowell