A user calls to say the app or server is slow today. Here’s a quick summary of how I get started. It depends on monitoring where you can compare baselines to recent activity, and I include links to all of that code I use.
The initial question is too vague. You can’t blame this individual user because this is how 95% of issues make it to you. The initial follow-up will be the following questions I ask the user along with the corresponding question your boss is asking you:
- Can anyone else duplicate your issue?
- Is this the user’s machine or a server issue? Not a 100% conclusive answer, but it’s a start.
- Is everything running slow or just one process or screen?
- Is the whole server running slow or just one stored procedure?
- Were any changes made to the system recently? Depending on the user, this may be a question for someone else.
- Did we cause it by making a change or did this just come up?
However, this is enough for me to do a very quick overall health check while I wait for those answers. It seems like a lot at first glance, but it’s practically automated with the scripts I have on hand. This is only opening 5 pre-written scripts, running them, and peeking at the results.
Quick Health:
- Look at historical wait stats for the last two hours and the same time window both 1 and 7 days ago. Documented in the Wait Stats post.
- Look at memory and CPU contention for the same times as wait stats. You’re specifically looking for high CPU usage, PLE drops, and changes to target memory. Documented in the OS Performance Counters post.
- Look for blocking in the last hour, even though it would typically show up in wait stats. You can compare this to previous days if needed, Documented in the Blocking post
- Look at the large queries that completed in the last two hours along with the queries currently running to see if anything pops out. This is just a quick glance looking for glaring issues, and you can look for details after getting answers to the follow-up questions. Unfortunately, this information isn’t as cut-and-dry as the rest, and it’s going to rely on you being a little familiar with what normal is on this server. Completed queries are documented in the Tracing Creation and Reading posts, and currently running queries are documented in the Running Processes post.
- Look at the queries recompiled recently to see if a new execution plan is taking excessive resources. Documented in the Recently Recompiled Resource Hogs post.
So, where do you go from here? Yeah, you get the typical “it depends” answer from me. The information you just looked at will normally comes up with something that doesn’t look right and needs to be looked into further. However, there are so many directions this can go from here that a single document can’t cover it. The purpose of this post is simply to answer the question “Where do I start looking with the most common question a user throws at me”, nothing more. Also, I wanted a way to tie all of the random posts together so you could see them working together as the full solution.
Related articles
- Blocking – Capturing and Monitoring (simplesqlserver.com)
- Wait Stats – sys.dm_os_wait_stats – Monitoring and Using (simplesqlserver.com)
- Creation and Reading Traces (simplesqlserver.com)
- Recently Recompiled Resource Hogs (simplesqlserver.com)
- Running Processes (simplesqlserver.com)
Filed under: Key Posts, Monitoring, SQL Server, Troubleshooting Tagged: Blocking, Recompile, sys.dm_exec_query_stats, sys.dm_exec_requests, sys.dm_exec_sessions, sys.dm_os_performance_counters, sys.dm_os_wait_stats, Wait Stats