April 27, 2012 at 6:25 am
Currently we have a web front end running IIS, and a SQL 2005 back end (SP3), we also have a 3rd server which is a reporting server (SSRS). Periodically we will get lockups/slowdowns where our web front end will basically just be a white screen. When I hit all three servers, CPU and Memory are low, they all seem to not be doing much of anything. I'm also running a trial version of SQL Diagnostics Manager but don't see that there are deadlocks, etc. When I restart SQL however, the issue goes away. I also have noticed that the number of user connections to the database are high when we get our slowdowns, which I think is the start of my "paper trail". For example, at say 11:21am our user connections are arund 65 and the site is running fine. At 11:22am, the site is appearing frozen or at best very sluggish and there are 140 user connections. User connections at 11:44am go back down to 74, and the site is again very responsive.
Any suggestions what I could try, to try and pinpoint what is causing these?
April 27, 2012 at 6:47 am
Deadlocks aren't going to be the problem. It's blocking that you need to look for. The process that is blocking others is likely to be the issue.
Also, look at the wait stats, sys.dm_os_wait_stats, to determine what is causing things to slow down on the system.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 27, 2012 at 7:15 am
Grant, going back to our last slowdown from yesterday, there was no blocking at that time. I do see some from the day before, but it was about 15 minutes into my slowdown so I think whatever else is causing it, caused the blocking at that time.
April 27, 2012 at 7:45 am
Is is possible that the issue is not SQL Server? No blocking, low cpu, normal memory, low disk... It sure sounds like SQL Server might be OK (except for the restart).
I'd need more metrics to hazard a guess. It has to be something.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 27, 2012 at 7:50 am
I have suspected other things like network utilization, IIS, etc. If a SQL restart resolves it, it could just be because it disconnects all users that are connected to the "app" as a whole. Can I insert an image to the posts here? I do have some graphs showing the timeframe yesterday when we had the issue. One thing that is weird to me is that the open sessions (user connections) jumps up during this time but looking at this graph, 98% of them are "idle" and not active. Possibly their connections are not getting terminated properly at logoff? I'm throwing darts here...
April 27, 2012 at 7:53 am
Image....
April 27, 2012 at 8:10 am
Interesting your not seeing anything in Diagnostic Manager..
I would check the following. Go back to the time of the slowdown and check
Sessions - Details
Sort the Wait Time (ms) graph to decending. Navigate before and after the slowdown period. See if any sessions show a wait.
Next go to a little after the time then navigate to Resources. Does there look to be any spike at all? If so drill down that area...ie Disk.
Review your Memory Usage..Make sure your not reaching your max...If you are you should see some paging or procedure cache being pushed out.
Check the SQL Logs too. Verify that your not having anything in those logs.
If the system is "freezing" it should show it self as a wait somewhere.
April 27, 2012 at 8:16 am
This might help also.
SQL SERVER TROUBLESHOOTING CHECKLIST
http://www.brentozar.com/wp-content/uploads/2012/03/FirstResponderKit_CheckList.pdf
April 27, 2012 at 8:28 am
Or, for that matter, check out Jonathan Kehayias' & Ted Krueger's book. It's a free download.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 27, 2012 at 8:38 am
I appreciate all the help! I'll be going through a couple of those resources to start working through and see where I can increase our efficiency. I checked out my timeframes and looks like as far as wait times, there is "Other: CXPACKET" at the top of nearly every chart and then "Non-Page Latch: LATCH_EX" is right below it. And after this timeframe there is a spike in SQL Server Physical I/O and then back down. I checked the logs and there is nothing indicating an issue around that time. I do see that some trace files were started and stopped and I copied and pasted those below, but that is about it.
2012-04-25 16:12:48.96 spid89 DBCC TRACEON 3604, server process ID (SPID) 89. This is an informational message only; no user action is required.
2012-04-25 16:12:49.02 spid89 DBCC TRACEOFF 3604, server process ID (SPID) 89. This is an informational message only; no user action is required.
2012-04-25 16:12:52.37 spid89 DBCC TRACEON 3604, server process ID (SPID) 89. This is an informational message only; no user action is required.
2012-04-25 16:12:52.44 spid89 DBCC TRACEOFF 3604, server process ID (SPID) 89. This is an informational message only; no user action is required.
April 27, 2012 at 8:48 am
billo (4/27/2012)
I appreciate all the help! I'll be going through a couple of those resources to start working through and see where I can increase our efficiency. I checked out my timeframes and looks like as far as wait times, there is "Other: CXPACKET" at the top of nearly every chart and then "Non-Page Latch: LATCH_EX" is right below it. And after this timeframe there is a spike in SQL Server Physical I/O and then back down. I checked the logs and there is nothing indicating an issue around that time. I do see that some trace files were started and stopped and I copied and pasted those below, but that is about it.2012-04-25 16:12:48.96 spid89 DBCC TRACEON 3604, server process ID (SPID) 89. This is an informational message only; no user action is required.
2012-04-25 16:12:49.02 spid89 DBCC TRACEOFF 3604, server process ID (SPID) 89. This is an informational message only; no user action is required.
2012-04-25 16:12:52.37 spid89 DBCC TRACEON 3604, server process ID (SPID) 89. This is an informational message only; no user action is required.
2012-04-25 16:12:52.44 spid89 DBCC TRACEOFF 3604, server process ID (SPID) 89. This is an informational message only; no user action is required.
CXPacket is an indication of parallelism, but nothing to panic about. It's frequently the top wait on most systems, but isn't, by itself an indication of a problem. But, combine it with LATCH_EX, I think you might be looking at I/O issues. Plus the spike in physical I/O. Yep. That's what it sounds like. You've got problems at the disk. What specifically, that's going to require more checking.
The TRACE stuff, guessing here, is probably Idera. Not sure it's an issue or not, but I wouldn't sweat it yet.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 1, 2012 at 1:10 pm
My few cents here.
1. run wait stats query available in below link and share top 10 records.
http://www.sqlskills.com/BLOGS/PAUL/post/Wait-statistics-or-please-tell-me-where-it-hurts.aspx
2. isolate the db having the pain. sys.dm_io_virtual_file_stats dmv will help you to find that.
3. identify painful objects in key databases. sql server sys.dm_db_index* dmv's will help you
4. you have mentioned that there is a surge in number of user connections.
capture all the users list and understand group by the type of connection
i mean, connections from SSRS/Applications/direct sql connections.
5. check if there is an entry in job history during the slow down.
6. take list of tables having huge volume of records, find their dependencies (stored proc)
and check whether those sp's are tuned. (run profiler with filter and capture long running queries/sp's)
7. include usable indexes/(nolocks) if possible.
8. anti virus is installed in sql box?
run Glenn's queries (available in below link) and check whether you can find something useful.
my fair guess is, IO issue could be the reason. adding good indexes/tuning some key stored procedures will help.
May 2, 2012 at 5:03 am
can u tell us the hardware configuration of server box? no if databases ? size of databases? disks ? and is any reporting intsance sitting on same box?
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
May 2, 2012 at 5:04 am
Ramkumar (LivingForSQLServer) (5/1/2012)
My few cents here.2. isolate the db having the pain. sys.dm_io_virtual_file_stats dmv will help you to find that.
3. identify painful objects in key databases. sql server sys.dm_db_index* dmv's will help you
Ram
can you please explain above terms?
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
May 2, 2012 at 6:02 am
For everyone on the thread, a couple things I have come across... first of all I just started at this place about 3 weeks ago so I just got to start diving into issues. Turns out, this server has only one physical disk. Logs, database, tempdb, OS, all on the same physical disk. They are spread out among different partitions, but same overall disk. Secondly, I have found that there are many missing indexes. The first table in my analysis showed the table gets a seek/scan around 230,000 times per day so this was the first one I put indexes on.
As far as server hardware, its an HP server, 64 GB of memory, quad 2.66 processor. Windows 2008 Enterprise Server SP2. 1 600GB hard drive with C, E, F partitions. There are 7 databases, 6 are smaller databases. Sales apps, virus definition db, idera repository. Our main DB is around 28GB, the rest are less than 5GB all together.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply