June 8, 2016 at 8:05 am
We ran into an issue with our ERP's database last night where out of nowhere any of our queries that were hitting a specific table in the database were failing.
Verified that there weren't any issues with the ERP application host and our database server wasn't showing anything out of the ordinary.
To get the end users running we were going to query the data directly so they could build their files manually.
That's when we found that one of the tables was taking far longer than normal to return. Trying to retrieve a week's worth of results usually takes seconds, last night it was taking several minutes.
Due to time constraints we resorted to restarting the server. Once everything was back up querying the table was running normally again and our users were able to finish their processes normally.
I've tried searching for how to identify what caused this issue, and found pages and pages of better ways to write queries.
What I'm hoping you'll be able to help me with is:
- What may have caused the problem
- How can we determine the source of the issue
- Ways we may be able to detect it before the end users report it
- How can we resolve the issue without having to restart the server
June 8, 2016 at 8:37 am
I understand when stuff goes wrong in production usually the panic sets in. However, before you reset SQL did you at least capture an execution plan of the long requests. How about your wait stats at the time? It might be really tough to determine what was the cause at this point. If restarting SQL fixed it for you it could have been a bad plan being used in cache that you flushed but who knows. The truth is there isn't enough information to go on here.
June 8, 2016 at 8:48 am
Unfortunately not. We were testing with a very basic query 'select * from table where date >= date1 and date <= date2' type query. Something we'll have to keep in mind the next time something happens.
At this point I'm not really expecting to find out why exactly it happened. If I can find out possible causes and ways to identify in the future I'll be happy.
You mentioned the possibility of a bad query plan in the cache, how could I determine and resolve that?
Should also mention that it was any query that was touching that table, we have a couple views in use that were failing for that table as well.
June 8, 2016 at 9:46 am
Erik de Jonge (6/8/2016)
Unfortunately not. We were testing with a very basic query 'select * from table where date >= date1 and date <= date2' type query. Something we'll have to keep in mind the next time something happens.At this point I'm not really expecting to find out why exactly it happened. If I can find out possible causes and ways to identify in the future I'll be happy.
You mentioned the possibility of a bad query plan in the cache, how could I determine and resolve that?
Should also mention that it was any query that was touching that table, we have a couple views in use that were failing for that table as well.
Right, but if you had long running queries, it could cause blocking while other resources are waiting for access to the table. This in turn could have led to timeouts. The truth is here I don't want to lead you down a rabbit hole. At this point you'll need to roll up your sleeves and do a little bit of work.
I can at least suggest:
-Review your indexes on that table. Are they appropriate?
-Try to replicate the issue in a dev environment. Try a wide variety of values for your predicate; don't assume all predicate values will return the data with the same efficiency.
-Keep an eye out for long running queries. Do you have any kind of monitoring setup? (I don't mean traces, those can be bad for performance)
-Look at your wait stats.
This is just off the top of my head. There are far more experience folks on these forums that may have better suggestions for you.
June 8, 2016 at 10:20 am
Right, but if you had long running queries, it could cause blocking while other resources are waiting for access to the table. This in turn could have led to timeouts. The truth is here I don't want to lead you down a rabbit hole. At this point you'll need to roll up your sleeves and do a little bit of work.
I can at least suggest:
-Review your indexes on that table. Are they appropriate?
-Try to replicate the issue in a dev environment. Try a wide variety of values for your predicate; don't assume all predicate values will return the data with the same efficiency.
-Keep an eye out for long running queries. Do you have any kind of monitoring setup? (I don't mean traces, those can be bad for performance)
-Look at your wait stats.
This is just off the top of my head. There are far more experience folks on these forums that may have better suggestions for you.
That's part of this whole headache, since it's part of our ERP system we don't readily have access to the queries that are being run. Also makes up the frustrating bit in trying to research since most of the result I'm finding are relating to optimizing the queries and table structure and not how to identify other sources. Most of the time we can't change what's being run but if we do find something that's causing problems we can at least kill it.
Our monitoring probably isn't where it should be, largely in part to us not writing or having access to 90+% of the queries being run. That being said do you have any recommendations?
I appreciate you giving the help that you can. The folks far more experienced than me is part of the reason I'm reaching out.
June 8, 2016 at 12:50 pm
SQL Server has built-in performance reports that could help you here.
Using SSMS, for the instance that contains the db having the problem:
1) right-click on the instance name
2) select "Reports"
3) select "Standard Reports"
4a) if the slow query is currently running, you can look at the "Blocked" report(s) to get info on what could be blocked by the query / blocking the query
4b) you can also look at "Top Queries ..." reports, since io is generally the most critical factor, and CPU could be a factor too.
For best overall performance without having to / being able go thru every query, look at index stats. I can provide a query to do that if you would like (it's rather lengthy and the analysis is not simple, but getting the best clustered index, in particular, can have a dramatic overall effect on SQL performance across the board).
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 8, 2016 at 3:40 pm
Erik de Jonge (6/8/2016)
We ran into an issue with our ERP's database last night where out of nowhere any of our queries that were hitting a specific table in the database were failing.Verified that there weren't any issues with the ERP application host and our database server wasn't showing anything out of the ordinary.
To get the end users running we were going to query the data directly so they could build their files manually.
That's when we found that one of the tables was taking far longer than normal to return. Trying to retrieve a week's worth of results usually takes seconds, last night it was taking several minutes.
Due to time constraints we resorted to restarting the server. Once everything was back up querying the table was running normally again and our users were able to finish their processes normally.
I've tried searching for how to identify what caused this issue, and found pages and pages of better ways to write queries.
What I'm hoping you'll be able to help me with is:
- What may have caused the problem
- How can we determine the source of the issue
- Ways we may be able to detect it before the end users report it
- How can we resolve the issue without having to restart the server
This sounds like a classic case of the bad kind of "Parameter Sniffing". Next time it happens, run DBCC FREEPROCCACHE instead of rebooting. There will be a minute or two where things seem to run slower because of the recompiles that will occur but it's faster than a reboot and won't kick people off the system.
If that fixes the problem, then it really is likely that the bad kind of "Parameter Sniffing" actual was the problem and there are multiple ways to keep that one query from experiencing such a problem in the future. Google is your friend there.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 10, 2016 at 7:52 am
Thanks for the advice. Lord knows when exactly this is going to happen again but I'll definitely be checking out these options when it does.
All goes well we'll be able to get away from requiring restarts and start identifying where these issue are coming from.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply