October 2, 2020 at 3:33 pm
I am seeing the execution plan shows proper indexing but the actual execution of the query takes more amount of time. This particular query is executed from application i.e. Java connection is going away with SQL query to follow we commonly agree is a valid execution plan. But this particular query runs from application and make abnormal CPU spike and high reads and goes down once. I would expect blocking but that has not been noticed in this case. Clearing the open sessions did not resolve this. However, restarting the database server did not resolve this either, so restarting the application servers has cleared the connections. I am trying to understand what could be wrong at the database side other than the symptom of abnormal CPU. Nothing in the logs and event viewer. Any thoughts? Thanks in Advance!
October 3, 2020 at 4:10 pm
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
October 3, 2020 at 7:13 pm
I am trying to understand what could be wrong at the database side other than the symptom of abnormal CPU. Nothing in the logs and event viewer. Any thoughts? Thanks in Advance!
It could be any of or a combination of a dozen different things.
There's likely more things to check but those are places to start. Setting up an SQL Profiler or Extended Events session to look for what the code is waiting on could provide and important lead to what's causing the spike and the longer run.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 5, 2020 at 11:52 am
Everything that Jeff says, plus, under resourced servers, network configuration issues, and a whole slew of other system, network, OS, or server configuration issues. Just based on the information, "things are slow, we don't think it's blocking, the plan looks ok" it's really hard to say. Get more data. Capture wait statistics for the query specifically so you understand what it's waiting on. Look at all the resources on the system. Capture performance metrics over the process to see where things are slowing down. That's what's needed.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 6, 2020 at 6:36 pm
Thanks guys! We noticed this issue again. So this time I did not restart the database server at all and instead advised to restart application servers only and everything came back online after application server restart. To me it looks like the issue mostly on application side rather than a database server. However, the only thing I am still not getting is CPU usage is going high during this event on the database server, and it does not go down until after the application server restart. Does exceeding the number connections on the application server could trigger this problem? Please advise?
October 6, 2020 at 9:08 pm
How about query time-out to exit the query when it has been executed over default 10 minutes to unlimited 0?
October 7, 2020 at 12:19 pm
Without a lot more details, it's really hard to say what could be the cause. Sounds like the application server is doing something funky which results in excess CPU use. However, what exactly? Not enough information to tell you. I'd suggest using Extended Events to capture query behaviors. See what queries are being called, how often, by what accounts/apps. Aggregate that data, especially across one of these periods where it all goes badly to understand what's happening.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 7, 2020 at 4:37 pm
How about query time-out to exit the query when it has been executed over default 10 minutes to unlimited 0?
If you have a front end query that's taking longer than 10 minutes, you need to fix the query. You can set it to unlimited but that's a bit like putting a bandaid on a deep stab wound. If you change the timeout, be prepared to find and fix the query because it could take quite some time indeed. Or, you can cut out what will become obvious and find/fix the query now. The only exception to that as a general rule for me is for long data transfers, especially backups to offsite servers. That's about it for exceptions.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 7, 2020 at 4:45 pm
Thanks guys! We noticed this issue again. So this time I did not restart the database server at all and instead advised to restart application servers only and everything came back online after application server restart. To me it looks like the issue mostly on application side rather than a database server. However, the only thing I am still not getting is CPU usage is going high during this event on the database server, and it does not go down until after the application server restart. Does exceeding the number connections on the application server could trigger this problem? Please advise?
To me, it sounds like either a "Connection Leak" or a "Memory Leak". Those can be really difficult to first prove and then isolate but, if either one is the problem (and I think one or the other may be), you folks are going to need to spend the time finding the source and then fixing it.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 14, 2020 at 8:04 pm
Here are the additional findings hope someone have more inputs. High CPU during this period only a front application reboot will resolve. There are alot connections but do not see any. Queries show no orphaned queries. Found a problematic query. Problem is that bad plan used non clustered index instead of clustered index which is a better index for the inner join. This index reads 100s of million rows. Forced a good plan with query store and left query store on to capture all. So far the server is operating well, does not know if this continues to work, or if further plans need to be forced if variations of the query text come in. I run Ola Hallengren index rebuild. Do you think other than Forced plan I am thinking of running update stats with full scan every day on top of it. Do you think this would help for high cpu?
October 14, 2020 at 9:47 pm
I would run the stats rebuilds long before I ran the current industry standard for supposed "Best Practices" (Reorg between 5 and 30% logical fragmentation, REBUILD for more than 30%)... especially on any index where you don't know the exact INSERT/UPDATE pattern. Like I've said in the past, it's better to do no index maintenance than it is to do it wrong.
No... I am NOT saying that Ola's code is bad. In fact, I agree with Paul Randal in calling it the "Gold Standard" for index maintenance. The problem isn't with Ola's code... the problem is with how people use it. It's not actually an index "optimizer" (it's not capable of analyzing indexes and assigning the correct Fill Factor, for example). It's an excellent tool for doing index maintenance. If you use it following the industry wide supposed "Best Practices", then you're using Ola's code incorrectly (IMHO).
If your analysis for Fill Factor is based only on logical fragmentation, then you'll be doing that wrong, as well, because a whole lot of INSERT/UPDATE patterns are going to cause fragmentation even if you set the Fill Factor as low as 10%.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 15, 2020 at 12:50 pm
If you're rebuilding the indexes, then don't update the statistics. An index rebuild updates the statistics with a full scan. You don't need to run that after a rebuild.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 15, 2020 at 12:56 pm
If you're rebuilding the indexes, then don't update the statistics. An index rebuild updates the statistics with a full scan. You don't need to run that after a rebuild.
Just to add a qualifier to that...
Index rebuilds DO inherently rebuild the INDEX statistics because rebuilds rebuild the index to a new copy of the index and, once committed, it drops the old index. It does NOT, however, rebuild COLUMNAR statistics. You still have to maintain those.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 16, 2020 at 9:51 pm
Thanks. Can you comment on Query store, Forced a good plan with query store and left query store on to capture all. So far the server is operating well. Would you think this would help for the cause of high CPU. I have query store max size of 100MB. What happens if the query store reach to max size. Would there be any impact to database other than the collection of data?
October 17, 2020 at 1:49 am
Ah... I just thought of something else it could be. Have the Developers check the connection strings for the apps. If it doesn't explicitly turn off MARS (multiple active result sets), that can definitely be an issue (it killed us until we figured it out). You may even find that the connection strings are explicitly turning it on. It should be off in about 99.999999% of the cases.
Note that turning MARS off does NOT prevent the app code from receiving more than one result set from a given proc.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply