Execution Plan

Question

Post reply

Execution Plan

Admingod

SSCertifiable

Points: 5997
More actions
October 2, 2020 at 3:33 pm

#3792540

I am seeing the execution plan shows proper indexing but the actual execution of the query takes more amount of time. This particular query is executed from application i.e. Java connection is going away with SQL query to follow we commonly agree is a valid execution plan. But this particular query runs from application and make abnormal CPU spike and high reads and goes down once. I would expect blocking but that has not been noticed in this case. Clearing the open sessions did not resolve this. However, restarting the database server did not resolve this either, so restarting the application servers has cleared the connections. I am trying to understand what could be wrong at the database side other than the symptom of abnormal CPU. Nothing in the logs and event viewer. Any thoughts? Thanks in Advance!

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Site Owners SSC Guru Points: 80323 More actions · Answer 1

Thanks for posting your issue and hopefully someone will answer soon.

This is an automated bump to increase visibility of your question.

Jeff Moden SSC Guru Points: 1004506 More actions · Answer 2

Admingod wrote:

I am trying to understand what could be wrong at the database side other than the symptom of abnormal CPU. Nothing in the logs and event viewer. Any thoughts? Thanks in Advance!

It could be any of or a combination of a dozen different things.

Non-parameterized queries that causes a long recompile. (We had some 100ms code that took 2-22 SECONDS to recompile EVERY TIME it was used and it got used tens of thousand of times per hour).
The fact that the connections made when in SSMS are usually different that connections made from an application like Java. SET ARTITHMETIC ABORT default to ON in SSMS and defaults to OFF for other connections. I'm not sure why but this setting in particular does seem to affect performance in a lot of cases.
The fact that when you run something in SSMS, you might miss the fact that it had to compile and will recompile the code even if you even just add or delete one space never mind any literals you may have in the code but frequently does so much more quickly in SSMS because of existing/persisted connections, etc. See problem 1 above.
The network path between the app and SQL Server will typically be different than the one you use with SSMS.
You are probably not using SSMS on a busy web server (and shouldn't be). I'm just talking about connection availability here again.
You could be running into the issue of taking time to make a connection from the app instead of it persisting like your SSMS session does.
Crap code on the Java side (see number 1 again) and some performance issues with Java itself, mostly but certainly not limited to because someone may not have configured the Java connections to the database correctly.
Communications settings in the registry of the WebServer.

There's likely more things to check but those are places to start. Setting up an SQL Profiler or Extended Events session to look for what the code is waiting on could provide and important lead to what's causing the spike and the longer run.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Grant Fritchey SSC Guru Points: 398973 More actions · Answer 3

Everything that Jeff says, plus, under resourced servers, network configuration issues, and a whole slew of other system, network, OS, or server configuration issues. Just based on the information, "things are slow, we don't think it's blocking, the plan looks ok" it's really hard to say. Get more data. Capture wait statistics for the query specifically so you understand what it's waiting on. Look at all the resources on the system. Capture performance metrics over the process to see where things are slowing down. That's what's needed.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Admingod SSCertifiable Points: 5997 More actions · Answer 4

Thanks guys! We noticed this issue again. So this time I did not restart the database server at all and instead advised to restart application servers only and everything came back online after application server restart. To me it looks like the issue mostly on application side rather than a database server. However, the only thing I am still not getting is CPU usage is going high during this event on the database server, and it does not go down until after the application server restart. Does exceeding the number connections on the application server could trigger this problem? Please advise?

Admingod SSCertifiable Points: 5997 More actions · Answer 5

How about query time-out to exit the query when it has been executed over default 10 minutes to unlimited 0?

Grant Fritchey SSC Guru Points: 398973 More actions · Answer 6

Without a lot more details, it's really hard to say what could be the cause. Sounds like the application server is doing something funky which results in excess CPU use. However, what exactly? Not enough information to tell you. I'd suggest using Extended Events to capture query behaviors. See what queries are being called, how often, by what accounts/apps. Aggregate that data, especially across one of these periods where it all goes badly to understand what's happening.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004506 More actions · Answer 7

Admingod wrote:

How about query time-out to exit the query when it has been executed over default 10 minutes to unlimited 0?

If you have a front end query that's taking longer than 10 minutes, you need to fix the query. You can set it to unlimited but that's a bit like putting a bandaid on a deep stab wound. If you change the timeout, be prepared to find and fix the query because it could take quite some time indeed. Or, you can cut out what will become obvious and find/fix the query now. The only exception to that as a general rule for me is for long data transfers, especially backups to offsite servers. That's about it for exceptions.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004506 More actions · Answer 8

Admingod wrote:

Thanks guys! We noticed this issue again. So this time I did not restart the database server at all and instead advised to restart application servers only and everything came back online after application server restart. To me it looks like the issue mostly on application side rather than a database server. However, the only thing I am still not getting is CPU usage is going high during this event on the database server, and it does not go down until after the application server restart. Does exceeding the number connections on the application server could trigger this problem? Please advise?

To me, it sounds like either a "Connection Leak" or a "Memory Leak". Those can be really difficult to first prove and then isolate but, if either one is the problem (and I think one or the other may be), you folks are going to need to spend the time finding the source and then fixing it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Admingod SSCertifiable Points: 5997 More actions · Answer 9

Here are the additional findings hope someone have more inputs. High CPU during this period only a front application reboot will resolve. There are alot connections but do not see any. Queries show no orphaned queries. Found a problematic query. Problem is that bad plan used non clustered index instead of clustered index which is a better index for the inner join. This index reads 100s of million rows. Forced a good plan with query store and left query store on to capture all. So far the server is operating well, does not know if this continues to work, or if further plans need to be forced if variations of the query text come in. I run Ola Hallengren index rebuild. Do you think other than Forced plan I am thinking of running update stats with full scan every day on top of it. Do you think this would help for high cpu?

Jeff Moden SSC Guru Points: 1004506 More actions · Answer 10

I would run the stats rebuilds long before I ran the current industry standard for supposed "Best Practices" (Reorg between 5 and 30% logical fragmentation, REBUILD for more than 30%)... especially on any index where you don't know the exact INSERT/UPDATE pattern. Like I've said in the past, it's better to do no index maintenance than it is to do it wrong.

No... I am NOT saying that Ola's code is bad. In fact, I agree with Paul Randal in calling it the "Gold Standard" for index maintenance. The problem isn't with Ola's code... the problem is with how people use it. It's not actually an index "optimizer" (it's not capable of analyzing indexes and assigning the correct Fill Factor, for example). It's an excellent tool for doing index maintenance. If you use it following the industry wide supposed "Best Practices", then you're using Ola's code incorrectly (IMHO).

If your analysis for Fill Factor is based only on logical fragmentation, then you'll be doing that wrong, as well, because a whole lot of INSERT/UPDATE patterns are going to cause fragmentation even if you set the Fill Factor as low as 10%.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Grant Fritchey SSC Guru Points: 398973 More actions · Answer 11

If you're rebuilding the indexes, then don't update the statistics. An index rebuild updates the statistics with a full scan. You don't need to run that after a rebuild.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004506 More actions · Answer 12

Grant Fritchey wrote:

If you're rebuilding the indexes, then don't update the statistics. An index rebuild updates the statistics with a full scan. You don't need to run that after a rebuild.

Just to add a qualifier to that...

Index rebuilds DO inherently rebuild the INDEX statistics because rebuilds rebuild the index to a new copy of the index and, once committed, it drops the old index. It does NOT, however, rebuild COLUMNAR statistics. You still have to maintain those.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Admingod SSCertifiable Points: 5997 More actions · Answer 13

Thanks. Can you comment on Query store, Forced a good plan with query store and left query store on to capture all. So far the server is operating well. Would you think this would help for the cause of high CPU. I have query store max size of 100MB. What happens if the query store reach to max size. Would there be any impact to database other than the collection of data?

Jeff Moden SSC Guru Points: 1004506 More actions · Answer 14

Ah... I just thought of something else it could be. Have the Developers check the connection strings for the apps. If it doesn't explicitly turn off MARS (multiple active result sets), that can definitely be an issue (it killed us until we figured it out). You may even find that the connection strings are explicitly turning it on. It should be off in about 99.999999% of the cases.

Note that turning MARS off does NOT prevent the app code from receiving more than one result set from a given proc.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)