Query maxing out 24 CPUs

Question

Post reply

Query maxing out 24 CPUs

M_E_K

SSChasing Mays

Points: 613
More actions
September 20, 2011 at 7:46 am

#244027

Hello SQL Performance Experts,
I could really use some assistance troubleshooting a long running query which maxes out 24 CPUs in our prod environment.
This is for an OLAP database which on the whole has very good performance. All stats are up to date, indexes exist for all columns in the predicate and are rebuilt on a weekly basis.
Running this will cause all 24 CPUs to jump to about 30% in usage... running 3-4 of these queries simultaneously will max out all CPUs.
The UAT environment which a very similar data shape does not have this problem. Execution plans are different between the two. If I use the execution plan from UAT with the use plan hint in PROD it finishes in a split second.
SQL server versions are identical (2005 enterprise x64 on Windows 2003 R2 x64 (9.00.4035.00) )
Tables, indexes are identical. SQL Server configuration is identical, so my question is:
What else can cause such drastically different execution plans?

Viewing 15 posts - 1 through 15 (of 34 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

Ninja's_RGR'us

SSC Guru

Points: 294069

September 20, 2011 at 7:49 am

#1383851

double post.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 2

Ninja's_RGR'us

SSC Guru

Points: 294069

September 20, 2011 at 7:49 am

#1383852

Please post both good & bad ACTUAL execution plans.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 3

Please post table definitions, index definitions and execution plan, as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

M_E_K SSChasing Mays Points: 613 More actions · Answer 4

Hi Folks,

Thanks so much!

I've attached the requested information.

CAVEAT: I can't get actual execution plan from PROD because the query never completes (i've let it run close to 8 hours), so plan is only estimate. But you'll see it's vastly different from plan from UAT.

Let me know if you want any index stats or other troubleshooting info from the servers.

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

Massive difference is an understatement.

Unfortunately most of my tricks work on the basis of having the actual plan.

Hopefully Gail has a few more magic tricks she can share!

Grant Fritchey SSC Guru Points: 398886 More actions · Answer 6

It's always hard to know if the estimated values are going to be the same as the actual, but taking it as it exists, 87% of the cost of that query is going against an RID lookup. That means you have tables without clustered index, and it means your indexes that you do have, are not covering the query.

I'd suggest making clustered indexes on the tables, even OLAP tables.

The output list for the RID lookup operation is only a couple of columns. See if you can add them as an INCLUDE in one of the other indexes.

You've also got a table scan on [OLAP].[dbo].[WC_EPCR_GA_OWNERSHIP_H] [T346736]. It's feeding to hash join that is probing on [OLAP].[dbo].[WC_EPCR_GA_OWNERSHIP_H].GA_ACCNT_ID and a residual probe on: [OLAP].[dbo].[W_PARTY_ORG_D].[INTEGRATION_ID] as [T67365].[INTEGRATION_ID]=[OLAP].[dbo].[WC_EPCR_GA_OWNERSHIP_H].[GA_ACCNT_ID] as [T346736].[GA_ACCNT_ID]. It seems like whatever indexes you have there, they're not being well used.

That's what I saw from a quick look through.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

M_E_K SSChasing Mays Points: 613 More actions · Answer 7

Hi Grant,

Thanks for your suggestions.

Making clustered indexes is not an option right now. There are no indexes on WC_EPCR_GA_OWNERSHIP_H, which is why you see the full scan. There are less than 10K records in this table. Also, we can take out the WC_EPCR_GA_OWNERSHIP_H table from the query and it still has the same problem.

Also, the UAT environment has exactly the same tables, indexes, and similar row counts on the tables, but the execution plan is very different and the query finishes in a split second. Indexes are rebuilt every weekend, stats are up to date, so what else could cause such a huge difference in execution plan? note, I've run DBCC FREEPROCCACHE to make sure it's not a cached plan, but still the same issue...

Any one have any tools or methodologies for probing how SQL server optimizer is constructing the execution plan?

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 8

Whole books have been writen on the subject and you have one of the best authors on the subject here. So if he doesn't know, then likely noone knows except the programmer at M$.

Simlar rowcount is very different from exact match. That's more than enough to completely change the exec plan (especially if your near the tipping point(s) to change the plan).

More to the point, the optimizer gives you a decent fast plan, it's always more of a guess than exact science. So it's possible that you get 2 different plans based on very similar queries / contexts.

Grant Fritchey SSC Guru Points: 398886 More actions · Answer 9

In addition to what the ninja says, you need to check all the fundamentals, server settings, is the memory identically configured, are the cpu's identically configured, do you have the same threshold for parallelism, do you have the same default ANSI settings. Any or all of these can affect execution plans.

Then, yeah, the data is the big one. It's not just the number of rows, it's the precise distribution of data within the rows that can seriously disrupt the execution plan generated. If you're not compare precise copies of the database on exactly the same servers... you can't really compare the two.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 10

And by precises he means 100%.

In theory, even 99.999999999% equivalent can change the plan. Of course you'd be hitting a bit of bad luck here.

It's just a metter of crossing the threshold.

Grant Fritchey SSC Guru Points: 398886 More actions · Answer 11

You've got a GIANT disparity in statistics between the two data sets. One estimates 3.7 million, the other estimates 174. Right there is the main point of difference that I can see. That's just starting with the SELECT operator and not looking at the rest of the plan. The fact that it also returns zero rows makes comparing it's performance to something that returns... something I assume, difficult.

Statistics are probably the #1 driving force behind execution plans. If you're comparing two completely different sets of statistics, and it sure looks like you are, you're getting an apples to hammers type of comparison.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 12

Grant Fritchey (9/20/2011)
You've got a GIANT disparity in statistics between the two data sets. One estimates 3.7 million, the other estimates 174. Right there is the main point of difference that I can see. That's just starting with the SELECT operator and not looking at the rest of the plan. The fact that it also returns zero rows makes comparing it's performance to something that returns... something I assume, difficult.
Statistics are probably the #1 driving force behind execution plans. If you're comparing two completely different sets of statistics, and it sure looks like you are, you're getting an apples to hammers type of comparison.

And with the actual plan we would be able to see the compile values VS actual values.

Not 100% sure if the plan in cache has that info, I would guess it does (I don't usually use that to pull out plans).

That would have been my next step... along with stats & actual vs estimated, missing indexes & plan timeout message + other errors.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 13

Ninja's_RGR'us (9/20/2011)
Not 100% sure if the plan in cache has that info, I would guess it does (I don't usually use that to pull out plans).

No, it doesn't.

If the cached plan was reused 5 times, which one's run-time info should it have? Add to that, enabling actual plans adds overhead to query execution. SQL's not going to do that unless asked (profiler trace or a set statement)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 398886 More actions · Answer 14

The thing is, both plans show as having done FULL optimization, so these aren't terribly complex plans. But the costs are very high, and the table scans and RID lookups are contributing to the complexity and performance hits. I'd focus on eliminating those.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning