March 5, 2018 at 4:59 pm
I guess I could summarise my question by asking if a query plan is created based on available resources.
Here is the long-winded version.
We have a database which we regularly restore to our development environment.
Recently a piece of code has become very sluggish in the development environment. The dev environment behaves well on the whole.
This database is identical in both environments: the SQL Server version is the same, and the indexes and statistics are maintained weekly.
The only difference is production has 4 CPUs and 32 GB RAM whereas dev has 2 CPUs and 16 GB RAM.
Now if we were talking a few seconds' difference I would put it down to specs, but the difference is instant for production and around 1 minute for dev.
Little else is happening in Dev, there is no locking/blocking/stress.
I look at the actual execution plan for the same query on both servers and there is a stand out difference. I can send screenshots/XML if it helps?
Basically production uses a key lookup with a 100% cost whereas dev is using a "hash match (inner join)" for 53% cost and a few repartition(?) steps (query performance is not my strong point)!
On dev the EP tells me I should create an index to reduce the time by 89%, whereas production just spits it out in under a second.
Both tables have the same number of rows, the same indexes, keys everything!
Would this change of execution plan be based on the available CPU/RAM at the time the query ran? I'm at a complete loss to understand the difference in EP and performance across the 2 servers.
I've even had the VM guy check under the hood to make sure there is no disk/VM reasons.
The perplexing thing is that all other queries/functions I've looked at in this DB/application behave almost identically, bar a few milliseconds.
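To answer the headline question, plan choice is sensitive to what the optimizer can see of the hardware, so the first thing worth doing is comparing the optimizer-relevant instance settings on both servers. A minimal sketch (these are standard settings and DMVs, but verify the column names against your SQL Server version):

```sql
-- Compare instance settings that influence plan shape (parallelism, memory).
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN ('cost threshold for parallelism',
               'max degree of parallelism',
               'max server memory (MB)');

-- The CPU count and memory SQL Server detects also feed into plan costing,
-- which is why a 2-CPU dev box can legitimately cost plans differently.
SELECT cpu_count, physical_memory_kb
FROM sys.dm_os_sys_info;
```

If dev's `cost threshold for parallelism` or MAXDOP differs from production, that alone can explain repartition-streams operators appearing in one plan and not the other.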
March 5, 2018 at 5:07 pm
Jay@Work - Monday, March 5, 2018 4:59 PM (quoted above)
It would help if you posted the two execution plans (as *.sqlplan files). Hard to say anything without that information.
March 5, 2018 at 5:33 pm
This article has been invaluable to me:
Slow in the Application, Fast in SSMS?
March 5, 2018 at 7:47 pm
RandomStream - Monday, March 5, 2018 5:33 PM: This article has been invaluable to me:
Slow in the Application, Fast in SSMS?
The query is slow both in the application and in SSMS, but instant in both in production.
March 5, 2018 at 7:49 pm
Lynn Pettis - Monday, March 5, 2018 5:07 PM (quoted above)
I have edited the post and added the 2 files.
March 6, 2018 at 3:15 am
The estimates are way out, but I'm guessing that's your data rather than stale stats.
It would be interesting to see the plan from the dev environment, with a nested loops join forced:
select t.charge_ctr,
       t.transaction_date,
       sum(d.AMOUNT),
       t.status,
       COUNT(*)
from nucChargeTransaction t
INNER LOOP JOIN NUCCHARGETRANSACTION d
       ON d.CHARGE_CTR = t.CHARGE_CTR
where t.transaction_ctr = 50469380
  and d.status in ('C','F')
group by t.CHARGE_CTR, t.TRANSACTION_DATE, t.STATUS
Check that the index [CHARGECTR_AK] is the same in both DBs.
Add column [Amount] to the INCLUDE part of this index in both DBs to eliminate the key lookup.
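Since the estimates being "way out" is the symptom here, it's also worth comparing the statistics on the join column in both databases directly. A sketch, assuming the index name [CHARGECTR_AK] from the plan is correct:

```sql
-- Show when the statistics were last updated, rows vs rows sampled, etc.
-- Run on both prod and dev and compare the headers.
DBCC SHOW_STATISTICS ('dbo.nucChargeTransaction', CHARGECTR_AK) WITH STAT_HEADER;

-- If the dev copy looks stale after the restore, refresh with a full scan.
UPDATE STATISTICS dbo.nucChargeTransaction CHARGECTR_AK WITH FULLSCAN;
```

If the STAT_HEADER output differs materially between the two databases (update date, rows sampled), that would point at the statistics rather than the hardware.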
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 6, 2018 at 8:23 am
Several things looking at the execution plans.
One, drop the (nolock) hint. This is not a "go fast button" and could potentially cause erroneous data to be returned.
Two, are the tables nucChargeTransaction and NUCCHARGETRANSACTION the same table? Is the join between them a self join? If so, why?
Three, the difference I see between dev and production is the volume of data being returned. In dev you are returning close to 45 million rows of data for status in 'C' or 'F'. In production, however, you are only returning 5 rows of data for status in 'C' or 'F'.
Please post the DDL (CREATE TABLE statement) for the table(s) involved including all indexes defined. Also, please post some sample data (no more than 10 rows of data) as INSERT statements using Table Value Constructor format, https://docs.microsoft.com/en-us/sql/t-sql/queries/table-value-constructor-transact-sql.
Also, based on the sample data, show us what the expected results should be from the query.
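The Table Value Constructor format linked above looks like the following. This is only an illustrative sketch: the column names and types here are hypothetical, loosely inferred from the query in the thread, and would need to match the real DDL:

```sql
-- Hypothetical sample-data script in Table Value Constructor format.
CREATE TABLE #nucChargeTransaction (
    charge_ctr       int            NOT NULL,
    transaction_ctr  int            NOT NULL,
    transaction_date datetime       NOT NULL,
    amount           decimal(19, 6) NOT NULL,
    status           char(1)        NOT NULL
);

INSERT INTO #nucChargeTransaction
    (charge_ctr, transaction_ctr, transaction_date, amount, status)
VALUES
    (3901610, 50469380, '2017-06-19', -60.00, 'C'),
    (3901611, 50469381, '2017-06-20',  25.00, 'F');
```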
March 6, 2018 at 12:03 pm
Lynn Pettis - Tuesday, March 6, 2018 8:23 AM (quoted above)
Thanks, I'll post the results of the above query - yes, the tables are the same. The data is pretty much identical in both environments as we restore production over the top of dev.
This is a 3rd party application so I can't explain the whys. We could replace the underlying SP with one of our own if the performance becomes an issue (in production it takes a couple of milliseconds), but my question relates more to the difference in execution plan and whether that could be the cause of the 2 vastly different execution times.
March 6, 2018 at 12:05 pm
Jay@Work - Tuesday, March 6, 2018 12:03 PM (quoted above)
Sorry, forgot to mention that ultimately both environments return the same number of rows (which is 1). Why there is such a huge difference in the rows being handled mid-code I don't know. I just put it down (guessed) to the different ways in which the EP was extracting it.
March 6, 2018 at 12:07 pm
Jay@Work - Tuesday, March 6, 2018 12:03 PM (quoted above)
See point 3. In dev the query is pulling close to 45 million rows of data from nucChargeTransaction table and only 5 rows of data from the same table in production for the same criteria.
March 6, 2018 at 12:12 pm
And again, how about the DDL for the table, sample data for the table (not production data, and about 10 rows or so), and the expected results based on the sample data. There may be a better way to write the query.
March 6, 2018 at 12:13 pm
ChrisM@Work - Tuesday, March 6, 2018 3:15 AM (quoted above)
I have attached the EP for this version of the code - the results were returned pretty much instantly (just like they are when the original code is run in production).
Both indexes exist and are identical in both environments. I'm not able to add indexes to the tables as it's a 3rd party product. Code adjustments yes, in certain cases, but not DDL.
I've worked with this product (with this organisation) since 2002 and am very familiar with the application screen where this code is run. I have never seen this behaviour on this screen ever.
March 6, 2018 at 12:20 pm
Okay, just curious, what does this return:
SELECT
[t].[charge_ctr]
, [t].[transaction_date]
, SUM([t].[AMOUNT]) AS 'SumAmt'
, [t].[status]
, COUNT(*) AS 'RecCnt'
FROM
[dbo].[nucChargeTransaction] [t]
WHERE
[t].[transaction_ctr] = 50469380
AND [t].[status] IN ('C', 'F')
GROUP BY
[t].[CHARGE_CTR]
, [t].[TRANSACTION_DATE]
, [t].[STATUS];
March 6, 2018 at 12:21 pm
Lynn Pettis - Tuesday, March 6, 2018 12:12 PM: And again, how about the DDL for the table, sample data for the table (not production data, and about 10 rows or so), and the expected results based on the sample data. There may be a better way to write the query.
I imagine there is a better way to write the query; this application/DB has spanned many versions of SQL Server, and each time we patch, the code reverts back to its out-of-the-box version when the schema is rebuilt.
But this query has worked instantly hundreds of times a day for hundreds of users for 16 years. Until now (and only in dev). The tables have not changed, the code has not changed, the indexes/stats are maintained, the hardware is working fine, and there is no locking or blocking, so all I really want to get to the bottom of is why the 2 environments treat the exact same code so differently. I thought maybe there is a server/DB config that has somehow been altered, or a corrupt index (if there is such a thing) that may need rebuilding?
Same query/data/results
select t.charge_ctr,
       t.transaction_date,
       sum(d.AMOUNT),
       t.status,
       COUNT(*)
from nucChargeTransaction t (nolock)
INNER JOIN NUCCHARGETRANSACTION d (nolock)
       ON d.CHARGE_CTR = t.CHARGE_CTR
where t.transaction_ctr = 50469380
  and d.status in ('C','F')
group by t.CHARGE_CTR, t.TRANSACTION_DATE, t.STATUS
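The "altered config or corrupt index" theories raised above can be checked with standard commands. A sketch (the database name is a placeholder; CHECKDB should be run in a quiet window, which dev apparently has):

```sql
-- Logical and allocation consistency check; would surface index corruption.
DBCC CHECKDB ('YourDevDatabase') WITH NO_INFOMSGS;

-- Compare database-level settings that affect the optimizer between
-- prod and dev, e.g. compatibility level and auto-update-statistics.
SELECT name, compatibility_level,
       is_auto_update_stats_on, is_parameterization_forced
FROM sys.databases
WHERE name = 'YourDevDatabase';
```

A restore carries most database settings with it, but compatibility level interacts with the instance version, so it is one of the few things that can silently differ between two copies of the "same" database.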
March 6, 2018 at 12:23 pm
Lynn Pettis - Tuesday, March 6, 2018 12:20 PM (quoted above)
charge_ctr transaction_date SumAmt status RecCnt
----------- ----------------------- --------------------------------------- ------ -----------
3901610 2017-06-19 00:00:00.000 -60.000000 C 1
(1 row(s) affected)