order of criteria in where clause changes query palan

Question

Post reply

order of criteria in where clause changes query palan

Adi Cohn-120898

SSC-Dedicated

Points: 33944
More actions
November 4, 2011 at 3:55 am

#251177

Inna is a colleague of mine that came across an interesting problem. She had a query that had a wrong query plan. She modified the order of the conditions in the where clause, and got a better query plan and the query ran much faster. She even wrote a small script that reproduces this behavior. You’ll might need to run the script 2 or 3 times because it is based on random data, but it does show different query plans for 2 queries that are logically the same. I think that this is a bug, but I might be wrong. Can anyone have an explanation for that?
Use tempdb
GO
CREATE TABLE Inna (ID int identity(1,1), DataCLInd datetime constraint DF_Inna_DataCLInd default (getdate()),
SecField INT, Third INT,
constraint PK_Inna_DataCLInd_SecField_Third Primary Key Clustered (DataCLInd,SecField,Third)
)
insert into Inna (DataCLInd,
SecField,
Third)
Values ('20110101' , 2,23)
DECLARE @ROWCOUNT int
select @ROWCOUNT = 1
--Inserting data. Since we are inserting random data, we will have few errors, but at the end we'll have
--more then 4000000 records and very fast
While @ROWCOUNT < 2000000
BEGIN
insert into Inna (DataCLInd,
SecField,
Third)
SELECT dateadd(MINUTE,+datepart(MILLISECOND,GETDATE())*datepart(MINUTE,GETDATE()),DataCLInd),
ID * 1.5,
ID * 4.5
FROM Inna
SELECT @ROWCOUNT = @@ROWCOUNT
END
go
--First query
declare @LastRunTime datetime
select @LastRunTime = '2011-01-16 00:00:52.870'
;WITH RankedRows AS (
SELECT ID,DataCLInd,SecField,Third,
RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum
FROM Inna
WHERE DataCLInd >= @LastRunTime AND DataCLInd >= DATEADD(MONTH,-1,getutcdate()))
SELECT ID,DataCLInd,SecField,Third
FROM RankedRows
WHERE RowNum = 1
go
--Second query
declare @LastRunTime datetime
select @LastRunTime = '2011-01-16 00:00:52.870'
;WITH RankedRows AS (
SELECT ID,DataCLInd,SecField,Third,
RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum
FROM Inna
WHERE DataCLInd >= DATEADD(MONTH,-1,getutcdate()) AND DataCLInd >= @LastRunTime)
SELECT ID,DataCLInd,SecField,Third
FROM RankedRows
WHERE RowNum = 1
go
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 1

Save the two different plans.

Swap the two queries around in the script, run until you get two different plans again.

Compare the four plans.

The results suggest that plan generation is dependant upon the resources available at the time the query is run - how could it be otherwise?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Lowell SSC Guru Points: 323518 More actions · Answer 2

A couple of Grant Fitychey's presentations he did on execution plans might be relevant;

as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"

if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.

for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 3

Lowell (11/4/2011)
A couple of Grant Fitychey's presentations he did on execution plans might be relevant;
as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"
if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.
for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.

In both cases the optimize level was full.

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

Adi Cohn-120898 (11/4/2011)
Lowell (11/4/2011)
A couple of Grant Fitychey's presentations he did on execution plans might be relevant;
as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"
if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.
for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.
In both cases the optimize level was full.
Adi

One parallel, one not parallel (serial?) ?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 5

Adi Cohn-120898 (11/4/2011)
Lowell (11/4/2011)
A couple of Grant Fitychey's presentations he did on execution plans might be relevant;
as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"
if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.
for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.
In both cases the optimize level was full.

Lowell's not talking about optimisation level (which can be full or trivial). He's talking about the 'reason for early abort'

Can you post both queries and both plans?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 6

Those are the 2 queries:

--First query

declare @LastRunTime datetime

select @LastRunTime = '2011-01-16 00:00:52.870'

;WITH RankedRows AS (

SELECT ID,DataCLInd,SecField,Third,

RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum

FROM Inna

WHERE DataCLInd >= @LastRunTime AND DataCLInd >= DATEADD(MONTH,-1,getutcdate()))

SELECT ID,DataCLInd,SecField,Third

FROM RankedRows

WHERE RowNum = 1

go

--Second query

declare @LastRunTime datetime

select @LastRunTime = '2011-01-16 00:00:52.870'

;WITH RankedRows AS (

SELECT ID,DataCLInd,SecField,Third,

RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum

FROM Inna

WHERE DataCLInd >= DATEADD(MONTH,-1,getutcdate()) AND DataCLInd >= @LastRunTime)

SELECT ID,DataCLInd,SecField,Third

FROM RankedRows

WHERE RowNum = 1

go

In my first message there is also a script to create the table and insert datat into the tables. The attached files contain the query plan for each one of the queries.

Thank you for your help.

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Lowell SSC Guru Points: 323518 More actions · Answer 7

i had to use winmerge to be sure, the two files you posted are identical; could you have copied the same plan as two different names?

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 8

Lowell (11/4/2011)
i had to use winmerge to be sure, the two files you posted are identical; could you have copied the same plan as two different names?

You are correct that they are the same. Each time that I right clicked on one of the query plans, it saved both. You can open just one of the files and see both query plans.

Sorry for my mistake.

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Lowell SSC Guru Points: 323518 More actions · Answer 9

well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?

i thought it would say something else...i've got to look at my notes on this.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 10

Lowell (11/4/2011)
well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?

The other did find a plan, it just had to go through all stages of optimisation before it found one that it considered good enough, whereas the first one aborted with a good enough plan without having to go through all the stages.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 11

GilaMonster (11/4/2011)
Lowell (11/4/2011)
well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?
The other did find a plan, it just had to go through all stages of optimisation before it found one that it considered good enough, whereas the first one aborted with a good enough plan without having to go through all the stages.

Gail, I'm sure I've read somewhere that the process of plan checking is susceptible to system pressures, that is, fewer plans can be checked within the allocated time - but I haven't found a reference for this anywhere. Do you know if this is indeed the case?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 12

Lowell (11/4/2011)
well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?
i thought it would say something else...i've got to look at my notes on this.

I see. It is interesting how come with one query the optimizer found a good enough plan and with the other one it didn't.

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Adi Cohn-120898 SSC-Dedicated Points: 33944 More actions · Answer 13

ChrisM@Work (11/4/2011)
GilaMonster (11/4/2011)
Lowell (11/4/2011)
well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?
The other did find a plan, it just had to go through all stages of optimisation before it found one that it considered good enough, whereas the first one aborted with a good enough plan without having to go through all the stages.
Gail, I'm sure I've read somewhere that the process of plan checking is susceptible to system pressures, that is, fewer plans can be checked within the allocated time - but I haven't found a reference for this anywhere. Do you know if this is indeed the case?

I'm running the queries on my laptop. I'm the only one that works with it and there are no jobs. I still get different queries for each procedure. I don't think that it has anything to do with resources

Adi

--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Jeff Moden SSC Guru Points: 1004401 More actions · Answer 14

Adi Cohn-120898 (11/4/2011)
Inna is a colleague of mine that came across an interesting problem. She had a query that had a wrong query plan. She modified the order of the conditions in the where clause, and got a better query plan and the query ran much faster. She even wrote a small script that reproduces this behavior. You’ll might need to run the script 2 or 3 times because it is based on random data, but it does show different query plans for 2 queries that are logically the same. I think that this is a bug, but I might be wrong. Can anyone have an explanation for that?
Use tempdb
GO
CREATE TABLE Inna (ID int identity(1,1), DataCLInd datetime constraint DF_Inna_DataCLInd default (getdate()),
SecField INT, Third INT,
constraint PK_Inna_DataCLInd_SecField_Third Primary Key Clustered (DataCLInd,SecField,Third)
)
insert into Inna (DataCLInd,
SecField,
Third)
Values ('20110101' , 2,23)
DECLARE @ROWCOUNT int
select @ROWCOUNT = 1
--Inserting data. Since we are inserting random data, we will have few errors, but at the end we'll have
--more then 4000000 records and very fast
While @ROWCOUNT < 2000000
BEGIN
insert into Inna (DataCLInd,
SecField,
Third)
SELECT dateadd(MINUTE,+datepart(MILLISECOND,GETDATE())*datepart(MINUTE,GETDATE()),DataCLInd),
ID * 1.5,
ID * 4.5
FROM Inna
SELECT @ROWCOUNT = @@ROWCOUNT
END
go
--First query
declare @LastRunTime datetime
select @LastRunTime = '2011-01-16 00:00:52.870'
;WITH RankedRows AS (
SELECT ID,DataCLInd,SecField,Third,
RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum
FROM Inna
WHERE DataCLInd >= @LastRunTime AND DataCLInd >= DATEADD(MONTH,-1,getutcdate()))
SELECT ID,DataCLInd,SecField,Third
FROM RankedRows
WHERE RowNum = 1
go
--Second query
declare @LastRunTime datetime
select @LastRunTime = '2011-01-16 00:00:52.870'
;WITH RankedRows AS (
SELECT ID,DataCLInd,SecField,Third,
RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum
FROM Inna
WHERE DataCLInd >= DATEADD(MONTH,-1,getutcdate()) AND DataCLInd >= @LastRunTime)
SELECT ID,DataCLInd,SecField,Third
FROM RankedRows
WHERE RowNum = 1
go
Adi

Adi,

Different lesson and just a suggestion to speed up writing and executing test table construction... try the following. The ISNULLs are there to make NOT NULL columns. I think you'll find it much faster than your loop. Easier to control the absolute size and range of the data, as well.

USE TempDB

;

SELECT TOP 2000000

ID = IDENTITY(INT,1,1),

DataCLInd = ISNULL(RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2011','2012')+CAST('2011' AS DATETIME),0),

SecField = ISNULL(ABS(CHECKSUM(NEWID()))%1000000,0),

Third = ISNULL(ABS(CHECKSUM(NEWID()))%1000000,0)

INTO dbo.Inna

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

;

ALTER TABLE dbo.Inna

ADD CONSTRAINT PK_Inna_DataCLInd_SecField_Third PRIMARY KEY CLUSTERED (DataCLInd,SecField,Third)

;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)