November 4, 2011 at 3:55 am
Inna is a colleague of mine that came across an interesting problem. She had a query that had a wrong query plan. She modified the order of the conditions in the where clause, and got a better query plan and the query ran much faster. She even wrote a small script that reproduces this behavior. You’ll might need to run the script 2 or 3 times because it is based on random data, but it does show different query plans for 2 queries that are logically the same. I think that this is a bug, but I might be wrong. Can anyone have an explanation for that?
Use tempdb
GO
CREATE TABLE Inna (ID int identity(1,1), DataCLInd datetime constraint DF_Inna_DataCLInd default (getdate()),
SecField INT, Third INT,
constraint PK_Inna_DataCLInd_SecField_Third Primary Key Clustered (DataCLInd,SecField,Third)
)
insert into Inna (DataCLInd,
SecField,
Third)
Values ('20110101' , 2,23)
DECLARE @ROWCOUNT int
select @ROWCOUNT = 1
--Inserting data. Since we are inserting random data, we will have few errors, but at the end we'll have
--more then 4000000 records and very fast
While @ROWCOUNT < 2000000
BEGIN
insert into Inna (DataCLInd,
SecField,
Third)
SELECT dateadd(MINUTE,+datepart(MILLISECOND,GETDATE())*datepart(MINUTE,GETDATE()),DataCLInd),
ID * 1.5,
ID * 4.5
FROM Inna
SELECT @ROWCOUNT = @@ROWCOUNT
END
go
--First query
declare @LastRunTime datetime
select @LastRunTime = '2011-01-16 00:00:52.870'
;WITH RankedRows AS (
SELECT ID,DataCLInd,SecField,Third,
RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum
FROM Inna
WHERE DataCLInd >= @LastRunTime AND DataCLInd >= DATEADD(MONTH,-1,getutcdate()))
SELECT ID,DataCLInd,SecField,Third
FROM RankedRows
WHERE RowNum = 1
go
--Second query
declare @LastRunTime datetime
select @LastRunTime = '2011-01-16 00:00:52.870'
;WITH RankedRows AS (
SELECT ID,DataCLInd,SecField,Third,
RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum
FROM Inna
WHERE DataCLInd >= DATEADD(MONTH,-1,getutcdate()) AND DataCLInd >= @LastRunTime)
SELECT ID,DataCLInd,SecField,Third
FROM RankedRows
WHERE RowNum = 1
go
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
November 4, 2011 at 5:13 am
Save the two different plans.
Swap the two queries around in the script, run until you get two different plans again.
Compare the four plans.
The results suggest that plan generation is dependant upon the resources available at the time the query is run - how could it be otherwise?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 4, 2011 at 6:28 am
A couple of Grant Fitychey's presentations he did on execution plans might be relevant;
as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"
if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.
for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.
Lowell
November 4, 2011 at 8:36 am
Lowell (11/4/2011)
A couple of Grant Fitychey's presentations he did on execution plans might be relevant;as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"
if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.
for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.
In both cases the optimize level was full.
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
November 4, 2011 at 8:38 am
Adi Cohn-120898 (11/4/2011)
Lowell (11/4/2011)
A couple of Grant Fitychey's presentations he did on execution plans might be relevant;as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"
if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.
for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.
In both cases the optimize level was full.
Adi
One parallel, one not parallel (serial?) ?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 4, 2011 at 8:42 am
Adi Cohn-120898 (11/4/2011)
Lowell (11/4/2011)
A couple of Grant Fitychey's presentations he did on execution plans might be relevant;as i remember it, the key to look for is the property "Reason For Early Termination of Statement Optimization"
if the value is anything other than "Good Enough Plan Found" in the actual execution plan, it means it gave up and used what was considered the best "so far"...so in that case the order of the joins and wheres have an impact on which plan to use.
for a large-ish query, at a certain point the query engine decides it would be more work to build a "perfect" execution plan than it is to use an exisitng one.
In both cases the optimize level was full.
Lowell's not talking about optimisation level (which can be full or trivial). He's talking about the 'reason for early abort'
Can you post both queries and both plans?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 4, 2011 at 9:01 am
Those are the 2 queries:
--First query
declare @LastRunTime datetime
select @LastRunTime = '2011-01-16 00:00:52.870'
;WITH RankedRows AS (
SELECT ID,DataCLInd,SecField,Third,
RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum
FROM Inna
WHERE DataCLInd >= @LastRunTime AND DataCLInd >= DATEADD(MONTH,-1,getutcdate()))
SELECT ID,DataCLInd,SecField,Third
FROM RankedRows
WHERE RowNum = 1
go
--Second query
declare @LastRunTime datetime
select @LastRunTime = '2011-01-16 00:00:52.870'
;WITH RankedRows AS (
SELECT ID,DataCLInd,SecField,Third,
RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum
FROM Inna
WHERE DataCLInd >= DATEADD(MONTH,-1,getutcdate()) AND DataCLInd >= @LastRunTime)
SELECT ID,DataCLInd,SecField,Third
FROM RankedRows
WHERE RowNum = 1
go
In my first message there is also a script to create the table and insert datat into the tables. The attached files contain the query plan for each one of the queries.
Thank you for your help.
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
November 4, 2011 at 9:14 am
i had to use winmerge to be sure, the two files you posted are identical; could you have copied the same plan as two different names?
Lowell
November 4, 2011 at 9:36 am
Lowell (11/4/2011)
i had to use winmerge to be sure, the two files you posted are identical; could you have copied the same plan as two different names?
You are correct that they are the same. Each time that I right clicked on one of the query plans, it saved both. You can open just one of the files and see both query plans.
Sorry for my mistake.
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
November 4, 2011 at 9:46 am
well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?
i thought it would say something else...i've got to look at my notes on this.
Lowell
November 4, 2011 at 9:58 am
Lowell (11/4/2011)
well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?
The other did find a plan, it just had to go through all stages of optimisation before it found one that it considered good enough, whereas the first one aborted with a good enough plan without having to go through all the stages.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 4, 2011 at 10:05 am
GilaMonster (11/4/2011)
Lowell (11/4/2011)
well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?The other did find a plan, it just had to go through all stages of optimisation before it found one that it considered good enough, whereas the first one aborted with a good enough plan without having to go through all the stages.
Gail, I'm sure I've read somewhere that the process of plan checking is susceptible to system pressures, that is, fewer plans can be checked within the allocated time - but I haven't found a reference for this anywhere. Do you know if this is indeed the case?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 4, 2011 at 10:05 am
Lowell (11/4/2011)
well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?i thought it would say something else...i've got to look at my notes on this.
I see. It is interesting how come with one query the optimizer found a good enough plan and with the other one it didn't.
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
November 4, 2011 at 10:11 am
ChrisM@Work (11/4/2011)
GilaMonster (11/4/2011)
Lowell (11/4/2011)
well, the part i was trying to talk about was the screenshot below; see the difference for the "good enough plan found" vs the other that did not find a plan?The other did find a plan, it just had to go through all stages of optimisation before it found one that it considered good enough, whereas the first one aborted with a good enough plan without having to go through all the stages.
Gail, I'm sure I've read somewhere that the process of plan checking is susceptible to system pressures, that is, fewer plans can be checked within the allocated time - but I haven't found a reference for this anywhere. Do you know if this is indeed the case?
I'm running the queries on my laptop. I'm the only one that works with it and there are no jobs. I still get different queries for each procedure. I don't think that it has anything to do with resources
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
November 4, 2011 at 8:09 pm
Adi Cohn-120898 (11/4/2011)
Inna is a colleague of mine that came across an interesting problem. She had a query that had a wrong query plan. She modified the order of the conditions in the where clause, and got a better query plan and the query ran much faster. She even wrote a small script that reproduces this behavior. You’ll might need to run the script 2 or 3 times because it is based on random data, but it does show different query plans for 2 queries that are logically the same. I think that this is a bug, but I might be wrong. Can anyone have an explanation for that?
Use tempdb
GO
CREATE TABLE Inna (ID int identity(1,1), DataCLInd datetime constraint DF_Inna_DataCLInd default (getdate()),
SecField INT, Third INT,
constraint PK_Inna_DataCLInd_SecField_Third Primary Key Clustered (DataCLInd,SecField,Third)
)
insert into Inna (DataCLInd,
SecField,
Third)
Values ('20110101' , 2,23)
DECLARE @ROWCOUNT int
select @ROWCOUNT = 1
--Inserting data. Since we are inserting random data, we will have few errors, but at the end we'll have
--more then 4000000 records and very fast
While @ROWCOUNT < 2000000
BEGIN
insert into Inna (DataCLInd,
SecField,
Third)
SELECT dateadd(MINUTE,+datepart(MILLISECOND,GETDATE())*datepart(MINUTE,GETDATE()),DataCLInd),
ID * 1.5,
ID * 4.5
FROM Inna
SELECT @ROWCOUNT = @@ROWCOUNT
END
go
--First query
declare @LastRunTime datetime
select @LastRunTime = '2011-01-16 00:00:52.870'
;WITH RankedRows AS (
SELECT ID,DataCLInd,SecField,Third,
RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum
FROM Inna
WHERE DataCLInd >= @LastRunTime AND DataCLInd >= DATEADD(MONTH,-1,getutcdate()))
SELECT ID,DataCLInd,SecField,Third
FROM RankedRows
WHERE RowNum = 1
go
--Second query
declare @LastRunTime datetime
select @LastRunTime = '2011-01-16 00:00:52.870'
;WITH RankedRows AS (
SELECT ID,DataCLInd,SecField,Third,
RANK() OVER (Partition by SecField ORDER BY DataCLInd DESC) AS RowNum
FROM Inna
WHERE DataCLInd >= DATEADD(MONTH,-1,getutcdate()) AND DataCLInd >= @LastRunTime)
SELECT ID,DataCLInd,SecField,Third
FROM RankedRows
WHERE RowNum = 1
go
Adi
Adi,
Different lesson and just a suggestion to speed up writing and executing test table construction... try the following. The ISNULLs are there to make NOT NULL columns. I think you'll find it much faster than your loop. Easier to control the absolute size and range of the data, as well.
USE TempDB
;
SELECT TOP 2000000
ID = IDENTITY(INT,1,1),
DataCLInd = ISNULL(RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2011','2012')+CAST('2011' AS DATETIME),0),
SecField = ISNULL(ABS(CHECKSUM(NEWID()))%1000000,0),
Third = ISNULL(ABS(CHECKSUM(NEWID()))%1000000,0)
INTO dbo.Inna
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
ALTER TABLE dbo.Inna
ADD CONSTRAINT PK_Inna_DataCLInd_SecField_Third PRIMARY KEY CLUSTERED (DataCLInd,SecField,Third)
;
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply