February 22, 2008 at 11:25 am
Practicing what I preach, here's the data (creates a million row test table called jbmEmployee)...
--=============================================================================
-- Create a temporary test table with a million employees to stress test the
-- code with. THIS IS NOT PART OF THE SOLUTION!!!!
--=============================================================================
--===== If the temp table exists, drop it
-- IF OBJECT_ID('TempDB..jbmEmployee','U') IS NOT NULL
-- DROP TABLE jbmEmployee
--===== Create the temp table
CREATE TABLE jbmEmployee
(
EmployeeID INT PRIMARY KEY CLUSTERED, --don't name PK's on temp tables
FirstName VARCHAR(30),
LastName VARCHAR(30)
)
;WITH cteTally AS
(--==== Generates a 1 million row test table with predictable names in about 23 seconds
SELECT TOP 1000000
ROW_NUMBER() OVER (ORDER BY ac1.Object_ID) AS N
FROM Master.sys.ALL_Columns ac1
CROSS JOIN Master.sys.ALL_Columns ac2
)
INSERT INTO jbmEmployee
(EmployeeID, FirstName, LastName)
SELECT EmployeeID = t.N,
FirstName = 'FirstName'+RIGHT(REPLICATE('0',10)+CAST(t.N AS VARCHAR(10)),10),
LastName = 'LastName' +RIGHT(REPLICATE('0',10)+CAST(t.N AS VARCHAR(10)),10)
FROM cteTally t
I have the following code which runs just nasty fast because of some wonderful merge-joins when I use the hardcoded values below...
--===== Find and produce the page
SELECT e.EmployeeID,e.FirstName,e.LastName
FROM jbmEmployee e,
(
SELECT TOP (100) EmployeeID
FROM jbmEmployee
WHERE EmployeeID NOT IN (SELECT TOP (400000)
EmployeeID
FROM jbmEmployee
ORDER BY EmployeeID)
ORDER BY EmployeeID
) d
WHERE e.EmployeeID = d.EmployeeID
Returns in the blink of an eye in SSMS.
Now, when I convert the hard-coded constants to preset variables and run it...
--===== Set the local variables for pagesize and pagenumber
-- PageSize and PageNum would be parameters in a stored proc
DECLARE @PageSize INT
DECLARE @PageNum INT
DECLARE @Skip INT
SET @PageSize = 100
SET @PageNum = 4000
SET @Skip = @PageSize*@PageNum
--===== Find and produce the page
SELECT e.EmployeeID,e.FirstName,e.LastName
FROM jbmEmployee e,
(
SELECT TOP (@PageSize) EmployeeID
FROM jbmEmployee
WHERE EmployeeID NOT IN (SELECT TOP (@Skip)
EmployeeID
FROM jbmEmployee
ORDER BY EmployeeID)
ORDER BY EmployeeID
) d
WHERE e.EmployeeID = d.EmployeeID
...and it takes about a million years because the optimizer converts the merge-join to a nested-loop for this.
How can I keep this from happening? Am I going to have to use join hints to make this work, or is there a setting I might have accidentally set somewhere, or ???
I'm thinking I don't like 2k5 very much if the optimizer is going to do the parameter sniffing thing even on such simple code...
--Jeff Moden
Change is inevitable... Change for the better is not.
February 22, 2008 at 12:17 pm
Did you try building your query as dynamic SQL, then executing it?
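For instance, a minimal sketch along those lines, reusing the jbmEmployee table and values from the post above (because the values get concatenated into the string as literals, the optimizer compiles the statement with constants rather than variables):
--===== Hypothetical sketch only: embed the values as literals via dynamic SQL
DECLARE @PageSize INT, @PageNum INT, @Skip INT, @SQL VARCHAR(8000)
SET @PageSize = 100
SET @PageNum = 4000
SET @Skip = @PageSize*@PageNum
SET @SQL = 'SELECT e.EmployeeID,e.FirstName,e.LastName
FROM jbmEmployee e,
(
SELECT TOP (' + CAST(@PageSize AS VARCHAR(10)) + ') EmployeeID
FROM jbmEmployee
WHERE EmployeeID NOT IN (SELECT TOP (' + CAST(@Skip AS VARCHAR(10)) + ') EmployeeID
FROM jbmEmployee
ORDER BY EmployeeID)
ORDER BY EmployeeID
) d
WHERE e.EmployeeID = d.EmployeeID'
EXEC (@SQL) --compiles with constants in place, so the merge join comes back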
February 22, 2008 at 12:36 pm
Oh yeah... and, of course, it works just fine. But any idea on how to make it work without being dynamic, though? Seems the optimizer is making a really poor choice and there's gotta be some way to get it to hammer through this without having to resort to optimizer "tricks".
--Jeff Moden
Change is inevitable... Change for the better is not.
February 22, 2008 at 1:01 pm
That is an interesting issue. I can't seem to see a way to force your syntax to be smart about it.
Of course - my next reaction was - why use a join?
Declare @fun int
set @fun=40000;
--===== Find and produce the page
SELECT Top 100 e.EmployeeID,e.FirstName,e.LastName
FROM
(
SELECT TOP (@fun+100)
EmployeeID,FirstName,LastName
FROM jbmEmployee
ORDER BY EmployeeID
) e
ORDER by EmployeeID desc
The sad part is - if you hard-code the values - this isn't as performant as your method. But it does blow it away in the "parameterized test"...
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 22, 2008 at 1:29 pm
Well, good! At least it's not just me! 😀 Thanks for the help, Matt and Michael.
I also ran into this the other night on something totally different in 2k5... I got around that one by forcing the join hints for the joins that were present during the hard-coding. Seems like 2k5 is starting to make the same poor choices that Oracle sometimes makes 😛
HEY! If any of you MVPs are watching this, this would be a REAL good fix for the proposed SP3 for 2k5.
Anyone know Itzik? Maybe send him my test code and the problem... see what he comes up with... :w00t:
--Jeff Moden
Change is inevitable... Change for the better is not.
February 22, 2008 at 1:34 pm
For what it's worth - I've found that it's often necessary to blow away your indexing assumptions, and try others. Meaning - it seems that 2005 might prefer an entirely different indexing scheme than 2000....
Like - a multi-column index might need to be "flipped" (reverse the column order) for 2005 to want to use it. It's to the point now where I've built BOTH when I am not sure - and run for a while to see which one gets all of the action.....
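For example (hypothetical table and column names, purely to illustrate the "flip"):
--===== Hypothetical example: the same two columns, indexed in both orders
CREATE INDEX IX_SomeTable_ColA_ColB ON dbo.SomeTable (ColA, ColB)
CREATE INDEX IX_SomeTable_ColB_ColA ON dbo.SomeTable (ColB, ColA)
--===== After the workload has run a while, sys.dm_db_index_usage_stats shows
-- which one gets all of the action... then drop the loser.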
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 22, 2008 at 3:25 pm
Understood... but this doesn't even have anything to do with indexing... changing a constant to a variable causes a massively detrimental change in the execution plan. This is a bit like testing a car on high-octane gas and then having the system throw vegetable oil in the tank. On the similar problem I had, forcing the join type to what it was during constants caused it to work just as fast with the variables. The optimizer really should be smart enough to know that.
And this isn't even parameter sniffing because it's a script!
--Jeff Moden
Change is inevitable... Change for the better is not.
February 22, 2008 at 3:29 pm
Jeff Moden (2/22/2008)
Understood... but this doesn't even have anything to do with indexing... changing a constant to a variable causes a massively detrimental change in the execution plan. This is a bit like testing a car on high-octane gas and then having the system throw vegetable oil in the tank. On the similar problem I had, forcing the join type to what it was during constants caused it to work just as fast with the variables. The optimizer really should be smart enough to know that. And this isn't even parameter sniffing because it's a script!
No disagreement there - just jumping on the "why is the optimizer wrecking my old code" bandwagon...:):w00t:
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
February 22, 2008 at 7:19 pm
Heh... sorry about that, Matt. Thanks for the help...
--Jeff Moden
Change is inevitable... Change for the better is not.
February 23, 2008 at 8:00 am
Jeff Moden (2/22/2008)
And this isn't even parameter sniffing because it's a script!
It's caused by the same thing. When the optimiser gets a query, it knows the value of any parameters (stored proc, auto parameterisation, or sp_executeSQL) and the value of any constants. However, it doesn't know the value of the variables. This is because optimisation happens for an entire batch and before anything is executed.
Imagine something like this
DECLARE @Var1 DATETIME
SELECT @Var1 = Somevalue From Sometable where SomeColumn = 3
SELECT Somecolumns FROM SomeLargeTable WHERE AnotherColumn > @Var1
At the point where the batch is compiled and optimised, none of the queries have run and the value of @Var1 is unknown.
Since the optimiser doesn't know the value of the variable, it can't use the statistics to see how many rows would be affected, because it doesn't know what data to check in the stats. Since it doesn't know the number of rows, it makes an estimate. I believe its estimate is 30% of the total rows in the table (which, on the million-row test table here, would be 300,000 rows).
This can lead to really bad cardinality estimates (If you check your exec plan, you'll probably notice that estimated and actual rows are way different) and hence a very bad plan.
What I would recommend, rather than forcing the join hint (which is fixing the symptoms and not the root problem), is to use the OPTIMIZE FOR hint, to give the optimiser a value to use to estimate cardinality.
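For example, a minimal sketch against the paging query above (the literals in the hint are just representative values; the optimiser builds the plan as though the variables always held them):
SELECT e.EmployeeID,e.FirstName,e.LastName
FROM jbmEmployee e,
(
SELECT TOP (@PageSize) EmployeeID
FROM jbmEmployee
WHERE EmployeeID NOT IN (SELECT TOP (@Skip)
EmployeeID
FROM jbmEmployee
ORDER BY EmployeeID)
ORDER BY EmployeeID
) d
WHERE e.EmployeeID = d.EmployeeID
OPTION (OPTIMIZE FOR (@Skip = 400000, @PageSize = 100)) --plan built for these values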
HEY! If any of you MVPs are watching this, this would be a REAL good fix for the proposed SP3 for 2k5.
I doubt this is something that can be fixed in a service pack. Would likely need large-scale changes in how (and when) the optimiser compiles a query.
p.s. If you would like some more details, pages 284-288 of Itzik's book 'Inside SQL Server 2005: T-SQL Programming' covers this, with some diagrams, sample queries and execution plans.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 23, 2008 at 9:08 am
Gail is right on the mark. I wouldn't necessarily recommend "optimize for" though - you can use statement level recompile (new in 2005) to get the optimal query plan using those variable values.
February 23, 2008 at 11:12 am
matt stockham (2/23/2008)
Gail is right on the mark. I wouldn't necessarily recommend "optimize for" though - you can use statement level recompile (new in 2005) to get the optimal query plan using those variable values.
Thanks, Matt... do you, by any chance, know what the BOL lookup for "statement level recompile" might be (tried that, didn't work) so I can study it? That definitely sounds like a good possibility for known compile problems... better than a trip through Dynamic SQL, for sure.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 23, 2008 at 11:15 am
GilaMonster (2/23/2008)
It's caused by the same thing. When the optimiser gets a query, it knows the value of any parameters (stored proc, auto parameterisation, or sp_executeSQL) and the value of any constants. However, it doesn't know the value of the variables. This is because optimisation happens for an entire batch and before anything is executed.
That's kinda the conclusion I came to. Thanks for the great explanation, Gail.
You don't happen to have a good URL for how to overcome parameter sniffing, do you? I like Matt's idea because not all recompiles are bad and it seems to fit perfectly for something like this, but I don't want to limit possibilities... knowing more than one method is always good.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 23, 2008 at 11:22 am
Jeff Moden (2/23/2008)
matt stockham (2/23/2008)
Gail is right on the mark. I wouldn't necessarily recommend "optimize for" though - you can use statement level recompile (new in 2005) to get the optimal query plan using those variable values.
Thanks, Matt... do you, by any chance, know what the BOL lookup for "statement level recompile" might be (tried that, didn't work) so I can study it? That definitely sounds like a good possibility for known compile problems... better than a trip through Dynamic SQL, for sure.
Never mind... I found it... OPTION (RECOMPILE)... right?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 23, 2008 at 12:44 pm
Ok folks... the statement level OPTION (RECOMPILE) that Matt spoke of really did the trick both on this current problem and the previous problems I had. Estimated execution plan still shows the nested loop but the actual execution plan shows a nasty fast merge join and the difference in performance is just awesome, as expected.
The recompile takes less than a ms and, I guess, has the advantage of being able to handle anything rather than optimizing for a set number.
Awesome stuff. Thanks for the help, everyone!
Just for reference, here's the final test code including test table generation and the fix using OPTION (RECOMPILE)...
--=============================================================================
-- Create a temporary test table with a million employees to stress test the
-- code with. THIS IS NOT PART OF THE SOLUTION!!!!
--=============================================================================
--===== If the temp table exists, drop it
IF OBJECT_ID('TempDB..jbmEmployee','U') IS NOT NULL
DROP TABLE jbmEmployee
--===== Create the temp table
CREATE TABLE jbmEmployee
(
EmployeeID INT PRIMARY KEY CLUSTERED, --don't name PK's on temp tables
FirstName VARCHAR(30),
LastName VARCHAR(30)
)
;WITH cteTally AS
(--==== Generates a 1 million row test table with predictable names in about 23 seconds
SELECT TOP 1000000
ROW_NUMBER() OVER (ORDER BY ac1.Object_ID) AS N
FROM Master.sys.ALL_Columns ac1
CROSS JOIN Master.sys.ALL_Columns ac2
)
INSERT INTO jbmEmployee
(EmployeeID, FirstName, LastName)
SELECT EmployeeID = t.N,
FirstName = 'FirstName'+RIGHT(REPLICATE('0',10)+CAST(t.N AS VARCHAR(10)),10),
LastName = 'LastName' +RIGHT(REPLICATE('0',10)+CAST(t.N AS VARCHAR(10)),10)
FROM cteTally t
GO
---------------------------------------------------------------------------------------------------
--===== Hard-coding where variables would be produces nasty fast merge join.
SET STATISTICS TIME ON
--===== Find and produce the page
SELECT e.EmployeeID,e.FirstName,e.LastName
FROM jbmEmployee e,
(
SELECT TOP (100) EmployeeID
FROM jbmEmployee
WHERE EmployeeID NOT IN (SELECT TOP (400000)
EmployeeID
FROM jbmEmployee
ORDER BY EmployeeID)
ORDER BY EmployeeID
) d
WHERE e.EmployeeID = d.EmployeeID
GO
---------------------------------------------------------------------------------------------------
--===== Use of variables in this case causes a bad execution plan. This is the same query as
-- above, but with variables, and it uses loop joins because of the "parameter sniffing"
-- problem. Adding OPTION (RECOMPILE) to the problematic statement with the variables
-- fixes the problem and the code runs as fast as its hard-coded cousin above.
SET STATISTICS TIME ON
DECLARE @PageSize INT
DECLARE @PageNum INT
DECLARE @Skip INT
--===== Set the local variables for pagesize and pagenumber
-- PageSize and PageNum would be parameters in a stored proc
SET @PageSize = 10
SET @PageNum = 90000
SET @Skip = @PageSize*@PageNum
--===== Find and produce the page
SELECT e.EmployeeID,e.FirstName,e.LastName
FROM jbmEmployee e,
(
SELECT TOP (@PageSize) EmployeeID
FROM jbmEmployee
WHERE EmployeeID NOT IN (SELECT TOP (@Skip)
EmployeeID
FROM jbmEmployee
ORDER BY EmployeeID)
ORDER BY EmployeeID
) d
WHERE e.EmployeeID = d.EmployeeID
OPTION (RECOMPILE) --LOOK!!! This overcomes the problem of parameter sniffing
--Without it, it'll do nested loops instead of a merge join.
--Jeff Moden
Change is inevitable... Change for the better is not.