Low Cardinality estimate in Hash match operator

Question

Post reply

Low Cardinality estimate in Hash match operator

jesijesijesi

Right there with Babe

Points: 761
More actions
August 13, 2016 at 2:03 am

#329338

Hi Guys,
Sorry to post this in 2014 forum, I did that because the problem seems to be fixed in the new cardinality estimator.
I am trying to fix a performance problem with one of our reporting queries. I have just included the part of the query that is causing low estimate. This part is further joined with other tables and since the estimates for this one is so low further joins end up being nested loop and causing the query to run forever.
select distinct nath.Transactionid
from nath
WHERE StatusId = 3 and
Date IS NOT NULL and
NOT EXISTS (SELECT 1 FROM nath
WHERE nath.Transactionid= nath.Transactionid
AND StatusId = 3
AND HistoryId < nat.HistoryId)
enter image description here
In the above query the estimate for the hash match is only 1.17 but in reality there are 550K records coming out.
For further trouble shooting I ran the exact same query on one of our SQL 2014 server and it produced correct results and the estimates was 557K on the hash match operator. I then tried trace flag 9481 to force the old cardinality estimator on 2014 and the estimates where back to 1.
So I think the issue is something to do with old CE estimating self joins.
I even tried trace flag 4199 on SQL 2008R2 but that did not help.
Please let me know if there is anyway to fix this wrong estimate. Thanks

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

jesijesijesi Right there with Babe Points: 761 More actions · Answer 1

Issue can be simulated with the below script.

create table nat ( c1 int identity(1,1) primary key,c2 int)

declare @a int=1

declare @b-2 int =1

while @a<10000

begin

set @b-2=1

while @b-2<=5

begin

insert into nat select @a

set @b-2=@b+1

end

set @a=@a+1

end

select * from nat a where not exists (select 1 from nat b where b.c2=a.c2

and b.c1<a.c1)

OPTION(QUERYTRACEON 9481); -- estimated no of rows from hash match 1 (Old Cardinality Estimator)

select * from nat a where not exists (select 1 from nat b where b.c2=a.c2

and b.c1<a.c1) -- estimated no of rows from hash match 49995

Luis Cazares SSC Guru Points: 183706 More actions · Answer 2

Are you sure that you're querying the tables correctly?

You don't have alias for your tables and that might be causing to use the incorrect columns on what should be a correlated subquery.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

jesijesijesi (8/13/2016)
Hi Guys,
Sorry to post this in 2014 forum, I did that because the problem seems to be fixed in the new cardinality estimator.
I am trying to fix a performance problem with one of our reporting queries. I have just included the part of the query that is causing low estimate. This part is further joined with other tables and since the estimates for this one is so low further joins end up being nested loop and causing the query to run forever.
select distinct nath.Transactionid
from nath
WHERE StatusId = 3 and
Date IS NOT NULL and
NOT EXISTS (SELECT 1 FROM nath
WHERE nath.Transactionid= nath.Transactionid
AND StatusId = 3
AND HistoryId < nat.HistoryId)
enter image description here
In the above query the estimate for the hash match is only 1.17 but in reality there are 550K records coming out.
For further trouble shooting I ran the exact same query on one of our SQL 2014 server and it produced correct results and the estimates was 557K on the hash match operator. I then tried trace flag 9481 to force the old cardinality estimator on 2014 and the estimates where back to 1.
So I think the issue is something to do with old CE estimating self joins.
I even tried trace flag 4199 on SQL 2008R2 but that did not help.
Please let me know if there is anyway to fix this wrong estimate. Thanks

This is well-understood behaviour. The IF NOT EXISTS () subquery is a row-goal query with a goal of 1. The old cardinality estimator results in a plan optimised for 1 row. This isn't always ideal. Good discussions of this here and here along with one or two suggestions for fixes. Another way is to compare the row-goal plan and the count(*) plan and nudge the optimiser into a hybrid plan using join hints. It can take time but sometimes the results are spectacular.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden