Low Cardinality estimate in Hash match operator

  • Hi Guys,

    Sorry to post this in 2014 forum, I did that because the problem seems to be fixed in the new cardinality estimator.

    I am trying to fix a performance problem with one of our reporting queries. I have just included the part of the query that is causing low estimate. This part is further joined with other tables and since the estimates for this one is so low further joins end up being nested loop and causing the query to run forever.

    select distinct nath.Transactionid

    from nath

    WHERE StatusId = 3 and

    Date IS NOT NULL and

    NOT EXISTS (SELECT 1 FROM nath

    WHERE nath.Transactionid= nath.Transactionid

    AND StatusId = 3

    AND HistoryId < nat.HistoryId)

    enter image description here

    In the above query the estimate for the hash match is only 1.17 but in reality there are 550K records coming out.

    For further trouble shooting I ran the exact same query on one of our SQL 2014 server and it produced correct results and the estimates was 557K on the hash match operator. I then tried trace flag 9481 to force the old cardinality estimator on 2014 and the estimates where back to 1.

    So I think the issue is something to do with old CE estimating self joins.

    I even tried trace flag 4199 on SQL 2008R2 but that did not help.

    Please let me know if there is anyway to fix this wrong estimate. Thanks

  • Issue can be simulated with the below script.

    create table nat ( c1 int identity(1,1) primary key,c2 int)

    declare @a int=1

    declare @b-2 int =1

    while @a<10000

    begin

    set @b-2=1

    while @b-2<=5

    begin

    insert into nat select @a

    set @b-2=@b+1

    end

    set @a=@a+1

    end

    select * from nat a where not exists (select 1 from nat b where b.c2=a.c2

    and b.c1<a.c1)

    OPTION(QUERYTRACEON 9481); -- estimated no of rows from hash match 1 (Old Cardinality Estimator)

    select * from nat a where not exists (select 1 from nat b where b.c2=a.c2

    and b.c1<a.c1) -- estimated no of rows from hash match 49995

  • Are you sure that you're querying the tables correctly?

    You don't have alias for your tables and that might be causing to use the incorrect columns on what should be a correlated subquery.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • jesijesijesi (8/13/2016)


    Hi Guys,

    Sorry to post this in 2014 forum, I did that because the problem seems to be fixed in the new cardinality estimator.

    I am trying to fix a performance problem with one of our reporting queries. I have just included the part of the query that is causing low estimate. This part is further joined with other tables and since the estimates for this one is so low further joins end up being nested loop and causing the query to run forever.

    select distinct nath.Transactionid

    from nath

    WHERE StatusId = 3 and

    Date IS NOT NULL and

    NOT EXISTS (SELECT 1 FROM nath

    WHERE nath.Transactionid= nath.Transactionid

    AND StatusId = 3

    AND HistoryId < nat.HistoryId)

    enter image description here

    In the above query the estimate for the hash match is only 1.17 but in reality there are 550K records coming out.

    For further trouble shooting I ran the exact same query on one of our SQL 2014 server and it produced correct results and the estimates was 557K on the hash match operator. I then tried trace flag 9481 to force the old cardinality estimator on 2014 and the estimates where back to 1.

    So I think the issue is something to do with old CE estimating self joins.

    I even tried trace flag 4199 on SQL 2008R2 but that did not help.

    Please let me know if there is anyway to fix this wrong estimate. Thanks

    This is well-understood behaviour. The IF NOT EXISTS () subquery is a row-goal query with a goal of 1. The old cardinality estimator results in a plan optimised for 1 row. This isn't always ideal. Good discussions of this here and here along with one or two suggestions for fixes. Another way is to compare the row-goal plan and the count(*) plan and nudge the optimiser into a hybrid plan using join hints. It can take time but sometimes the results are spectacular.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply