Coerce SQL Server to use HASH JOIN physical operator for ANTI SEMI-JOIN logical operator

  • SQL Server 2005 Enterprise

    Hi

    I have a problem with a query. This query is on a read only (log shipped) database that cannot be changed, so no indexes or statistics can be created.

    My issue is that I have a NOT EXISTS clause that was using a NESTED LOOPS join. This was inefficient and was due to the optimiser woefully underestimating the number of rows in the outer table of the nested loops. I added OPTION(HASH JOIN) to the query and it ran much better. However, I have now added an OUTER APPLY join to the query. OUTER APPLY has to use NESTED LOOPS and it does not appear to accept a join hint. As such I have had to remove OPTION(HASH JOIN) and performance is terrible again.

    So my question is: how can I coerce the optimiser to use a HASH JOIN for an anti semi-join in a query that also contains an OUTER APPLY?

    I am using the OUTER APPLY to solve the TOP N OVER GROUP problem so worst case is that I instead use ROW_NUMBER() windowing function.

    Many thanks

    Dan

  • Changing this from an OUTER APPLY to a LEFT OUTER JOIN with ROW_NUMBER curiously made the optimiser use a HASH JOIN for the anti-semi join so the hint is not now needed at all. This change of physical op knocks the logical reads down from 33,518,533 to 1,803,954.

    I suppose the problem is resolved other than for interest.

  • Sounds like you got past the issue. I strongly recommend the approach you took, rewriting the query to use better mechanisms as a tuning method. I really, really, hate what I'm about to suggest.

    If you've already gone the route of restructuring the query and you're still getting a less than beneficial exec plan, you can try using JOIN hints. This example is from Books Online:

    SELECT p.Name, pr.ProductReviewID

    FROM Production.Product p

    LEFT OUTER HASH JOIN Production.ProductReview pr

    ON p.ProductID = pr.ProductID

    ORDER BY ProductReviewID DESC;

    I've found that doing this can cause many more problems than it can solve, but when every other method has failed, it is available to you.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks Grant - that is what I was trying to do. However, AFAIK you cannot use a join hint for an OUTER APPLY and I know you can't use one for EXISTS. I would be comfortable forcing a particular physical join here - the optimiser was using an inappropriate NESTED LOOPS.

  • You can use join hints for both.

    Create Table T1 (

    SomeVal int

    )

    Create table T2 (

    SomeVal int

    )

    insert into T1 values (1),(2)

    insert into T2 values (1),(3)

    GO

    SELECT * FROM T1

    WHERE NOT EXISTS (SELECT 1 FROM T2 WHERE T1.SomeVal = T2.SomeVal)

    SELECT * FROM T1

    WHERE NOT EXISTS (SELECT 1 FROM T2 WHERE T1.SomeVal = T2.SomeVal)

    OPTION (MERGE JOIN)

    SELECT * FROM T1

    WHERE NOT EXISTS (SELECT 1 FROM T2 WHERE T1.SomeVal = T2.SomeVal)

    OPTION (HASH JOIN)

    SELECT * FROM T1

    OUTER APPLY (SELECT * FROM T2 WHERE T1.SomeVal = T2.SomeVal) a

    WHERE a.SomeVal IS NULL

    SELECT * FROM T1

    OUTER APPLY (SELECT * FROM T2 WHERE T1.SomeVal = T2.SomeVal) a

    WHERE a.SomeVal IS NULL

    OPTION (MERGE JOIN)

    SELECT * FROM T1

    OUTER APPLY (SELECT * FROM T2 WHERE T1.SomeVal = T2.SomeVal) a

    WHERE a.SomeVal IS NULL

    OPTION (HASH JOIN)

    Caveat is that it'll affect all the joins in the query. Haven't tested combo of both that and the inline join hints.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Not in 2005. Since your code is 2008 I assume things changed....

  • This is ok though:

    SELECT * FROM T1

    OUTER APPLY (SELECT * FROM T2 WHERE T1.SomeVal = T2.SomeVal) a

    WHERE a.SomeVal IS NULL

    OPTION (LOOP JOIN)

  • hallidayd (12/29/2010)


    Not in 2005. Since your code is 2008 I assume things changed....

    The apply failed on 2005, the exists worked just fine (2005 SP3). Changes to the optimiser I would assume.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply