hash join

  • hi,

    i have this quay:

    select * from data_accounts da

    left join dbo.data_users as us

    on da.userid=us.id

    on both table i have index in join condition

    there is reason why sql choose hash join insted of mereg join

    thanks alot sharon

  • The indexes probably aren't covering. With no filter conditions, SQL has to scan either the index or the table. I'll only scan the indexes if they are covering (otherwise too expensive to do lookups for every row).

    Since SQL has chosen to do a clustered index scan on both, the rows are likely not in order for the join, and SQL's calculated that a sort will be more expensive than a hash join.

    The hash join is one of the best joins for really large resultsets (like here) and it's the one that parallels the best.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • In addition to what Gail said you can force a merge join using 'LEFT MERGE JOIN'.

    It will *probably* be a lot more expensive



    Clear Sky SQL
    My Blog[/url]

  • first thank you for your reply,

    second i run this quary :

    select da.userid,us.id from usagc.dbo.data_accounts da

    inner join usagc.dbo.data_users as us

    on da.userid=us.id

    on both cloumn i put an index

    second thay are same data type

    and again sql using hashing insted of merge

    thanks sharon

  • Hash joins are the best join type for really huge resultsets. They parallel the best of any of the join types.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • As Gail said, SQL optimizer had choosen HASH join is the best option here as long as you have plenty of data in both tables for SORT is costly operation.

  • hi,

    maybe i miss something , but both clolumn hava index mean that cloumn ordered ( sql no need to sort them , thay all ready sort)

    all so the select is only on this cloumn ?

    so what i shuld do to make optimizer to decide merge join in plan?

    thank sharon

  • If the optimiser has decided on a hash join, it's because it's the cheapest join for the number of rows involved..

    Merge doesn't parallel well (iirc) and your query is running in parallel.

    Why do you want a merge join?

    Do you absolutely, 100%, definitely know that it's better for this query in all cases and circumstances?

    p.s. Please post table and index definitions.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Well maybe I'm missing something but what's the point of returning 12M + rows to anything?

    I'd understand the left join if you wanted to find unactive users. But with no where condition it's hard to guess what you want to do.

    More to the point, returning 12M rows will take time and there's little opportunities for tuning other than hardware here.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply