Bad estimates producing good plan ?

  • I understand i will have to probably post schema definition and the query, just for sake of conversation i would like to know what others have experienced.

    I have a query which until last week used  nested loop operator on cross apply on tables fairly large size ( outer table had about 10 million records and inner table about 1 Billion records). We have purged lot of the data from inner table, rebuild indices and updated statistics ( table size is down to 700 Million from 1 Billion ). Ofcourse i was hoping since i purged the data and did good maintenance i should be expecting the same query to be executing faster right? But to my surprise the same query is using Merge join operator now, AFAIK merge join is probably the right one since both data sets are large previously i think it was using nested loop because of stale statistics. My question is with stale statistics it was running faster and now with most updated statistics it is running slower. Has anyone experienced this and how did they handle? TIA

  • Weird.  When a Merge Join is appropriate, it should typically be the fastest type of join.  Make sure the query actually does have good row estimates.  Also, if you rebuilt an index, you should not have updated statistics on the index after the rebuild (reason: the rebuild will use all rows for the stats, a rebuild may use only a sampling of rows).

    Typically merge join is best when the tables are clustered appropriately so that the clustering keys match enough to use a merge join.  A nonclustered, covering index would also qualify for a merge join.  You don't want to see a large sort being done -- a smaller sort is fine -- in order to use a merge join, except in rare cases.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher - Wednesday, October 18, 2017 12:11 PM

    Weird.  When a Merge Join is appropriate, it should typically be the fastest type of join.  Make sure the query actually does have good row estimates.  Also, if you rebuilt an index, you should not have updated statistics on the index after the rebuild (reason: the rebuild will use all rows for the stats, a rebuild may use only a sampling of rows).

    Typically merge join is best when the tables are clustered appropriately so that the clustering keys match enough to use a merge join.  A nonclustered, covering index would also qualify for a merge join.  You don't want to see a large sort being done -- a smaller sort is fine -- in order to use a merge join, except in rare cases.

    I ended up using join hint Option ( loop join ), wasn't my first preference but based on our storage ( where tiny lookups are ridiculously fast) compared to full scan.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply