March 25, 2014 at 9:33 am
I've a query like so:
select distinct t1.f1 from t1 join t2 on t1.f2 = t2.f2
join t3 on t1.f4 = t3.f4
join t4 on t2.f5 = t4.f5
where t3.dt is not null and t2.f9 = xxx
it takes 6 min to execute. If i change select to do a select t1.* instead of select distinct t1.f1, it takes 3 seconds.
When i look at the execution plans for both queries the difference is that in 1st query there is a nested loop join where there is a hash join in the 2nd.
So if I change the query to:
select distinct t1.f1 from t1 inner hash join t2 on t1.f2 = t2.f2
join t3 on t1.f4 = t3.f4
join t4 on t2.f5 = t4.f5
where t3.dt is not null and t2.f9 = xxx
The query takes 3 seconds.
The problem is that this is dynamically generated SQL in an application and we don't want to use hints within it. I've tried updating stats on all tables in the joins, added covering indexes etc. nothing changed the plan. What can be done to force the plan to change from nested loop join to a hash join without using a hint explicitly?
Thanks
March 25, 2014 at 9:42 am
Table definitions, index definitions and execution plan please.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 25, 2014 at 10:26 am
I think the "Distinct" stinks..
Try to rewriting the query using aggregate functions instead
You need to look at the query plan..
I have a personal pet hate using the Distinct keyword. Overly used in lazy programming..
March 25, 2014 at 11:26 am
If the requirement is to remove duplicate rows, use DISTINCT
If the requirement is to produce aggregates with grouping, use GROUP BY
Seeing Group By with no aggregates suggests that someone read the myth that group by is faster/better than DISTINCT and didn't do any tests.
Specify the query in the simplest, most obvious way, if after testing performance is inadequate then consider alternate query forms and test them. If they're faster great, if not, go back to the simple form.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 25, 2014 at 12:03 pm
Do you need the aggregation of DISTINCT? Are there data problems you could fix instead of using DISTINCT? You might want to try attacking that. But, I'm with Gail, without seeing what's happening through the execution plan, there's not much to suggest here.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 25, 2014 at 6:42 pm
I'm afraid I can't put table definitions and query plans here. I can send you via PM.
March 26, 2014 at 2:19 am
If you PM them to me I'll delete them (not a private consultant any longer). The reason we're asking for them to be posted here is so that anyone wandering by can help, not just the person you send the details to
You can obfuscate both, replace the table and column names with generic names, run a search/replace over the plan to do the same (just make sure it still loads). We don't care about the names, we care about the operators, their properties and what that says about what SQL's doing
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply