Execution plan estimates Join hint spilling.

Question

Execution plan estimates Join hint spilling.

Viewing 5 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 1

SilentOne - Monday, January 30, 2017 8:38 AM
ChrisM@Work - Monday, January 30, 2017 5:11 AM
The optimiser is timing out. If you can simplify the query you would almost certainly get a better plan.
You have duplication of effort: the group of tables Object4, Object14, Object13 are queried twice. Consider simplifying the CTE version using SUM(CASE..., and take a look at SUM() OVER() in books online.
You've mentioned that the query can be run with either very low rowcounts or much higher rowcounts. Unless you're sure that one plan will suit both scenarios, use two queries. It won't be too much effort if the query can be simplified - which seems likely.
Edit:
Remove your existing join hints. Experiment with hash join hints on Object16 and Object17: estimated cost is 23%, with 25 executions. In fact there are about 30,000 executions and almost the entire table is pulled in each case.
Examine the generated plan for accuracy and see if the time out has been eliminated.
Strange, I had spotted and thought that my optimisation efforts so far had resolved the query timeout, more optimisation required. Do you happen to know if the query optimiser times out whether it has time to generate sensible estimates? This could explain some of the spilling.
When trying to stop the query timeouts I created Objects 1 and 6 (temporary tables) that I created to break up the query and avoid more repetition. I've been considering breaking the query up into reusable objects (views, inline table functions) but although it would make the code simpler and more readable. I expect the query plan to mash the objects together creating a large execution that would suffer from optimisation timeouts? But I don't know and will need to test this.
I will re-write that CTE.
Thanks,

To be honest it's really difficult to tell with the plan being so anonymised - but here's what I'd do.
Firstly, I'd get that group of three tables into a #temp table without aggregation and with an appropriate clustered index to support either the main query or the aggregation later. Reference this in your main query instead of the three tables, and join the aggregate of it as a derived table. This could offer you a path to simplification.

Secondly, I'd get some ordinary indexes in place to support some of those joins - even if they are only there temporarily. This is a trick I use regularly for providing SQL Server with additional information when the optimiser times out.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Ed Wagner SSC Guru Points: 287026 More actions · Answer 2

ChrisM@Work - Tuesday, January 31, 2017 6:06 AM
SilentOne - Monday, January 30, 2017 8:38 AM
ChrisM@Work - Monday, January 30, 2017 5:11 AM
The optimiser is timing out. If you can simplify the query you would almost certainly get a better plan.
You have duplication of effort: the group of tables Object4, Object14, Object13 are queried twice. Consider simplifying the CTE version using SUM(CASE..., and take a look at SUM() OVER() in books online.
You've mentioned that the query can be run with either very low rowcounts or much higher rowcounts. Unless you're sure that one plan will suit both scenarios, use two queries. It won't be too much effort if the query can be simplified - which seems likely.
Edit:
Remove your existing join hints. Experiment with hash join hints on Object16 and Object17: estimated cost is 23%, with 25 executions. In fact there are about 30,000 executions and almost the entire table is pulled in each case.
Examine the generated plan for accuracy and see if the time out has been eliminated.
Strange, I had spotted and thought that my optimisation efforts so far had resolved the query timeout, more optimisation required. Do you happen to know if the query optimiser times out whether it has time to generate sensible estimates? This could explain some of the spilling.
When trying to stop the query timeouts I created Objects 1 and 6 (temporary tables) that I created to break up the query and avoid more repetition. I've been considering breaking the query up into reusable objects (views, inline table functions) but although it would make the code simpler and more readable. I expect the query plan to mash the objects together creating a large execution that would suffer from optimisation timeouts? But I don't know and will need to test this.
I will re-write that CTE.
Thanks,
To be honest it's really difficult to tell with the plan being so anonymised - but here's what I'd do.
Firstly, I'd get that group of three tables into a #temp table without aggregation and with an appropriate clustered index to support either the main query or the aggregation later. Reference this in your main query instead of the three tables, and join the aggregate of it as a derived table. This could offer you a path to simplification.
Secondly, I'd get some ordinary indexes in place to support some of those joins - even if they are only there temporarily. This is a trick I use regularly for providing SQL Server with additional information when the optimiser times out.

Chris, I'm glad to hear that I wasn't the only one having problems with the anonymized plan. 😉
I almost posted thoughts about simplifying it, but didn't because I didn't get the details of the query. The functions were really distracting me.

SilentOne, take the time to design the temp table well, especially in the data types and nullability of columns. And Chris' point on indexing was right on. The optimizer uses all available data (about the data) in creating a query plan. Temp tables do get statistics, so the nullability will be considered.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

SilentOne Ten Centuries Points: 1325 More actions · Answer 3

Ed Wagner - Tuesday, January 31, 2017 6:54 AM
ChrisM@Work - Tuesday, January 31, 2017 6:06 AM
SilentOne - Monday, January 30, 2017 8:38 AM
ChrisM@Work - Monday, January 30, 2017 5:11 AM
The optimiser is timing out. If you can simplify the query you would almost certainly get a better plan.
You have duplication of effort: the group of tables Object4, Object14, Object13 are queried twice. Consider simplifying the CTE version using SUM(CASE..., and take a look at SUM() OVER() in books online.
You've mentioned that the query can be run with either very low rowcounts or much higher rowcounts. Unless you're sure that one plan will suit both scenarios, use two queries. It won't be too much effort if the query can be simplified - which seems likely.
Edit:
Remove your existing join hints. Experiment with hash join hints on Object16 and Object17: estimated cost is 23%, with 25 executions. In fact there are about 30,000 executions and almost the entire table is pulled in each case.
Examine the generated plan for accuracy and see if the time out has been eliminated.
Strange, I had spotted and thought that my optimisation efforts so far had resolved the query timeout, more optimisation required. Do you happen to know if the query optimiser times out whether it has time to generate sensible estimates? This could explain some of the spilling.
When trying to stop the query timeouts I created Objects 1 and 6 (temporary tables) that I created to break up the query and avoid more repetition. I've been considering breaking the query up into reusable objects (views, inline table functions) but although it would make the code simpler and more readable. I expect the query plan to mash the objects together creating a large execution that would suffer from optimisation timeouts? But I don't know and will need to test this.
I will re-write that CTE.
Thanks,
To be honest it's really difficult to tell with the plan being so anonymised - but here's what I'd do.
Firstly, I'd get that group of three tables into a #temp table without aggregation and with an appropriate clustered index to support either the main query or the aggregation later. Reference this in your main query instead of the three tables, and join the aggregate of it as a derived table. This could offer you a path to simplification.
Secondly, I'd get some ordinary indexes in place to support some of those joins - even if they are only there temporarily. This is a trick I use regularly for providing SQL Server with additional information when the optimiser times out.
Chris, I'm glad to hear that I wasn't the only one having problems with the anonymized plan. 😉
I almost posted thoughts about simplifying it, but didn't because I didn't get the details of the query. The functions were really distracting me.
SilentOne, take the time to design the temp table well, especially in the data types and nullability of columns. And Chris' point on indexing was right on. The optimizer uses all available data (about the data) in creating a query plan. Temp tables do get statistics, so the nullability will be considered.

I'm really surprise how much help and great suggestions I've been provided based on the anonymized plan. Thank you.

Unfortunately, I'm going to call time on how much more optimisation I can performance on this stored procedure. I've managed to ensure it is consistent and performs well. I've learnt ALOT trying to improve the original stored procedure and further improving the original solution to avoid the disk spilling.

I've not managed to get rid of the disk spilling entirely. Originally, this stored procedure was one really long inline dynamic SQL query from the web frontend. I've created one stored procedure that dynamically appends optional parameters to the where clause otherwise I would need to create 6 different stored procedures. I've added a few indexes to the temporary tables but I have kept the JOIN hints.

The biggest two problems I face is this is a report done on an OLTP system but should be done in our data warehouse and the design of the OLTP system is very poor.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 4

Ed Wagner - Tuesday, January 31, 2017 6:54 AM
ChrisM@Work - Tuesday, January 31, 2017 6:06 AM
SilentOne - Monday, January 30, 2017 8:38 AM
ChrisM@Work - Monday, January 30, 2017 5:11 AM
The optimiser is timing out. If you can simplify the query you would almost certainly get a better plan.
You have duplication of effort: the group of tables Object4, Object14, Object13 are queried twice. Consider simplifying the CTE version using SUM(CASE..., and take a look at SUM() OVER() in books online.
You've mentioned that the query can be run with either very low rowcounts or much higher rowcounts. Unless you're sure that one plan will suit both scenarios, use two queries. It won't be too much effort if the query can be simplified - which seems likely.
Edit:
Remove your existing join hints. Experiment with hash join hints on Object16 and Object17: estimated cost is 23%, with 25 executions. In fact there are about 30,000 executions and almost the entire table is pulled in each case.
Examine the generated plan for accuracy and see if the time out has been eliminated.
Strange, I had spotted and thought that my optimisation efforts so far had resolved the query timeout, more optimisation required. Do you happen to know if the query optimiser times out whether it has time to generate sensible estimates? This could explain some of the spilling.
When trying to stop the query timeouts I created Objects 1 and 6 (temporary tables) that I created to break up the query and avoid more repetition. I've been considering breaking the query up into reusable objects (views, inline table functions) but although it would make the code simpler and more readable. I expect the query plan to mash the objects together creating a large execution that would suffer from optimisation timeouts? But I don't know and will need to test this.
I will re-write that CTE.
Thanks,
To be honest it's really difficult to tell with the plan being so anonymised - but here's what I'd do.
Firstly, I'd get that group of three tables into a #temp table without aggregation and with an appropriate clustered index to support either the main query or the aggregation later. Reference this in your main query instead of the three tables, and join the aggregate of it as a derived table. This could offer you a path to simplification.
Secondly, I'd get some ordinary indexes in place to support some of those joins - even if they are only there temporarily. This is a trick I use regularly for providing SQL Server with additional information when the optimiser times out.
Chris, I'm glad to hear that I wasn't the only one having problems with the anonymized plan. 😉
I almost posted thoughts about simplifying it, but didn't because I didn't get the details of the query. The functions were really distracting me.
SilentOne, take the time to design the temp table well, especially in the data types and nullability of columns. And Chris' point on indexing was right on. The optimizer uses all available data (about the data) in creating a query plan. Temp tables do get statistics, so the nullability will be considered.

Yeah tell me Ed! You just know there are opportunities there!

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Ed Wagner SSC Guru Points: 287026 More actions · Answer 5

ChrisM@Work - Tuesday, January 31, 2017 9:57 AM
Ed Wagner - Tuesday, January 31, 2017 6:54 AM
ChrisM@Work - Tuesday, January 31, 2017 6:06 AM
SilentOne - Monday, January 30, 2017 8:38 AM
ChrisM@Work - Monday, January 30, 2017 5:11 AM
The optimiser is timing out. If you can simplify the query you would almost certainly get a better plan.
You have duplication of effort: the group of tables Object4, Object14, Object13 are queried twice. Consider simplifying the CTE version using SUM(CASE..., and take a look at SUM() OVER() in books online.
You've mentioned that the query can be run with either very low rowcounts or much higher rowcounts. Unless you're sure that one plan will suit both scenarios, use two queries. It won't be too much effort if the query can be simplified - which seems likely.
Edit:
Remove your existing join hints. Experiment with hash join hints on Object16 and Object17: estimated cost is 23%, with 25 executions. In fact there are about 30,000 executions and almost the entire table is pulled in each case.
Examine the generated plan for accuracy and see if the time out has been eliminated.
Strange, I had spotted and thought that my optimisation efforts so far had resolved the query timeout, more optimisation required. Do you happen to know if the query optimiser times out whether it has time to generate sensible estimates? This could explain some of the spilling.
When trying to stop the query timeouts I created Objects 1 and 6 (temporary tables) that I created to break up the query and avoid more repetition. I've been considering breaking the query up into reusable objects (views, inline table functions) but although it would make the code simpler and more readable. I expect the query plan to mash the objects together creating a large execution that would suffer from optimisation timeouts? But I don't know and will need to test this.
I will re-write that CTE.
Thanks,
To be honest it's really difficult to tell with the plan being so anonymised - but here's what I'd do.
Firstly, I'd get that group of three tables into a #temp table without aggregation and with an appropriate clustered index to support either the main query or the aggregation later. Reference this in your main query instead of the three tables, and join the aggregate of it as a derived table. This could offer you a path to simplification.
Secondly, I'd get some ordinary indexes in place to support some of those joins - even if they are only there temporarily. This is a trick I use regularly for providing SQL Server with additional information when the optimiser times out.
Chris, I'm glad to hear that I wasn't the only one having problems with the anonymized plan. 😉
I almost posted thoughts about simplifying it, but didn't because I didn't get the details of the query. The functions were really distracting me.
SilentOne, take the time to design the temp table well, especially in the data types and nullability of columns. And Chris' point on indexing was right on. The optimizer uses all available data (about the data) in creating a query plan. Temp tables do get statistics, so the nullability will be considered.
Yeah tell me Ed! You just know there are opportunities there!

Yes there are...and we know it. But with nothing concrete, anything I suggest is going to be a guess based on nothing other than instinct.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions