turning complex query into temp tables

  • Hello, do you have a general rule of thumb for breaking a complex query into temp tables? For someone who is not a SQL specialist, a query with more than a few table joins can be complex, and one with 10+ joins can be downright overwhelming.

    One strategy is to break the problem into pieces, so to speak, by grouping closely related tables into temp tables and then joining those temp tables together. This simplifies complex SQL and, although not always as performant as one big query, is much easier to understand. So do you have a rule of thumb for how many joins to include in a query before you break it into temp tables?

  • Why do you think the process of creating and populating temp tables will be less confusing?

  • My rule of thumb is exactly what you just described: if I have trouble understanding a query, I want to break it into smaller pieces that are easier to understand.

    The performance will not necessarily be worse, either. The larger a query is, the larger the "solution space" that the query optimizer must search to create an optimal query execution plan. The larger the "solution space", the more likely the optimizer will not be able to search all of it before it times out.

    Some other tips for temp tables that might help performance:

    1. Try to align your temp tables with an index on your permanent tables.

    2. You don't have to load all of your data into a temp table right away. It might be more efficient to simply load the key values (e.g. Participant ID) until you get the final result set. Then go back and get any other data you need that is associated with those key values.

    I hope this is helpful.
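    To illustrate tip #2, here is a minimal sketch of the key-values-first pattern. All table and column names (dbo.Participant, dbo.Enrollment, etc.) are hypothetical:

    ```sql
    -- Step 1: load only the key values that survive the filtering logic.
    SELECT p.ParticipantID
    INTO #Keys
    FROM dbo.Participant AS p
    JOIN dbo.Enrollment AS e
        ON e.ParticipantID = p.ParticipantID
    WHERE e.Status = 'Active';

    -- Per tip #1, index the temp table on the join key.
    CREATE CLUSTERED INDEX IX_Keys ON #Keys (ParticipantID);

    -- Step 2: only now go back for the wide columns you actually need.
    SELECT p.ParticipantID, p.FirstName, p.LastName, p.Email
    FROM #Keys AS k
    JOIN dbo.Participant AS p
        ON p.ParticipantID = k.ParticipantID;
    ```

    Keeping the temp table narrow (keys only) reduces the space and I/O cost of materializing it.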

  • ZZartin (3/5/2015)


    Why do you think the process of creating and populating temp tables will be less confusing?

    Because you would be breaking the problem into pieces, which inherently simplifies it, instead of trying to conceptualize the interrelationships of Table1, Table2, Table3, and Table4 as a single complex problem.

  • David Moutray (3/5/2015)


    My rule of thumb is exactly what you just described: if I have trouble understanding a query, I want to break it into smaller pieces that are easier to understand.

    The performance will not necessarily be worse, either. The larger a query is, the larger the "solution space" that the query optimizer must search to create an optimal query execution plan. The larger the "solution space", the more likely the optimizer will not be able to search all of it before it times out.

    Some other tips for temp tables that might help performance:

    1. Try to align your temp tables with an index on your permanent tables.

    2. You don't have to load all of your data into a temp table right away. It might be more efficient to simply load the key values (e.g. Participant ID) until you get the final result set. Then go back and get any other data you need that is associated with those key values.

    I hope this is helpful.

    Thanks! I seem to remember that there was a heuristic for a long time that queries that join more than 5 tables should generally be avoided. Is this still the general consensus?

  • sqlguy-736318 (3/5/2015)


    ZZartin (3/5/2015)


    Why do you think the process of creating and populating temp tables will be less confusing?

    Because you would be breaking the problem into pieces, which inherently simplifies it, instead of trying to conceptualize the interrelationships of Table1, Table2, Table3, and Table4 as a single complex problem.

    It really depends on the exact problem and what exactly is causing the confusion, but no, using temp tables does not mean you don't have to understand how all the tables are related. In some cases it can make things more error-prone, since there are more steps to account for, and it can make the end result harder to understand, because you now have to work back through however many temp tables you have and how each of them is populated.

  • One benefit of breaking the query into temp tables is that it allows the SQL writer to logically group related tables/data.

    For example, course-related tables/data could be grouped into 1 temp table and student-related tables/data could be grouped into a different temp table.

    Flattening the tables means there are fewer decision points to work through when you are trying to understand how the data is joined together.
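    As a sketch of that grouping idea (all table and column names here are made up for illustration):

    ```sql
    -- Group the course-related tables into one temp table...
    SELECT c.CourseID, c.CourseName, t.TermName
    INTO #CourseData
    FROM dbo.Course AS c
    JOIN dbo.Term AS t
        ON t.TermID = c.TermID;

    -- ...and the student-related tables into another.
    SELECT s.StudentID, s.StudentName, e.CourseID, e.Grade
    INTO #StudentData
    FROM dbo.Student AS s
    JOIN dbo.Enrollment AS e
        ON e.StudentID = s.StudentID;

    -- The final query is now a single, flat join between two concepts.
    SELECT sd.StudentName, cd.CourseName, cd.TermName, sd.Grade
    FROM #StudentData AS sd
    JOIN #CourseData AS cd
        ON cd.CourseID = sd.CourseID;
    ```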

  • sqlguy-736318 (3/5/2015)


    One benefit of breaking the query into temp tables is that it allows the SQL writer to logically group related tables/data.

    For example, course-related tables/data could be grouped into 1 temp table and student-related tables/data could be grouped into a different temp table.

    Flattening the tables means there are fewer decision points to work through when you are trying to understand how the data is joined together.

    Without seeing the exact query you're working on it's hard to say, but I was just pointing out that A) joins are not always the most complex part of a query and B) there is no magical number of too many joins. Given that, simply throwing temp tables into the mix will not automatically make things easier to understand.

    For example, if you have someone who can't work with a large number of related tables, you might have better luck throwing a view on top of the joins and not even telling them about the actual underlying data structure.
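    A minimal sketch of that approach, with hypothetical names:

    ```sql
    -- The view hides a multi-table join behind a single name.
    CREATE VIEW dbo.vStudentCourse
    AS
    SELECT s.StudentID, s.StudentName, c.CourseName, e.Grade
    FROM dbo.Student AS s
    JOIN dbo.Enrollment AS e
        ON e.StudentID = s.StudentID
    JOIN dbo.Course AS c
        ON c.CourseID = e.CourseID;
    GO

    -- Consumers query the view with no knowledge of the underlying joins.
    SELECT StudentName, CourseName, Grade
    FROM dbo.vStudentCourse
    WHERE Grade = 'A';
    ```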

  • sqlguy-736318 (3/5/2015)

    Thanks! I seem to remember that there was a heuristic for a long time that queries that join more than 5 tables should generally be avoided. Is this still the general consensus?

    Only if you are working on a server that is 20 years old! 🙂

    Back in the 90's (when flip phones were really cool), the hardware did have trouble with queries with a large number of joins. That is not the case with modern hardware. The limitation now is the capacity of the query optimizer to create an optimal execution plan (the "solution space" problem I mentioned earlier), and the capacity of the developer to understand the code.

    That last limitation is the most important. We haven't gotten any big upgrades on our wetware in a while. 🙂

    I should mention, of course, that simply breaking down the problem into bite-size chunks is not sufficient. You need to give your temp tables meaningful names, for instance. (Object names and aliases are part of your code commentary!)

    You can also use other T-SQL features to "modularize" your code and make it easier to read. Common Table Expressions (CTE's) are an excellent way to do this. They don't simplify the optimizer's job, but they sure do make the code easier for humans to read.
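    For instance, a query built from CTEs might look like this (names are hypothetical). Each CTE names one logical step, and the final SELECT reads top to bottom:

    ```sql
    WITH ActiveStudents AS
    (
        SELECT StudentID, StudentName
        FROM dbo.Student
        WHERE IsActive = 1
    ),
    CurrentEnrollments AS
    (
        SELECT StudentID, CourseID
        FROM dbo.Enrollment
        WHERE TermID = 201501
    )
    SELECT a.StudentName, c.CourseName
    FROM ActiveStudents AS a
    JOIN CurrentEnrollments AS ce
        ON ce.StudentID = a.StudentID
    JOIN dbo.Course AS c
        ON c.CourseID = ce.CourseID;
    ```

    Unlike temp tables, CTEs are inlined into a single statement, so the optimizer still sees the whole query at once.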

  • David Moutray (3/5/2015)


    sqlguy-736318 (3/5/2015)

    Thanks! I seem to remember that there was a heuristic for a long time that queries that join more than 5 tables should generally be avoided. Is this still the general consensus?

    Only if you are working on a server that is 20 years old! 🙂

    Thanks! I haven't really studied SQL performance since the early 2000's so it sounds like the 5-table join rule is no longer true. Thanks for updating my archaic SQL knowledge 😉

  • I don't really subscribe (generally) to the notion that breaking a query into separate pieces using temp tables makes it any easier to understand.

    To me, properly formatting the query to make it readable (Make it Work, Make it Fast, Make it Pretty) is significantly more important. I'm wondering if you've got any standards in place to that effect?

    There are some (relatively few, I'd say) cases where breaking up a query may make it faster, because the original query overloaded the optimizer's ability to optimize. But remember that the optimizer does not find "the optimal plan." It finds a query plan that will execute reasonably well, given the time and constraints (e.g., cost) available to find it. Most of the time, bad (or not-so-good) query plans are due to other things, like bad cardinality estimates caused by statistics being out of date.

    Temp tables do require allocation of space, overhead of writing to that space/reading back from it, etc. that you should think about before using them.
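    On the stale-statistics point, it is often worth checking and refreshing statistics before restructuring the query. A sketch, with a hypothetical table name:

    ```sql
    -- See when each statistics object on the table was last updated.
    SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID('dbo.Enrollment');

    -- Refresh them with a full scan if they are stale.
    UPDATE STATISTICS dbo.Enrollment WITH FULLSCAN;
    ```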


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • dwain.c (3/8/2015)


    I don't really subscribe (generally) to the notion that breaking a query into separate pieces using temp tables makes it any easier to understand.

    To me, properly formatting the query to make it readable (Make it Work, Make it Fast, Make it Pretty) is significantly more important. I'm wondering if you've got any standards in place to that effect?

    There are some (relatively few, I'd say) cases where breaking up a query may make it faster, because the original query overloaded the optimizer's ability to optimize. But remember that the optimizer does not find "the optimal plan." It finds a query plan that will execute reasonably well, given the time and constraints (e.g., cost) available to find it. Most of the time, bad (or not-so-good) query plans are due to other things, like bad cardinality estimates caused by statistics being out of date.

    Temp tables do require allocation of space, overhead of writing to that space/reading back from it, etc. that you should think about before using them.

    On the performance note, using temp tables to break up queries also hurts SQL Server's ability to take advantage of parallelism.

  • At the risk of sounding harsh, if you cannot understand a complex query that has more than X number of joins, then is that something you really should be doing?

    Another "rule of thumb" is if joining lots and lots of tables is the norm in your database, then maybe you need to revisit your schema.

    The "5 join rule" is garbage. That is a fallacy from who knows where.

    The answer to this question really is "It depends".

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Personally, I think breaking things into pieces is easier to understand because it can be explained from start to finish. The better you understand the process, the easier things will go in any language, SQL included.

    I would suggest another approach too: try formatting the code in a way that makes it clearer. JOINs specifically can be arranged in a way that makes more sense. See the attachments below. Please forgive the names; it's all fabricated. No idea why Notepad++ is bolding random keywords.

    SQL Prompt Formatting

    This one is a bit standard.

    My Tabbed Formatting

    This one makes it a little clearer on what's happening.
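    Since the attachments don't carry over here, the tabbed style described above looks roughly like this (names fabricated, as in the originals). Aligning each JOIN with its ON clause indented beneath it makes the join chain scannable:

    ```sql
    SELECT  o.OrderID,
            c.CustomerName,
            p.ProductName
    FROM    dbo.Orders        AS o
    JOIN    dbo.Customers     AS c
            ON c.CustomerID = o.CustomerID
    JOIN    dbo.OrderLines    AS ol
            ON ol.OrderID   = o.OrderID
    JOIN    dbo.Products      AS p
            ON p.ProductID  = ol.ProductID
    WHERE   o.OrderDate >= '2015-01-01';
    ```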

  • Michael L John (3/9/2015)


    At the risk of sounding harsh, if you cannot understand a complex query that has more than X number of joins, then is that something you really should be doing?

    I agree that this sounds a bit harsh. But then to me so is your avatar ("no triggers"). 🙂

    In SQL, as in every programming language, you learn by doing. If you don't push yourself to write ever more complex queries, algorithmic solutions, etc., your skill set will remain stagnant.

    Edit: Note that this is not meant to imply that complexity just for complexity's sake is a good thing. Queries should be in the simplest form possible. Always. Unless there's an overriding reason to increase complexity to achieve another goal, e.g., performance.


