turning complex query into temp tables

  • ZZartin (3/9/2015)


    dwain.c (3/8/2015)


    I don't really subscribe (generally) to the notion that breaking a query into separate pieces using temp tables makes it any easier to understand.

    To me, properly formatting the query to make it readable (Make it Work, Make it Fast, Make it Pretty) is significantly more important. I'm wondering if you've got any standards in place to that effect?

    There are some (relatively few I'd say) cases where breaking up a query may make it faster because the original query overloaded the optimizer's ability to optimize. But remember that the optimizer does not find "the optimal query." It finds a query plan that will successfully execute reasonably well given time and constraints (e.g., cost) available to find it. Most of the time, bad (or not so good) query plans are due to other things like bad cardinality estimates due to statistics being out of date.

    Temp tables do require allocation of space, overhead of writing to that space/reading back from it, etc. that you should think about before using them.

    On the performance note, using temp tables to break up queries also hurts SQL Server's ability to take advantage of parallelism.

    Does it? Could you provide an example?


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Just do what makes it easier for you to understand; don't worry about what others think. I like to go with the KISS method: Keep It Simple, Stupid. You are the one who will be maintaining it, so make it easier for yourself. But formatting does help a lot to make complex SQL easier to understand, and I prefer formatting similar to the first example shown by xsevensinzx; pick the format that works best for you.

    I know when I started out I would always break it down into simple queries, only joining a few tables at a time into temp tables. After a while I got tired of going back and changing the SQL for those temp tables to bring in extra fields I needed for the new joins, so it was a natural progression to writing more complex code. As others have said, you will lose the indexes with the temp tables, so performance may be a problem at some point. As you get better and understand your data and how the joins work, I would bet you will start to join more and more tables together. I always try to keep in mind that I will not always be the one maintaining this code, so by keeping it simple I make it easier for someone new to understand and follow without them having to have me explain the code.
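    As a sketch of the kind of formatting being discussed (table and column names here are hypothetical, not from the thread), a multi-join query can stay readable without being split into temp tables:

    ```sql
    -- Hypothetical example: aligned keywords, one join per line, and
    -- short aliases make a multi-table query scannable as a whole.
    SELECT  o.OrderID,
            c.CustomerName,
            p.ProductName,
            od.Quantity * od.UnitPrice AS LineTotal
    FROM    dbo.Orders       AS o
    JOIN    dbo.Customers    AS c  ON c.CustomerID = o.CustomerID
    JOIN    dbo.OrderDetails AS od ON od.OrderID   = o.OrderID
    JOIN    dbo.Products     AS p  ON p.ProductID  = od.ProductID
    WHERE   o.OrderDate >= '20150101'
    ORDER BY o.OrderID;
    ```

    The exact layout matters less than applying one layout consistently, so the next reader can find each join condition at a glance.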

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • dwain.c (3/9/2015)


    ZZartin (3/9/2015)


    dwain.c (3/8/2015)


    I don't really subscribe (generally) to the notion that breaking a query into separate pieces using temp tables makes it any easier to understand.

    To me, properly formatting the query to make it readable (Make it Work, Make it Fast, Make it Pretty) is significantly more important. I'm wondering if you've got any standards in place to that effect?

    There are some (relatively few I'd say) cases where breaking up a query may make it faster because the original query overloaded the optimizer's ability to optimize. But remember that the optimizer does not find "the optimal query." It finds a query plan that will successfully execute reasonably well given time and constraints (e.g., cost) available to find it. Most of the time, bad (or not so good) query plans are due to other things like bad cardinality estimates due to statistics being out of date.

    Temp tables do require allocation of space, overhead of writing to that space/reading back from it, etc. that you should think about before using them.

    On the performance note, using temp tables to break up queries also hurts SQL Server's ability to take advantage of parallelism.

    Does it? Could you provide an example?

    If your "complex" query could, say, take advantage of 4 cores, but the queries that populate your temp tables can only use 2 and have to be run sequentially, then there's not much SQL Server can do about that.
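    A minimal sketch of the point being made (hypothetical tables; whether each step actually gets a parallel plan depends on the SQL Server version, cost threshold, and MAXDOP settings):

    ```sql
    -- One statement: the optimizer sees the whole join + aggregate
    -- and can choose a single parallel plan for it.
    SELECT  s.RegionID, SUM(s.Amount) AS Total
    FROM    dbo.Sales   AS s
    JOIN    dbo.Regions AS r ON r.RegionID = s.RegionID
    GROUP BY s.RegionID;

    -- Broken up: each statement is planned on its own, and the second
    -- step cannot start until the first has finished writing to tempdb.
    SELECT RegionID, Amount
    INTO   #Stage
    FROM   dbo.Sales;

    SELECT  st.RegionID, SUM(st.Amount) AS Total
    FROM    #Stage      AS st
    JOIN    dbo.Regions AS r ON r.RegionID = st.RegionID
    GROUP BY st.RegionID;

    DROP TABLE #Stage;
    ```

    Comparing the actual execution plans of the two versions on your own data is the only reliable way to see whether the split costs you parallelism in practice.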

  • I agree with the others.

    -Format the query so that it is easy to parse sections and keep track.

    -Use intuitive alias names so you can easily trace back which object a column reference is from, and what information is in that column.

    -As with any other type of coding, use comments to make things clearer wherever they are not.

    Rewriting the code to sacrifice performance and optimal design just to make it easier to follow makes little sense.

  • Nevyn (3/10/2015)


    Rewriting the code to sacrifice performance and optimal design just to make it easier to follow makes little sense.

    I guess everything I deal with is overnight batch processing, so if a complex SQL statement takes about a minute to run but is hard to understand and debug, and I break it down into, let's say, 4 queries, and now all 4 take 4 minutes to run in total, well, that makes sense to me: if it's going to take a developer less time to find the change/fix needed in this situation, then it's worth it. Now if those 4 queries run for much more than that, like an hour, then you need to either tune them for better performance or go back to the complex SQL. As in everything, it depends, and it's easy for us to say what the best way is, but we are not all at the same knowledge level or in the same situations. Do what works best for you and your company's performance needs.

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • below86 (3/10/2015)


    Nevyn (3/10/2015)


    Rewriting the code to sacrifice performance and optimal design just to make it easier to follow makes little sense.

    I guess everything I deal with is overnight batch processing, so if a complex SQL statement takes about a minute to run but is hard to understand and debug, and I break it down into, let's say, 4 queries, and now all 4 take 4 minutes to run in total, well, that makes sense to me: if it's going to take a developer less time to find the change/fix needed in this situation, then it's worth it. Now if those 4 queries run for much more than that, like an hour, then you need to either tune them for better performance or go back to the complex SQL. As in everything, it depends, and it's easy for us to say what the best way is, but we are not all at the same knowledge level or in the same situations. Do what works best for you and your company's performance needs.

    The problem with that is that even in batch processing it is seldom (if ever) only one query.

    Once an approach becomes the way you do things, it tends to get used everywhere. And the impact can be non-linear.

    You start with the 1 minute query that you made into a 4 minute query, and it's not the end of the world. Life stayed easier, and some overnight job got a tad slower.

    But of course, just because you took 4 times as long does not mean you used 4 times the resources. You might be taking HUNDREDS of times the resources.

    And over time, as you add more solutions to your nightly runs with the same philosophy, you end up with a longer than desired loading window. Or you discover that when two 4 minute solutions run at the same time, they both take 30 minutes.

    And at the same time, you have babied your developers, so if you do run into the hour scenario and need the big complicated query, they are totally ill-equipped to support it.

  • Nevyn (3/10/2015)


    below86 (3/10/2015)


    Nevyn (3/10/2015)


    Rewriting the code to sacrifice performance and optimal design just to make it easier to follow makes little sense.

    I guess everything I deal with is overnight batch processing, so if a complex SQL statement takes about a minute to run but is hard to understand and debug, and I break it down into, let's say, 4 queries, and now all 4 take 4 minutes to run in total, well, that makes sense to me: if it's going to take a developer less time to find the change/fix needed in this situation, then it's worth it. Now if those 4 queries run for much more than that, like an hour, then you need to either tune them for better performance or go back to the complex SQL. As in everything, it depends, and it's easy for us to say what the best way is, but we are not all at the same knowledge level or in the same situations. Do what works best for you and your company's performance needs.

    The problem with that is that even in batch processing it is seldom (if ever) only one query.

    Once an approach becomes the way you do things, it tends to get used everywhere. And the impact can be non-linear.

    You start with the 1 minute query that you made into a 4 minute query, and it's not the end of the world. Life stayed easier, and some overnight job got a tad slower.

    But of course, just because you took 4 times as long does not mean you used 4 times the resources. You might be taking HUNDREDS of times the resources.

    And over time, as you add more solutions to your nightly runs with the same philosophy, you end up with a longer than desired loading window. Or you discover that when two 4 minute solutions run at the same time, they both take 30 minutes.

    And at the same time, you have babied your developers, so if you do run into the hour scenario and need the big complicated query, they are totally ill-equipped to support it.

    All valid points. I'm just saying everyone starts with small queries; I would bet none of us jumped straight into writing 'big complicated queries'. The initial post asked if it was a valid way to do it, break down the 'big complicated query'. And I would, if I didn't understand it. And as I stated earlier, hopefully you would eventually start to write the more complicated code as you understand it more. I would bet we all could look back at SQL we wrote a year ago, or 5 years ago, and see plenty of ways to make it more efficient; if not, you're better developers than me. :-)

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • below86 (3/10/2015)


    Nevyn (3/10/2015)


    below86 (3/10/2015)


    Nevyn (3/10/2015)


    Rewriting the code to sacrifice performance and optimal design just to make it easier to follow makes little sense.

    I guess everything I deal with is overnight batch processing, so if a complex SQL statement takes about a minute to run but is hard to understand and debug, and I break it down into, let's say, 4 queries, and now all 4 take 4 minutes to run in total, well, that makes sense to me: if it's going to take a developer less time to find the change/fix needed in this situation, then it's worth it. Now if those 4 queries run for much more than that, like an hour, then you need to either tune them for better performance or go back to the complex SQL. As in everything, it depends, and it's easy for us to say what the best way is, but we are not all at the same knowledge level or in the same situations. Do what works best for you and your company's performance needs.

    The problem with that is that even in batch processing it is seldom (if ever) only one query.

    Once an approach becomes the way you do things, it tends to get used everywhere. And the impact can be non-linear.

    You start with the 1 minute query that you made into a 4 minute query, and it's not the end of the world. Life stayed easier, and some overnight job got a tad slower.

    But of course, just because you took 4 times as long does not mean you used 4 times the resources. You might be taking HUNDREDS of times the resources.

    And over time, as you add more solutions to your nightly runs with the same philosophy, you end up with a longer than desired loading window. Or you discover that when two 4 minute solutions run at the same time, they both take 30 minutes.

    And at the same time, you have babied your developers, so if you do run into the hour scenario and need the big complicated query, they are totally ill-equipped to support it.

    All valid points. I'm just saying everyone starts with small queries; I would bet none of us jumped straight into writing 'big complicated queries'. The initial post asked if it was a valid way to do it, break down the 'big complicated query'. And I would, if I didn't understand it. And as I stated earlier, hopefully you would eventually start to write the more complicated code as you understand it more. I would bet we all could look back at SQL we wrote a year ago, or 5 years ago, and see plenty of ways to make it more efficient; if not, you're better developers than me. :-)

    Yeah, but how much of that code was you figuring things out on your own because no one was able to help you? And looking back, how annoyed would you be if someone had actually been giving you bad, or at least incomplete, advice when they were trying to teach you?

  • UNCLE! UNCLE! To SQLguy-7... Ignore my posts, everyone else here is a better SQL developer than me. 🙂

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • CELKO (3/11/2015)


    SQL is supposed to be a declarative language, which means we want to write a single statement to answer a query or do a task. We do not want local variables, much less to fake a 1950's scratch tape file.

    OK, I guess I shouldn't be writing SQL, I don't know how I've made it for 15+ years doing just the opposite. So I have this 2,000 line code to build a cube fact table, it creates many #temp type tables and 'work' tables to get the final table. Now I'm sure I could take all that code and boil it down to one giant SQL statement, but I sure as hell would not want to have to maintain or debug that code.

    But noobs still think procedurally, so they use SQL to write COBOL, BASIC, FORTRAN, or whatever their first language was.

    So you did not start with any other language than SQL?

    No, it doesn't. It just relocates complexity. Use a CTE or a VIEW instead.

    Any view I've ever used only slows things down. I'll just blame that on the DBA then, since I'm only a developer.
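    For reference, the CTE alternative mentioned in the quote can be sketched like this (table and column names are hypothetical): the intermediate step gets a name inline, instead of being materialized to tempdb, so the optimizer still sees the whole query at once.

    ```sql
    -- Hypothetical sketch: RecentOrders plays the role a temp table
    -- would, but is defined inline as a common table expression.
    WITH RecentOrders AS
    (
        SELECT  CustomerID, OrderID, OrderTotal
        FROM    dbo.Orders
        WHERE   OrderDate >= DATEADD(MONTH, -1, GETDATE())
    )
    SELECT  c.CustomerName,
            COUNT(ro.OrderID)  AS Orders,
            SUM(ro.OrderTotal) AS Spend
    FROM    dbo.Customers AS c
    JOIN    RecentOrders  AS ro ON ro.CustomerID = c.CustomerID
    GROUP BY c.CustomerName;
    ```

    Note that a CTE is just a named subquery, not a materialized result: if the intermediate set is reused many times or needs an index, a temp table may still win.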

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • below86 (3/11/2015)


    CELKO (3/11/2015)


    SQL is supposed to be a declarative language, which means we want to write a single statement to answer a query or do a task. We do not want local variables, much less to fake a 1950's scratch tape file.

    OK, I guess I shouldn't be writing SQL, I don't know how I've made it for 15+ years doing just the opposite. So I have this 2,000 line code to build a cube fact table, it creates many #temp type tables and 'work' tables to get the final table. Now I'm sure I could take all that code and boil it down to one giant SQL statement, but I sure as hell would not want to have to maintain or debug that code.

    Please please keep doing it this way! I will be able to retire earlier because I get paid a lot to fix this stuff.

    But noobs still think procedurally, so they use SQL to write COBOL, BASIC, FORTRAN, or whatever their first language was.

    So you did not start with any other language than SQL?

    No. I started with C++, Cobol, and SQL at the same time.

    They are all very different languages, and far too many people try to crow-bar the same techniques into vastly different things. I had to spend a lot of time learning the differences to be able to put best practices in place regardless of the language being used.

    If I had a nickel for every time I have heard someone say "I have done these same things in every language I have ever written in" I would have been retired long ago.

    No, it doesn't. It just relocates complexity. Use a CTE or a VIEW instead.

    Any view I've ever used only slows things down. I'll just blame that on the DBA then, since I'm only a developer.

    Probably.

    The funny thing in this entire discussion is listening to what is comfortable to one person is completely baffling to another.

    I for one have lots of trouble reading code that is full of IFs, temp tables, etc. The "big complex query" is far simpler for me to read and understand.

    If there is something to take away from this discussion, it would be that to be very good at what you do, you need to understand, and be able to make the right choice among, the different ways to do it.

    A big complex set of joins may be the optimal way to do one thing, a series of temp tables and decisions may be the optimal way to do something else.

    But if you cannot understand the various ways to do things, and blindly do things a single way because you don't know any other way, then you are probably not doing your job.

    And, after 20 years doing this, I have found that the "big complex queries" are much more efficient than procedural coding in 90% of the cases.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • CELKO (3/11/2015)

    Another rule of thumb is that a query with too many tables is a bad code smell that says the schema design is a mess. A classic flaw is putting a constraint into a look-up table and doing a needless join when a CHECK() would have done the job. I have even seen a foreign key to a table of sex_codes when a simple CHECK (sex_code IN (0,1,2,9)) would do the job.

    Well, what if you're working in an OLAP structure (snowflake) with multiple fact tables and 20+ dimensions where you have to query all facts? :hehe:

    I used #temp tables a lot in the past, then switched to complex CTEs and views. But I am a C++ programmer if anything, and I break things down into simple steps. Over time, it became a lot easier to do everything in one complex swoop. But other times I still break things down, just to keep overly complex processes easier to read and manage, if the performance allows it.
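    The CHECK() versus look-up-table point from the quoted post can be sketched as follows (the sex_code domain is from the quote, which matches the ISO 5218 codes; table names are hypothetical):

    ```sql
    -- Look-up table version: validating sex_code requires a second
    -- table, and displaying it typically requires a join.
    CREATE TABLE dbo.SexCodes
    (
        sex_code TINYINT PRIMARY KEY
    );
    CREATE TABLE dbo.Person_FK
    (
        person_id INT PRIMARY KEY,
        sex_code  TINYINT NOT NULL
            REFERENCES dbo.SexCodes (sex_code)
    );

    -- CHECK() version: a small, fixed domain is enforced in place,
    -- with no extra table and no join at query time.
    CREATE TABLE dbo.Person_Check
    (
        person_id INT PRIMARY KEY,
        sex_code  TINYINT NOT NULL
            CHECK (sex_code IN (0, 1, 2, 9))
    );
    ```

    The trade-off cuts both ways: a CHECK() is cheaper for a stable domain, while a look-up table is easier to extend (and to attach descriptions to) when the set of valid values changes over time.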

