Logic and Theory / Optimisation

  • I've just optimised a query to run in 19 seconds rather than just under 5 mins. All I did was take a small select out of the where clause and create a temporary table of this universe first and evaluate against this instead. This was discovered by trial and error and a hunch based on previous knowledge.

    I've got a few ideas as to why this made such a difference, however, can anyone suggest any good reading on this sort of thing so one can bear it in mind when designing queries? At the moment some things I just 'know' will be far quicker than others but it would be good to see the logic behind this.

    Perhaps a bit of an abstract post but hopefully people can see where I'm coming from! I can elaborate on the specific example that prompted me to make this post but it's not especially interesting or unusual!

    Thanks

  • The query optimizer in SQL Server does consider storing and reusing intermediate result sets in some cases. One common case is where part of the plan is executed many times with different correlated parameters (a nested loops join). If the optimizer estimates that duplicates may appear on the driving side of the join, it may cache results from the sub-plan on the inner side of the join in a Table Spool or Index Spool. It may also choose to pre-sort the outer input so that duplicates are guaranteed to arrive sequentially; together, this sort and table spool combination is known as a 'performance spool'.

    This support is fairly limited, however: a plan shape has to be found during normal plan exploration that would benefit from the spool. The optimizer does not generally go looking for 'queries within queries' to apply a spool optimization to. In addition, table spools do not benefit from statistics or indexes, and even eager index spools do not produce the full benefits of a separate table (for an example of this see http://sqlblog.com/blogs/paul_white/archive/2011/08/30/can-a-select-query-cause-page-splits.aspx).

    Fundamentally, the DBA can often know more about the data than the optimizer could ever deduce on its own, so there will probably always be cases where materializing a set in a temporary table manually will pay dividends. Another thing to bear in mind is that the current design means the optimizer is always racing the clock: its search activities must produce incrementally better plans that are expected to save more time in one execution than it spends optimizing.

  • Thanks for the response Paul, specific to my situation I posted about. This is great but I was also looking for some sources of information on a wider scale - for example I've just been playing around with 2 functions that do the same thing, one via Coalesce and one via XML Path - trying to see which is the most efficient.

    I didn't know which was (or which should be) the most efficient before embarking on it, I guess what I'm asking for is sources of information so I know which should be the most efficient. Obviously the theory doesn't always work in practice but it's a start!

    I stress I'm not looking for answers for one or two specific cases, just some more broad reading I can do as time allows to have more of an understanding of the impact of doing something via method A vs method B when the results set is identical but the execution time is not.

  • Rob-350472 (1/18/2012)


    Thanks for the response Paul, specific to my situation I posted about. This is great but I was also looking for some sources of information on a wider scale - for example I've just been playing around with 2 functions that do the same thing, one via Coalesce and one via XML Path - trying to see which is the most efficient.

    String concatenation is most efficiently performed with XML PATH or a SQLCLR function. The COALESCE method may not be reliable - as far as I recall it is only maintained for backward-compatibility (and that was some time ago, so the guarantee might have lapsed). Yes, I know I am answering your specific question again.

    I didn't know which was (or which should be) the most efficient before embarking on it, I guess what I'm asking for is sources of information so I know which should be the most efficient. Obviously the theory doesn't always work in practice but it's a start!

    I stress I'm not looking for answers for one or two specific cases, just some more broad reading I can do as time allows to have more of an understanding of the impact of doing something via method A vs method B when the results set is identical but the execution time is not.

    I'm not sure such resources exist. SQL Server is a collection of things with specific performance characteristics in specific cases. Performance characteristics may also depend on implementation details that vary from release to release. "It depends" is the most common answer to questions for good reason 🙂

  • SQL Kiwi (1/18/2012)


    Rob-350472 (1/18/2012)


    Thanks for the response Paul, specific to my situation I posted about. This is great but I was also looking for some sources of information on a wider scale - for example I've just been playing around with 2 functions that do the same thing, one via Coalesce and one via XML Path - trying to see which is the most efficient.

    String concatenation is most efficiently performed with XML PATH or a SQLCLR function. The COALESCE method may not be reliable - as far as I recall it is only maintained for backward-compatibility (and that was some time ago, so the guarantee might have lapsed). Yes, I know I am answering your specific question again.

    I didn't know which was (or which should be) the most efficient before embarking on it, I guess what I'm asking for is sources of information so I know which should be the most efficient. Obviously the theory doesn't always work in practice but it's a start!

    I stress I'm not looking for answers for one or two specific cases, just some more broad reading I can do as time allows to have more of an understanding of the impact of doing something via method A vs method B when the results set is identical but the execution time is not.

    I'm not sure such resources exist. SQL Server is a collection of things with specific performance characteristics in specific cases. Performance characteristics may also depend on implementation details that vary from release to release. "It depends" is the most common answer to questions for good reason 🙂

    My testing actually suggested that Coalesce produced marginally more efficient results than For XML Path when comparing the two functions and looking at the client statistics, however, that's not exactly a rock solid conclusion! - I also didn't try a CLR function. Interesting about the coalesce method and it's unreliability -as far as I know we have shed loads of functions using coalesce! (most years old).

    I see your point as regards the 'it depends'. I think my last comment was perhaps a bit misleading there. I guess what I'm saying (for example) is I know what coalesce 'does' in terms of how it works in a select statement but what does it do behind the scenes in the database to retrieve the values it does, and what happens behind the scenes for example when one uses a case when X is null then case when Y is null etc. Sounds like I need to find some sort of 'database/SQL theory' type book or course!

    Perhaps I'm not necessarily looking for 'is A better than B' but more what IS A or B? - e.g. I know what results coalesce produces, but behind the scenes how is it obtaining this information from the database

  • There are a lot of books on SQL Server that would give you some insight into what you're asking for.

    One great author on the subject is Itzik Ben-Gan. List of books here: http://www.amazon.com/Itzik-Ben-Gan/e/B001IGQENW

    There are plenty of others.

    Hit Amazon and search for SQL Server and you'll get a HUGE list. Check the samples, or see if your local library carries (or can get) a few of them, and see whose writing style works for you, then get their books.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Rob-350472 (1/18/2012)


    My testing actually suggested that Coalesce produced marginally more efficient results than For XML Path when comparing the two functions and looking at the client statistics, however, that's not exactly a rock solid conclusion! - I also didn't try a CLR function. Interesting about the coalesce method and it's unreliability -as far as I know we have shed loads of functions using coalesce! (most years old).

    When you have a moment, have a read of some of these:

    http://msmvps.com/blogs/robfarley/archive/2007/04/08/coalesce-is-not-the-answer-to-string-concatentation-in-t-sql.aspx

    http://sqlblog.com/blogs/aaron_bertrand/archive/2011/03/08/t-sql-tuesday-16-this-is-not-the-aggregate-you-re-looking-for.aspx

    http://blogs.msdn.com/b/sqltips/archive/2005/07/20/441053.aspx

  • GSquared (1/18/2012)


    There are a lot of books on SQL Server that would give you some insight into what you're asking for.

    One great author on the subject is Itzik Ben-Gan. List of books here: http://www.amazon.com/Itzik-Ben-Gan/e/B001IGQENW

    There are plenty of others.

    Hit Amazon and search for SQL Server and you'll get a HUGE list. Check the samples, or see if your local library carries (or can get) a few of them, and see whose writing style works for you, then get their books.

    Thanks for the suggestion, some of those look like the sort of thing I'm after, more so than just 'here's how to use TSQL'.

  • By far the best way to learn performance analysis and tuning is to have a mentor that can work with you to examine/improve your existing applications while teaching you what he/she knows. Company wins (better database apps), you win (improved knowledge), consultant wins ($). 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply