Logic and Theory / Optimisation

Question

Logic and Theory / Optimisation

Rob-350472

SSCertifiable

Points: 7197
More actions
January 18, 2012 at 4:22 am

#266890

I've just optimised a query to run in 19 seconds rather than just under 5 mins. All I did was take a small select out of the where clause and create a temporary table of this universe first and evaluate against this instead. This was discovered by trial and error and a hunch based on previous knowledge.
I've got a few ideas as to why this made such a difference, however, can anyone suggest any good reading on this sort of thing so one can bear it in mind when designing queries? At the moment some things I just 'know' will be far quicker than others but it would be good to see the logic behind this.
Perhaps a bit of an abstract post but hopefully people can see where I'm coming from! I can elaborate on the specific example that prompted me to make this post but it's not especially interesting or unusual!
Thanks

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Paul White SSC Guru Points: 150468 More actions · Answer 1

The query optimizer in SQL Server does consider storing and reusing intermediate result sets in some cases. One common case is where part of the plan is executed many times with different correlated parameters (a nested loops join). If the optimizer estimates that duplicates may appear on the driving side of the join, it may cache results from the sub-plan on the inner side of the join in a Table Spool or Index Spool. It may also choose to pre-sort the outer input so that duplicates are guaranteed to arrive sequentially; together, this sort and table spool combination is known as a 'performance spool'.

This support is fairly limited, however: a plan shape has to be found during normal plan exploration that would benefit from the spool. The optimizer does not generally go looking for 'queries within queries' to apply a spool optimization to. In addition, table spools do not benefit from statistics or indexes, and even eager index spools do not produce the full benefits of a separate table (for an example of this see http://sqlblog.com/blogs/paul_white/archive/2011/08/30/can-a-select-query-cause-page-splits.aspx).

Fundamentally, the DBA can often know more about the data than the optimizer could ever deduce on its own, so there will probably always be cases where materializing a set in a temporary table manually will pay dividends. Another thing to bear in mind is that the current design means the optimizer is always racing the clock: its search activities must produce incrementally better plans that are expected to save more time in one execution than it spends optimizing.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Rob-350472 SSCertifiable Points: 7197 More actions · Answer 2

Thanks for the response Paul, specific to my situation I posted about. This is great but I was also looking for some sources of information on a wider scale - for example I've just been playing around with 2 functions that do the same thing, one via Coalesce and one via XML Path - trying to see which is the most efficient.

I didn't know which was (or which should be) the most efficient before embarking on it, I guess what I'm asking for is sources of information so I know which should be the most efficient. Obviously the theory doesn't always work in practice but it's a start!

I stress I'm not looking for answers for one or two specific cases, just some more broad reading I can do as time allows to have more of an understanding of the impact of doing something via method A vs method B when the results set is identical but the execution time is not.

Paul White SSC Guru Points: 150468 More actions · Answer 3

Rob-350472 (1/18/2012)
Thanks for the response Paul, specific to my situation I posted about. This is great but I was also looking for some sources of information on a wider scale - for example I've just been playing around with 2 functions that do the same thing, one via Coalesce and one via XML Path - trying to see which is the most efficient.

String concatenation is most efficiently performed with XML PATH or a SQLCLR function. The COALESCE method may not be reliable - as far as I recall it is only maintained for backward-compatibility (and that was some time ago, so the guarantee might have lapsed). Yes, I know I am answering your specific question again.

I didn't know which was (or which should be) the most efficient before embarking on it, I guess what I'm asking for is sources of information so I know which should be the most efficient. Obviously the theory doesn't always work in practice but it's a start!
I stress I'm not looking for answers for one or two specific cases, just some more broad reading I can do as time allows to have more of an understanding of the impact of doing something via method A vs method B when the results set is identical but the execution time is not.

I'm not sure such resources exist. SQL Server is a collection of things with specific performance characteristics in specific cases. Performance characteristics may also depend on implementation details that vary from release to release. "It depends" is the most common answer to questions for good reason 🙂

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Rob-350472 SSCertifiable Points: 7197 More actions · Answer 4

SQL Kiwi (1/18/2012)
Rob-350472 (1/18/2012)
Thanks for the response Paul, specific to my situation I posted about. This is great but I was also looking for some sources of information on a wider scale - for example I've just been playing around with 2 functions that do the same thing, one via Coalesce and one via XML Path - trying to see which is the most efficient.
String concatenation is most efficiently performed with XML PATH or a SQLCLR function. The COALESCE method may not be reliable - as far as I recall it is only maintained for backward-compatibility (and that was some time ago, so the guarantee might have lapsed). Yes, I know I am answering your specific question again.
I didn't know which was (or which should be) the most efficient before embarking on it, I guess what I'm asking for is sources of information so I know which should be the most efficient. Obviously the theory doesn't always work in practice but it's a start!
I stress I'm not looking for answers for one or two specific cases, just some more broad reading I can do as time allows to have more of an understanding of the impact of doing something via method A vs method B when the results set is identical but the execution time is not.
I'm not sure such resources exist. SQL Server is a collection of things with specific performance characteristics in specific cases. Performance characteristics may also depend on implementation details that vary from release to release. "It depends" is the most common answer to questions for good reason 🙂

My testing actually suggested that Coalesce produced marginally more efficient results than For XML Path when comparing the two functions and looking at the client statistics, however, that's not exactly a rock solid conclusion! - I also didn't try a CLR function. Interesting about the coalesce method and it's unreliability -as far as I know we have shed loads of functions using coalesce! (most years old).

I see your point as regards the 'it depends'. I think my last comment was perhaps a bit misleading there. I guess what I'm saying (for example) is I know what coalesce 'does' in terms of how it works in a select statement but what does it do behind the scenes in the database to retrieve the values it does, and what happens behind the scenes for example when one uses a case when X is null then case when Y is null etc. Sounds like I need to find some sort of 'database/SQL theory' type book or course!

Perhaps I'm not necessarily looking for 'is A better than B' but more what IS A or B? - e.g. I know what results coalesce produces, but behind the scenes how is it obtaining this information from the database

GSquared SSC Guru Points: 260824 More actions · Answer 5

There are a lot of books on SQL Server that would give you some insight into what you're asking for.

One great author on the subject is Itzik Ben-Gan. List of books here: http://www.amazon.com/Itzik-Ben-Gan/e/B001IGQENW

There are plenty of others.

Hit Amazon and search for SQL Server and you'll get a HUGE list. Check the samples, or see if your local library carries (or can get) a few of them, and see whose writing style works for you, then get their books.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Paul White SSC Guru Points: 150468 More actions · Answer 6

Rob-350472 (1/18/2012)
My testing actually suggested that Coalesce produced marginally more efficient results than For XML Path when comparing the two functions and looking at the client statistics, however, that's not exactly a rock solid conclusion! - I also didn't try a CLR function. Interesting about the coalesce method and it's unreliability -as far as I know we have shed loads of functions using coalesce! (most years old).

When you have a moment, have a read of some of these:

http://msmvps.com/blogs/robfarley/archive/2007/04/08/coalesce-is-not-the-answer-to-string-concatentation-in-t-sql.aspx

http://sqlblog.com/blogs/aaron_bertrand/archive/2011/03/08/t-sql-tuesday-16-this-is-not-the-aggregate-you-re-looking-for.aspx

http://blogs.msdn.com/b/sqltips/archive/2005/07/20/441053.aspx

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Rob-350472 SSCertifiable Points: 7197 More actions · Answer 7

GSquared (1/18/2012)
There are a lot of books on SQL Server that would give you some insight into what you're asking for.
One great author on the subject is Itzik Ben-Gan. List of books here: http://www.amazon.com/Itzik-Ben-Gan/e/B001IGQENW
There are plenty of others.
Hit Amazon and search for SQL Server and you'll get a HUGE list. Check the samples, or see if your local library carries (or can get) a few of them, and see whose writing style works for you, then get their books.

Thanks for the suggestion, some of those look like the sort of thing I'm after, more so than just 'here's how to use TSQL'.

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 8

By far the best way to learn performance analysis and tuning is to have a mentor that can work with you to examine/improve your existing applications while teaching you what he/she knows. Company wins (better database apps), you win (improved knowledge), consultant wins ($). 😎

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service