Use your own optimizer to tune your queries

  • Gullimeel (5/17/2012)


    I am not saying that it will make your future tuning easy or anything like that. You have to keep checking whether your query still performs better than the query without the hints. It is an ongoing process; tuning and improvement are constant.

    Regarding your never having used your own optimizer in 11 years: I am sure that at some point in those 11 years you must have used hints 🙂 or changed the SQL so that it exploits the optimizer in a better way.

    e.g. In SQL Server 2008 R2 you have aggregate analytic functions, but these do not perform better than the longer alternative methods. Thus you may not use the aggregate analytic functions but use the alternatives instead, and that doesn't mean you are making future tuning hard (it is possible that in the next version of SQL Server the aggregate analytic functions perform better than the alternatives).

    I have to think about the future, but I cannot ignore the present either.

    I would like to know the more scalable method. I know one, which is recompile. Any other is welcome.

    You're right. In 11 years, I've used index hints once. Was in SQL 2000, and in 2008 or beyond, I'd use a filtered index instead. Even then, a partitioned table might have been better, but I didn't know about those yet.

    As for changing the query so it uses the optimizer better, yeah, that's one of the main points of tuning. But that's not a "roll your own optimizer" situation. I still use the default optimizer. I just work with it. Things like SARGable Where clauses and Join math, tuning indexes (selective leading edges, et al), all the various methods for avoiding parameter sniffing, avoiding multi-table UDFs and table variables except where they are forced to be effective by the data structure itself, avoiding cursors, avoiding triangular joins, use of Numbers tables, HierarchyID instead of adjacency hierarchies for semi-static data hierarchies, proper normalization, avoiding name-value-pair table structures, pulling functionality out of the database or database API into a higher layer when the higher layer is better at it, CLR REGEX instead of T-SQL string manipulation, avoiding unnecessary MSDTC calls, picking the right isolation level, and so on and so on and so on, are all performance tuning, but none of them require "using my own optimizer to tune my queries" in this fashion.
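
    To illustrate just one item from that list, here's a quick sketch of making a predicate SARGable (the table and column names are invented for the example):

    -- Non-SARGable: wrapping the column in a function hides it from index seeks.
    select OrderID from dbo.Orders where year(OrderDate) = 2012;

    -- SARGable: a range on the bare column lets the optimizer seek an index on OrderDate.
    select OrderID from dbo.Orders
    where OrderDate >= '20120101' and OrderDate < '20130101';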

    As for scalable vs unscalable methods, that's a HUGE subject all by itself. There are books on just that. But in this particular situation, I don't see maintaining a table of this sort of data being something that will allow for rapid growth in a large, complex environment. As others have mentioned, if you have a dozen procs, this might work, but if you have thousands, forget it. I used to administer a relatively small CRM/BI/ERP solution. Database was only about 20 Gig, but it had over 3000 stored procedures, hundreds of UDFs, dozens of views, even a few CLR assemblies. SQL Server and its built-in optimizer, cache management, etc., worked beautifully with that. The database was fast, efficient, supported multiple websites, a whole business's worth of applications and users, several long-distance ETL processes (moving data to different cities on the other side of the continent). Can't imagine having to manage my own "optimizer" for something like that. Seems like that would be a full-time job all by itself.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I am not saying use this in each and every procedure you have. It is a specific solution for a specific problem and should in no way be abused.

    Optimizers are becoming better and better with each version, and most of the things that anyone might have done as a workaround in, say, SQL Server 2000 are not needed at all in SQL Server 2008, as the optimizer is intelligent enough to handle a number of them. Similarly, in SQL Server 2008 you might have some issues for which you do other workarounds. These might not be needed in, say, 2012 (I have not used it yet, so I do not know anything about this).

    GulliMeel

    Finding top n Worst Performing queries
    Improve the performance of Merge Join (special case)
    How to Post Performance Problem - Gail Shaw

  • Gullimeel (5/17/2012)


    The overhead of querying the two extra tables is just 3 logical I/Os.

    You're still just optimizing for logical I/O, not the whole cost of executing the extra statements. Compare that overhead to 150 microseconds for the recompile.

    Gullimeel (5/17/2012)


    Five compilations a minute is not a big overhead in isolation, but when you have other processes going on it could cause contention on the procedure cache.

    OPTION (RECOMPILE) queries are not cached as I mentioned, so there is no contention. Five compilations per minute is absolutely nothing at all.
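
    If you want to verify that for yourself, a quick check along these lines should do (a sketch only; it assumes the demo table name SkewedTable from your script):

    -- Run the demo query with option (recompile) first, then look for its plan.
    -- On SQL Server 2008 and later, nothing should be found: the plan is
    -- compiled, used, and thrown away.
    select cp.usecounts, cp.cacheobjtype, st.text
    from sys.dm_exec_cached_plans as cp
    cross apply sys.dm_exec_sql_text(cp.plan_handle) as st
    where st.text like '%SkewedTable%'
      and st.text not like '%dm_exec_cached_plans%';  -- exclude this probe itself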

    Gullimeel (5/17/2012)


    If you look at the script, I have mentioned that the other problem with using recompile is that, in my case, it doesn't generate a table scan until there are around 1,900 rows.

    Yes, and according to the real optimizer's more comprehensive model, that is the correct tipping point. Your tests only consider logical I/O, nothing else at all. The parallel scan and gather streams incur more costs than just the logical I/Os.
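
    Rather than guessing at a threshold, you can measure both access paths directly around the suspected tipping point, something like this (a sketch; it assumes your demo table, and the FORCESCAN hint needs 2008 R2 SP1 or later):

    set statistics io on;
    set statistics time on;

    -- Seek plus key lookups:
    select * from dbo.SkewedTable with (forceseek) where id = 1000;

    -- Full scan of the table:
    select * from dbo.SkewedTable with (forcescan) where id = 1000;

    set statistics io off;
    set statistics time off;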

    Gullimeel (5/17/2012)


    As you will see, in my script I have mentioned that the cost is not decided by logical I/O alone; it includes CPU as well as physical I/Os, but usually the CPU factor is not as big as the logical or physical I/Os. Thus I have added some extra logical I/Os to the actual threshold value to determine where the cost of a table scan would be less than the index seek + key lookup.

    That's just pure guesswork. The real optimizer contains real estimation logic.

    Gullimeel (5/17/2012)


    Thus you could try different methods to come up with a correct threshold value. For example, if the table is not in memory, then physical I/Os will be better with the table scan than with the index scan. This could be a factor where you could decrease the threshold.

    The non-clustered index is narrower (and so smaller overall). It can use read-ahead just like the clustered index scan. Prefetching and optimized-mode batch sorting of keys on the nested loops join can also help. Things are a lot more complex than your simplified logical-I/O-only model would suggest.
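
    A rough way to see how much narrower the non-clustered index is (a sketch, assuming the demo table dbo.SkewedTable and its indexes):

    -- Compare page counts for the clustered and non-clustered indexes.
    select i.name as index_name, i.type_desc, ps.used_page_count
    from sys.dm_db_partition_stats as ps
    join sys.indexes as i
        on  i.object_id = ps.object_id
        and i.index_id  = ps.index_id
    where ps.object_id = object_id('dbo.SkewedTable');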

  • You're still just optimizing for logical I/O, not the whole cost of executing the extra statements. Compare that overhead to 150 microseconds for the recompile.

    I would like to know that extra overhead. It could be CPU time etc. But when you recompile, you will be reading a number of system tables/structures which contain statistics, and I am sure reading these incurs more overhead than reading my two tables. To check what it reads, execute the query below.

    declare @id int;
    set @id = 1000;

    select * from dbo.SkewedTable
    where id = @id
    option (recompile);

    I created a small script to catch the locks it holds, etc. Be careful: it is an infinite while loop. You have to cancel the query below once your main query has executed.

    set nocount on;
    go
    dbcc freeproccache;       -- clears the whole plan cache: test systems only
    go
    dbcc dropcleanbuffers;    -- clears the buffer pool: test systems only
    go
    if object_id('dbo.mylocks1') is not null
        drop table mylocks1;
    go
    -- Replace 56 with the session_id of the session running the test query.
    select * into mylocks1 from sys.dm_tran_locks
    where request_session_id = 56;

    -- Infinite polling loop: cancel it once the main query has finished.
    while 1 = 1
    begin
        insert into mylocks1
        select * from sys.dm_tran_locks
        where request_session_id = 56;
    end

    -- After cancelling, inspect the captured locks.
    select * from mylocks1
    order by resource_subtype desc;

    In this you will see that the optimizer reads the indexstats and stats tables twice, and reads the plan guide as well. I am sure the overhead of these is more than reading my two tables.

    That's just pure guesswork. The real optimizer contains real estimation logic.

    I am not saying that my optimizer is better than the real optimizer. But the real optimizer relies on the statistics it gathers, and sometimes these are not enough to generate the best plan. One thing I have noticed is that the optimizer doesn't take the locks on the keys into account while calculating costs. In my case, for 1,000 rows it used around 753 key locks. See the above script, which I ran to check the indexstats and stats tables; I am sure this value should be 1,000, but my script was not running fast enough to capture all of these locks. Acquiring locks (even though shared) has to use a number of resources. Thus, based on this, you can drop the threshold of the query, which I am sure the optimizer is not able to do. Also, if parallelism is enabled, you can further drop the threshold value, as the table scan will take full advantage of parallelism.

    OPTION (RECOMPILE) queries are not cached as I mentioned, so there is no contention. Five compilations per minute is absolutely nothing at all.

    But to execute the query, a plan still has to be generated and put into the cache, and it will be discarded once execution is done. I am talking about the overhead of creating this plan.

    The non-clustered index is narrower (and so smaller overall). It can use read-ahead just like the clustered index scan. Prefetching and optimized-mode batch sorting of keys on the nested loops join can also help. Things are a lot more complex than your simplified logical-I/O-only model would suggest.

    Yes, that is why I have mostly accounted for the key lookup I/Os and almost neglected the non-clustered index scan.

    GulliMeel

    Finding top n Worst Performing queries
    Improve the performance of Merge Join (special case)
    How to Post Performance Problem - Gail Shaw

  • Gullimeel (5/17/2012)


    In this you will see that the optimizer reads the indexstats and stats tables twice, and reads the plan guide as well. I am sure the overhead of these is more than reading my two tables.

    These are all highly optimized internal constructs, not at all the same as reading two user tables. Even if your extra checks complete in zero time, the most you could ever save here would be the 150 microseconds I mentioned before.

    Gullimeel (5/17/2012)


    I am not saying that my optimizer is better than the real optimizer. But the real optimizer relies on the statistics it gathers, and sometimes these are not enough to generate the best plan.

    Nevertheless, the optimizer picks a very good plan every time on your test data when OPTION RECOMPILE is specified. You are considering only one possible metric (logical I/O) to determine what is 'best'. It's not really surprising that the real optimizer disagrees with you.

    Gullimeel (5/17/2012)


    One thing I have noticed is that the optimizer doesn't take the locks on the keys into account while calculating costs.

    There are standard ways to reduce locking overhead (if that is causing a measurable performance impact). Trying to guess, and manually tweaking some 'magic' number to compensate is not a very sensible approach.
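
    To give one example of a standard option (a sketch only, and a database-wide change that needs testing first): row versioning takes shared key locks out of the picture for readers entirely, rather than trying to fold their cost into a hand-tuned threshold.

    -- Hypothetical database name; switching this on needs no other active
    -- connections and changes read-committed behaviour for the whole database.
    alter database YourDatabase set read_committed_snapshot on;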

    Gullimeel (5/17/2012)


    I am talking about the overhead of creating this plan.

    This is the 150 microseconds I keep mentioning. Take the overheads of your extra statements away from this, and tell me, is what you are doing really worth it?

  • I have made my stats tables highly optimized as well.

    I am not saying that I am gaining 150 microseconds and that is a great achievement.

    The gain I will get from using fewer latches while reading the pages in the buffer (logical I/Os) and less looping is what will save me time and resources.

    If you think that the threshold I have chosen is wrong, then you can use the optimizer's tipping point, which is somewhere between 1,800 and 1,900, and use that as the threshold. I have shown an example of just 2,000 rows; there could be any number of rows. This demo script is way too small to show any kind of gain; even in terms of I/Os it will be around 1,200 for 2,000 rows, and just to gain 1,200 I/Os I do not want to write my 'own optimizer'. The real performance gain you will realize when the data is heavily skewed and the number of rows processed is large. In the demo script even the skewing has a pattern. The number of rows is small, so the estimates are quite good because of the histogram. When the number of distinct values is quite high, the histogram estimation will not be that effective and you will see a seriously bad plan chosen by the optimizer.
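
    One way to check how well the histogram actually covers the skewed column before assuming it will fail (a sketch; the statistics name is an assumption for the demo table):

    -- With at most 200 steps, many distinct values share a step, so values
    -- falling between steps can be badly over- or under-estimated.
    dbcc show_statistics ('dbo.SkewedTable', 'IX_SkewedTable_id') with histogram;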

    As far as the cost is concerned, I do not have an exact idea how the SQL Server optimizer calculates it, but I am sure it must at least use logical I/O, physical I/O and CPU time. I am in the process of learning how SQL Server estimates the cost. For example, I was running some cases to find the relationship between the cost of reading one page (logical) and the CPU work done in reading that page. Once I have that figure I will let you know.

    GulliMeel

    Finding top n Worst Performing queries
    Improve the performance of Merge Join (special case)
    How to Post Performance Problem - Gail Shaw

  • Gullimeel (5/18/2012)


    I have made my stats tables highly optimized as well.

    A user table can never be as optimized as the internal metadata caches. The optimizations I was referring to are ones only available to people that write SQL Server.

    I am not saying that I am gaining 150 microseconds and that is a great achievement.

    You haven't demonstrated your technique providing any worthwhile benefits either. That's not to say it never could, but there are plenty of standard techniques available that are much better, taken as a whole, than your proposal. The fact that your demonstration code does not (presumably) do your technique justice is hardly our concern.

    The gain I will get from using fewer latches while reading the pages in the buffer (logical I/Os) and less looping is what will save me time and resources.

    Latch contention is hardly likely to be an important performance consideration here. Certainly there is a trade-off between alternative plan strategies (e.g. key lookup versus parallel scan) but nothing you have said so far gives much confidence that you truly understand the details enough to do what you are doing.

    When the number of distinct values is quite high, the histogram estimation will not be that effective and you will see a seriously bad plan chosen by the optimizer.

    Good standard techniques exist to solve this issue reliably - filtered statistics, to take one example. We could see the number of histogram steps increased to 400 from 200 in the next year or so, according to Eric Hanson. Your solution is also limited to a single equality predicate, and is extremely fragile to changing code or circumstances. There's just no comparison between your solution and the standard capabilities of the optimizer, in general terms.
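
    For what it's worth, a filtered statistic along these lines covers just the heavily skewed range rather than needing one object per distinct value (the object name, column and boundary values are invented for illustration):

    create statistics st_SkewedTable_id_hot
    on dbo.SkewedTable (id)
    where id >= 900 and id < 1100
    with fullscan;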

    As far as the cost is concerned, I do not have an exact idea how the SQL Server optimizer calculates it, but I am sure it must at least use logical I/O, physical I/O and CPU time. I am in the process of learning how SQL Server estimates the cost. For example, I was running some cases to find the relationship between the cost of reading one page (logical) and the CPU work done in reading that page. Once I have that figure I will let you know.

    You're guessing again! The costing component has specific algorithms for each physical iterator. I wrote about some of the details here: http://sqlblog.com/blogs/paul_white/archive/2010/09/01/inside-the-optimizer-plan-costing.aspx. Joe Chang (who also blogs at sqlblog.com) has some more specifics on his own website: http://www.qdpma.com/. Be very careful though, the costing details can and do change between releases.

    Once again, I would counsel you to abandon this approach, and learn to work *with* the standard query optimizer rather than trying to impose your own beliefs on it. Generally, there are good standard solutions to most plan selection issues. Your time is better spent learning techniques that will scale and manage themselves over time, rather than trying to write your own optimizer.

  • I have not demonstrated anything apart from equality. You can use inequality or anything you want and use the same approach to get the number of rows from the stats table. It depends on how you want to query it; change it as per your requirements.

    Filtered statistics are good when you have a very small number of distinct values. If there are 10,000 distinct values, will you create 10,000 filtered statistics? Even if you do, maintaining them wouldn't be free. It will also have overhead.

    I was looking for the costing algorithms and couldn't find them, so I thought I would do some research of my own.

    BTW, thanks for the link for costing.

    GulliMeel

    Finding top n Worst Performing queries
    Improve the performance of Merge Join (special case)
    How to Post Performance Problem - Gail Shaw

  • Received via private message:

    Gullimeel (5/18/2012): Cost of Sorting


    When you run a query, say select * from sales..SalesOrderHeader order by customerid asc, I can see that it does a table scan with a cost of around 20%, and then there is a sort with a cost of around 80%. However, I cannot see the cost related to the sort in terms of I/Os, CPU time or memory usage. I know that if the sort is small it won't use tempdb and will do everything in memory. But still, there should be some kind of stats to tell me what the overhead was.

    The query plan exposes some high-level details: the estimated I/O and CPU cost of each operator, and the memory grant for the query as a whole.

    But to see the exact formulas used to generate those numbers, you would need SQL Server source code access.
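
    For interest, those high-level numbers can be pulled out of any cached plan XML with something like this (a sketch using the public showplan schema; it scans every cached plan, so run it on a test box, and note that OPTION (RECOMPILE) plans will not appear because they are not cached):

    with xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    select
        op.value('@PhysicalOp', 'varchar(50)')          as physical_op,
        op.value('@EstimateIO', 'float')                as estimated_io,
        op.value('@EstimateCPU', 'float')               as estimated_cpu,
        op.value('@EstimatedTotalSubtreeCost', 'float') as subtree_cost
    from sys.dm_exec_cached_plans as cp
    cross apply sys.dm_exec_query_plan(cp.plan_handle) as qp
    cross apply qp.query_plan.nodes('//RelOp') as n(op)
    where op.value('@PhysicalOp', 'varchar(50)') = 'Sort';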

    I tried to use extended events to gather sort statistics; they provided details, but I could not find any documentation on the meaning of the various columns returned.

    The internal details of sort runs, phases, bobs, fanout (and so on) are undocumented because they can change at any time, are not useful for general query tuning, and may be covered by patents or otherwise represent Microsoft's intellectual property.

  • Gullimeel (5/18/2012)


    I have not demonstrated anything apart from equality. You can use inequality or anything you want and use the same approach to get the number of rows from the stats table. It depends on how you want to query it; change it as per your requirements.

    Sure. I fully understand what you are doing, and what you are hoping to achieve, I just disagree that it is a good idea to do it this way.

    Filtered statistics are good when you have a very small number of distinct values. If there are 10,000 distinct values, will you create 10,000 filtered statistics? Even if you do, maintaining them wouldn't be free. It will also have overhead.

    No. The exact solution and techniques to employ would depend on the specific situation, of course. All I'm saying is that there are a number of standard techniques and facilities which mean that I have never had to consider doing the sort of manual optimization you are proposing (and I have done my own fair share of nasty hacks over the years).

  • Sorry, I did not ask the question correctly. I forgot to ask: is there a table or DMV where we can see this memory grant for a given query?

    GulliMeel

    Finding top n Worst Performing queries
    Improve the performance of Merge Join (special case)
    How to Post Performance Problem - Gail Shaw

  • Gullimeel (5/18/2012)


    Filtered statistics are good when you have a very small number of distinct values. If there are 10,000 distinct values, will you create 10,000 filtered statistics? Even if you do, maintaining them wouldn't be free. It will also have overhead.

    No you wouldn't; that would be a very poor plan. You'd create maybe 10 or 20 and test; if there is need for more, maybe double that. It's unlikely, even on the hugest tables with the weirdest data distribution, that you'd need much more.

    It's not about the number of distinct values; it's about the severity of the data skew. The worse the skew, the more stats you may need.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The following paper will help you understand sorting internals. Please bear in mind that although the paper gives details that are very close to the SQL Server implementation, it does not describe exactly how SQL Server works internally now, or on any specific version. It also does not give specific numbers or formulas used by SQL Server for costing (except in a very general sense). I am providing this (public) information for interest and education - not for you to go off and build your own sort :laugh:

    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.83.1080 (click the PDF icon to download the paper)

  • Gullimeel (5/18/2012)


    Sorry, I did not ask the question correctly. I forgot to ask: is there a table or DMV where we can see this memory grant for a given query?

    sys.dm_exec_query_memory_grants (Transact-SQL)

    Also see:

    http://blogs.msdn.com/b/sqlqueryprocessing/archive/2010/02/16/understanding-sql-server-memory-grant.aspx

  • Thanks.

    Do not worry I will not create my own sort.

    However, people who query dm_exec_query_stats to find the costly queries miss this sort cost. For example, if you look at this SalesOrderHeader query, it will show just 706 logical I/Os, which is just 10% of the actual cost. Thus we end up comparing 10% of the cost of one query against 100% of the cost of another, and concluding that the query with no sort is more costly than the one with the sort, when actually the sort query might be much more costly than the original one.

    Thus, while finding the queries that are putting pressure on SQL Server, we should consider the exec_query_stats DMV along with the memory grants DMV (a sketch follows below).
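
    A minimal sketch of that idea (note that sys.dm_exec_query_memory_grants only covers queries that are currently executing or waiting for a grant, so this shows the picture at a point in time):

    select
        st.text                 as query_text,
        qs.total_logical_reads,
        mg.requested_memory_kb,
        mg.granted_memory_kb
    from sys.dm_exec_query_memory_grants as mg
    left join sys.dm_exec_query_stats as qs
        on qs.plan_handle = mg.plan_handle
    cross apply sys.dm_exec_sql_text(mg.sql_handle) as st;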

    GulliMeel

    Finding top n Worst Performing queries
    Improve the performance of Merge Join (special case)
    How to Post Performance Problem - Gail Shaw
