Cost Threshold of Parallelism

Question

Cost Threshold of Parallelism

Michael Saint

SSC Veteran

Points: 293
More actions
May 5, 2008 at 8:59 am

#123715

I recently observed a situation where the default settings of SQL Server proved improper for an application. The application is a VB Application using ADO to build a list of records and then loops through the records to make qualifying decisions. As the loop progresses, it executes many small queries most taking only a couple of ms to run. On a server with 9 processors, the batch would run for 5 hours. On a single processor machine, the batch would run in 1 hour. We did not expect that the parallelism settings were impacting this because the default threshold required a query to run for 5 seconds to invoke parallelism. However, when we changed the number of processors that could participate in a query (Max Degree of Parallelism) to 1, the batch ran in less than 1 hour as opposed to 5 hours.
It would be interesting to better understand why we experienced this behavior. We plan to experiment with different settings to determine which settings (Threshold or MDOP) produce the best results.
I would appreciate the benefit of your past experience.

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

John Sinclair Mr or Mrs. 500 Points: 518 More actions · Answer 1

I've had nothing but trouble with parallel queries. At a past job when we moved to SQL2k sp4 suddenly it was much more agressive about parallelising our queries, and generally caused lots of lock contention and generally slowed everything down.

We even saw a single query (an overnight job) get split into two parallel threads, which deadlocked each other.

In the end we gave up fighting it and just set maxdop to 1

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 2

http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/performance_tuning_waits_queues.mspx

Check out CXPACKET waits. Also the best practice on (properly tuned) OLTP systems is to set max dop = 1.

If your IO subsystem can't keep up with parallelism of the multiple CPUs you should also reduce maxdop and/or up the cost threshold for parallelism until most of the cxpacket waits go away.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

RBarryYoung SSC Guru Points: 143329 More actions · Answer 3

TheSQLGuru (5/6/2008)
...Also the best practice on (properly tuned) OLTP systems is to set max dop = 1.

Really?!? Could you site a reference for this? The only general "Best Practices" that I have ever heard for MAXDOP server-wide are either zero (0) or #CPUs/2.

I have only ever heard MAXDOP=1 as a suggested server-wide setting for servers that are already experiencing parallelism problems. That is how it is put in the document that you reference: it is only listed as a suggestion, not even as strong as a recommendation really, and definitely not even close to a "Best Practice".

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 4

rbarryyoung (5/6/2008)
TheSQLGuru (5/6/2008)
...Also the best practice on (properly tuned) OLTP systems is to set max dop = 1.
Really?!? Could you site a reference for this? The only general "Best Practices" that I have ever heard for MAXDOP server-wide are either zero (0) or #CPUs/2.
I have only ever heard MAXDOP=1 as a suggested server-wide setting for servers that are already experiencing parallelism problems. That is how it is put in the document that you reference: it is only listed as a suggestion, not even as strong as a recommendation really, and definitely not even close to a "Best Practice".

References (some do indeed fall into the category you mention):

SAP_SQL2005_BestPractices.doc

SQL2005_Performance_Tuning_Waits_Queues.doc

TheSQLGuru 😎

http://blogs.msdn.com/sqlcat/archive/2005/11/30/498415.aspx

download.microsoft.com/download/b/8/f/b8f13247-b992-4f0e-846e-8f89fcaac0bd/SQL_OLTP_BestPractices.ppt

http://blogs.msdn.com/axperf/archive/2008/03/10/welcome-database-configuration-checklist-part-1.aspx

http://www.tpc.org/results/FDR/TPCC/HP_ML350G5_tpcc_fdr.pdf

The collective body is enough for me to call it a Best Practice, despite the fact that every article doesn't explicitly state that. There are other resources too. And don't forget that I qualified my initial statement with "well tuned OLTP".

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Roger L Reid SSCrazy Points: 2004 More actions · Answer 5

The usual answer applies: it depends!

The problem is that decisions on parallelism are made on the basis of the query, and not on the basis of the overall server workload. This can allow certain queries or groups of queries to take over most of the machine's resources.

It makes intuitive sense in my case to set max parallelism to 1 or 2, since I support a vast disconnected range of applications and dbs - 30 year old in house code, 3rd party stuff of various quantities, etc. All of it written to not NEED parallelism.

If I were running a server for a single purpose, that would be a whole other situation. Also, I haven't proven it but I suspect that my use of AMD based x64 impacts this due to the "io channel per CPU" design - no throttling by the front bus during IO waits.

It was most dramatic when I ported 30 year old reporting code. Code that was written for machines with less power than your cell phone was ale to bring a fancy new multicore, multipath io, 32 GB memory machine to its knees - until I set max parallelism to 1.

Here's why - and all this proves is "it depends". You have to test your own situation.

These reports, written in C, ran on clients (DEC 11/70) that could allocate a max of 64KB data space to a program. You could not hold a large result set in memory, so you would process each tuple as it came off the firehose.

Networking was originally GPIB! Then ethernet - but the network was once considered a bottleneck. The data disk on the client was 330 MB - not all for any one report either. You could not store the whole report, so it had to produce output on the fly, before all the data was returned, because it had to be printed and cleared every hundred pages or so.

And optimizers weren't what they are now. So often queries had to be split: you'd set up several connections, and get the keys for the top order first, and as each tuple came back, get the applicable keys for the next order, and perhaps as those came back then get the data for the output row, sometimes using multiple queries to avoid particularly slow joins.

This was the only practical way to do it at the time, if you wanted the reports to run in a day or two instead of a month or two.

When you throw something like that at a Monster Multicore Multipath machine today, you can find that each one of those connections is getting multiple CPU's. But what the optimizer doesn't know is that THIS one's result set can't be returned any faster that this other ones! And so the outer loop has a result set sitting and sitting waiting for the inner loop, waiting for its inner loop - and since its taking such a long time, they all can get extra CPUs.

Now, you'll say "that's bad code - rewrite it". And I'd love to. I love to code, and I can't say I like SQL Server administration. But the economics today are that it's faster to buy and configure hardware than to recode (and retest, and redeploy - and besides, if you recode, they'll redesign --)

So I'm stuck with it. In most cases, by not allowing parallelism reports which ran in 1 day 20 years ago and 1 hour last year can run in 1 minute.

However, I know if I unwound those loops and simply generated The One Big Result Set the report needs, it could run in 1 second. Sadly, that makes no business sense.

Roger L Reid

Abrar Ahmad_ SSCarpal Tunnel Points: 4248 More actions · Answer 6

Roger L Reid (5/7/2008)
....
Now, you'll say "that's bad code - rewrite it". And I'd love to. I love to code, and I can't say I like SQL Server administration. But the economics today are that it's faster to buy and configure hardware than to recode (and retest, and redeploy - and besides, if you recode, they'll redesign --)
So I'm stuck with it. In most cases, by not allowing parallelism reports which ran in 1 day 20 years ago and 1 hour last year can run in 1 minute.
However, I know if I unwound those loops and simply generated The One Big Result Set the report needs, it could run in 1 second. Sadly, that makes no business sense.

[font="Verdana"]

Do u still dangling even after year/s 🙂 of your decision and findings!

It's a great post from u!

I suffered same problem for to support legacy system but not such old as your quoted system.

According to performance enhancement thumb rule/reply 😛 "It Depends!"! The current upgraded system under consideration, is boxed into category where parallel execution caused performance problems!

I need to have your feedback over the other less cost effective and simple alternative to handle such problem i.e. to split DB files (luckily designed for current system) on different disks/partitions so the system can eventually benefit through the parallel execution?

and why parallel execution is failing to produce good throughput, in broader sense?

Thank you

[/font]

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

Please don't reply to 3 year old threads. Start a new one.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

jswong05 Hall of Fame Points: 3553 More actions · Answer 8

DBA, it is called archiving or set to read-only.

Jason
http://dbace.us
😛