March 5, 2015 at 1:23 pm
Hi
I have been trying to get the query optimizer to generate a parallel execution plan, but no matter what MAXDOP (0) or Cost Threshold (5) settings I use, it will only execute in serial. Any suggestions would be helpful.
UPDATE [dbo].[Targus_201412_V7_B]
SET [URBAN] =(
CASE
WHEN [METRO_STATUS] = 'Urban' THEN 1
ELSE 0
END)
March 5, 2015 at 2:26 pm
The query optimizer does not always get it right, but it seems to be correct more often than not, provided that your system is configured correctly. A MAXDOP 0 query hint won't do anything for you if Max Degree of Parallelism (server properties > advanced) is already set to 0. Parallel is not always better.
That said, if you want to guarantee a parallel plan you can use Adam Machanic's make_parallel() function.
-- Itzik Ben-Gan 2001
March 5, 2015 at 3:04 pm
Thanks Alan
What is the syntax for make_parallel() ?
March 5, 2015 at 4:00 pm
jdbrown239 (3/5/2015)
Thanks Alan. What is the syntax for make_parallel()?
It's all in the article. Click on the differently colored "make_parallel" in Alan's post. It's a link to the article.
As a bit of a sidebar, why do you think you need to force-parallel such a query?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 5, 2015 at 4:07 pm
Oh dang! It does not appear that make_parallel() will help you get a parallel plan here...
First, here's a little background on the function:
Some time ago Paul White wrote an amazing article about how to force a parallel plan using an undocumented trace flag (8649). Here's a quick demo:
Using this sample data:
USE tempdb;
GO
IF OBJECT_ID('tempdb.dbo.t') IS NOT NULL DROP TABLE dbo.t;
GO
CREATE TABLE dbo.t
(METRO_STATUS varchar(20) NOT NULL, URBAN int);

INSERT dbo.t (METRO_STATUS)
VALUES ('Urban'),('xxx'),('xxx'),('Urban');
You could use the traceflag like this:
SELECT t.*
FROM t
OPTION (RECOMPILE, QUERYTRACEON 8649);
Blam! Parallel query plan. The problem is that this is an undocumented and unsupported trace flag and therefore unacceptable in a production environment (which is why I don't recommend it except for testing in non-prod environments).
Then Adam Machanic came up with the aforementioned make_parallel() function which, until now, has usually behaved exactly the same way. To use it in a SELECT statement you would use CROSS APPLY like so...
SELECT t.*
FROM t
CROSS APPLY dbo.make_parallel();
Anyhow, I never tried make_parallel() in an UPDATE statement... It's easy with the trace flag but a little tricky with make_parallel... After some tinkering I came up with:
WITH tu AS
(
SELECT
METRO_STATUS,
URBAN
FROM t
CROSS APPLY dbo.make_parallel()
)
UPDATE tu
SET URBAN = CASE METRO_STATUS WHEN 'Urban' THEN 1 ELSE 0 END
This is getting me a parallel zone in the query plan, but the update itself is still serial. I am going to tinker with this a little more, but I am beginning to think that make_parallel is not the right solution here. I'll get back to you.
If anything, I would read Paul's article and Adam's article - they are super-informative.
Mr. Magoo... If you are out there perhaps you could chime in :w00t: - you know much more about this than I do...
-- Itzik Ben-Gan 2001
March 5, 2015 at 5:21 pm
Alan.B (3/5/2015)
Mr. Magoo... If you are out there perhaps you could chime in:w00t: you know much more about this than I do...
Wow, I appreciate the kind thought, but I really don't agree in this case - I am but a student 🙂
This TechNet Magazine tip suggests that UPDATEs are always serial, and I cannot find anything to suggest otherwise.
Certain types of statements cannot be processed in parallel unless they contain clauses, however. For example, UPDATE, INSERT, and DELETE are not normally processed in parallel even if the related query meets the criteria. But if the UPDATE or DELETE statements contain a WHERE clause, or an INSERT statement contains a SELECT clause, [highlight="#ffff11"]WHERE and SELECT can be executed in parallel. Changes are applied serially to the database in these cases[/highlight].
It's not documentation, but the source is good.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
March 5, 2015 at 6:51 pm
@alan Thanks Alan
The Paul White post is great! I learn something every time I post in this forum. I am going to setup a test using your example with make_parallel.
@mm Thank you also MM
The TechNet tip is also good and might explain why the update is always serial.
I have the task of running multiple updates against a table with a row count of 107 million. Even after splitting the table into smaller tables (about 38 million rows each) the updates are still very time consuming.
I thank you all for your help.
Doug
March 5, 2015 at 9:13 pm
jdbrown239 (3/5/2015)
@Alan Thanks Alan. The Paul White post is great! I learn something every time I post in this forum. I am going to set up a test using your example with make_parallel.
@mm Thank you also MM
The TechNet tip is also good and might explain why the update is always serial.
I have the task of running multiple updates against a table with a row count of 107 million. Even after splitting the table into smaller tables (about 38 million rows each) the updates are still very time consuming.
I thank you all for your help.
Doug
The problem is likely a thing called the "tipping point". Every system has one. On my older computer at home, the tipping point comes into play at over 3 million rows (depending on the width of the table). It takes just 3 seconds to update 1 million rows. For 2 million rows it takes only 6 seconds, and for 3 million rows only 9 seconds. All as expected. For 4 million rows, though, it takes well over 2 hours because I've gone past the mysterious (related to the amount of memory available) but ever-present "tipping point". You also have to consider what a mega-update does to the log file, even if you're in the SIMPLE recovery mode.
I recommend breaking up the updates along with a proper index so you can have an expedient WHERE clause to filter out the rows that have already been updated. The index does not need to be permanent.
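Something like this, just as a sketch - I'm assuming URBAN starts out NULL so it can flag the not-yet-updated rows; adjust the WHERE clause to whatever "already updated" test fits your data:

```sql
-- Sketch only: keep each transaction small by updating in batches.
-- An index on URBAN (or a filtered index WHERE URBAN IS NULL) keeps
-- the WHERE clause cheap; it can be dropped when you're done.
DECLARE @BatchSize int = 1000000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) t
    SET URBAN = CASE WHEN METRO_STATUS = 'Urban' THEN 1 ELSE 0 END
    FROM dbo.Targus_201412_V7_B AS t
    WHERE URBAN IS NULL;      -- only rows not yet touched

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to do

    CHECKPOINT;  -- in SIMPLE recovery, lets the log space be reused
END;
```

Each batch commits on its own, so the log never has to hold the whole 38 million row update at once.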
--Jeff Moden
Change is inevitable... Change for the better is not.
March 6, 2015 at 8:00 am
In my experience you get parallelism when a query has multiple objects where it can use separate threads to read separate objects. In an UPDATE statement you have locking to manage, and a single thread locking at the table or extent level could be more efficient than many threads locking at the page or row level.
One thing worth considering is partitioning. A partitioned table is a good candidate for parallel queries, although I'm not sure about parallel updates. The only place I've seen partitioning used was on a 27 billion row, 2.5TB table, with partitioning used for improving queries and index maintenance. At that size you are I/O constrained, so it doesn't always matter how many threads are working on it. I can't recall whether a single UPDATE statement that involved data in multiple partitions updated multiple partitions in parallel, but I'm guessing it will. It would be fun to try if you have the time.
March 6, 2015 at 9:44 am
mister.magoo (3/5/2015)
Alan.B (3/5/2015)
Mr. Magoo... If you are out there perhaps you could chime in:w00t: you know much more about this than I do...
Wow, I appreciate the kind thought, but I really don't agree in this case - I am but a student 🙂
This technet magazine TIP suggests that UPDATES are always serial, and I cannot find anything to suggest otherwise.
Certain types of statements cannot be processed in parallel unless they contain clauses, however. For example, UPDATE, INSERT, and DELETE are not normally processed in parallel even if the related query meets the criteria. But if the UPDATE or DELETE statements contain a WHERE clause, or an INSERT statement contains a SELECT clause, [highlight="#ffff11"]WHERE and SELECT can be executed in parallel. Changes are applied serially to the database in these cases[/highlight].
It's not documentation, but the source is good.
I always assumed you could get a parallel plan for an UPDATE. What you posted makes sense and would explain why we can't. I noticed that even the trace flag can't get an update to run in parallel.
I mentioned you because I know, from other forum posts, that you have used make_parallel().
-- Itzik Ben-Gan 2001
March 7, 2015 at 2:46 am
jdbrown239 (3/5/2015)
I have the task of running multiple updates to a table with a row count of 107 million.
That sounds plain wrong.
There should not be such task.
If not a secret - what does it have to achieve?
_____________
Code for TallyGenerator
March 9, 2015 at 6:46 am
What is the cost of the query as it currently exists?
Setting your threshold to 5 might work, but only if the cost of that query is over 5. If it's under 1, you can't make it go parallel. If it's over 1, set the cost threshold to 1 and you may see it go parallel.
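For reference, that threshold is a server-wide setting; you can check and change it like this (just a sketch - try it on a non-production instance first, since it affects every query on the server):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- check the current value
EXEC sp_configure 'cost threshold for parallelism';

-- lower it to 1 so cheaper plans become eligible to go parallel
EXEC sp_configure 'cost threshold for parallelism', 1;
RECONFIGURE;
```

You can see the estimated cost of your query in the Estimated Subtree Cost property of the root operator in the actual or estimated execution plan.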
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning