March 5, 2015 at 1:23 pm
Hi
I have been trying to get the query optimizer to generate a parallel execution plan, but no matter what MAXDOP (0) or Cost Threshold (5) settings I use, it will only execute in serial. Any suggestions would be helpful.
UPDATE [dbo].[Targus_201412_V7_B]
SET [URBAN] =(
CASE
WHEN [METRO_STATUS] = 'Urban' THEN 1
ELSE 0
END)
March 5, 2015 at 2:26 pm
The query optimizer does not always get it right, but it seems to be correct more often than not, provided that your system is configured correctly. A MAXDOP 0 query hint won't do anything for you if Max Degree of Parallelism (server properties > advanced) is already set to 0. Parallel is not always better.
That said, if you want to guarantee a parallel plan you can use Adam Machanic's make_parallel() function.
-- Itzik Ben-Gan 2001
March 5, 2015 at 3:04 pm
Thanks Alan
What is the syntax for make_parallel() ?
March 5, 2015 at 4:00 pm
jdbrown239 (3/5/2015)
Thanks Alan. What is the syntax for make_parallel()?
It's all in the article. Click on the differently colored "make_parallel" in Alan's post. It's a link to the article.
As a bit of a sidebar, why do you think you need to force-parallel such a query?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 5, 2015 at 4:07 pm
Oh dang! It does not appear that make_parallel() will help you get a parallel plan here...
First, here's a little background on the function:
Some time ago Paul White wrote an amazing article about how to force a parallel plan using an undocumented trace flag (8649). Here's a quick demo:
Using this sample data:
USE tempdb;
GO
IF OBJECT_ID('tempdb.dbo.t') IS NOT NULL DROP TABLE dbo.t;
GO
CREATE TABLE dbo.t
(METRO_STATUS varchar(20) NOT NULL, URBAN int);

INSERT dbo.t (METRO_STATUS)
VALUES ('Urban'),('xxx'),('xxx'),('Urban');
You could use the traceflag like this:
SELECT t.*
FROM t
OPTION (RECOMPILE, QUERYTRACEON 8649);
Blam! Parallel query plan. The problem is that this is an undocumented and unsupported trace flag and therefore unacceptable in a production environment (which is why I don't recommend it except for testing in non-prod environments).
Then Adam Machanic came up with the aforementioned make_parallel() function which, until now, has usually behaved exactly the same way. To use it in a SELECT statement you would use CROSS APPLY like so...
SELECT t.*
FROM t
CROSS APPLY dbo.make_parallel();
Anyhow, I never tried make_parallel() in an UPDATE statement... It's easy with the trace flag but a little tricky with make_parallel... After some tinkering I came up with:
WITH tu AS
(
SELECT
METRO_STATUS,
URBAN
FROM t
CROSS APPLY dbo.make_parallel()
)
UPDATE tu
SET URBAN = CASE METRO_STATUS WHEN 'Urban' THEN 1 ELSE 0 END
This is getting me a parallel zone in the query plan, but the update itself is still serial. I am going to tinker with this a little more, but I am beginning to think that make_parallel is not the right solution here. I'll get back to you.
If anything, I would read Paul's article and Adam's article - they are super-informative.
Mr. Magoo... If you are out there perhaps you could chime in :w00t: - you know much more about this than I do...
-- Itzik Ben-Gan 2001
March 5, 2015 at 5:21 pm
Alan.B (3/5/2015)
Mr. Magoo... If you are out there perhaps you could chime in:w00t: you know much more about this than I do...
Wow, I appreciate the kind thought, but I really don't agree in this case - I am but a student 🙂
This TechNet Magazine tip suggests that UPDATEs are always serial, and I cannot find anything to suggest otherwise.
Certain types of statements cannot be processed in parallel unless they contain clauses, however. For example, UPDATE, INSERT, and DELETE are not normally processed in parallel even if the related query meets the criteria. But if the UPDATE or DELETE statements contain a WHERE clause, or an INSERT statement contains a SELECT clause, [highlight="#ffff11"]WHERE and SELECT can be executed in parallel. Changes are applied serially to the database in these cases[/highlight].
It's not documentation, but the source is good.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
March 5, 2015 at 6:51 pm
@alan Thanks Alan
The Paul White post is great! I learn something every time I post in this forum. I am going to setup a test using your example with make_parallel.
@mm Thank you also MM
The TechNet tip is also good and might explain why the update is always serial.
I have the task of running multiple updates against a table with a row count of 107 million. Even after splitting the table into smaller tables (about 38 million rows each) the updates are still very time consuming.
I thank you all for your help.
Doug
March 5, 2015 at 9:13 pm
jdbrown239 (3/5/2015)
@Alan Thanks Alan. The Paul White post is great! I learn something every time I post in this forum. I am going to set up a test using your example with make_parallel.
@mm Thank you also MM
The TechNet tip is also good and might explain why the update is always serial.
I have the task of running multiple updates against a table with a row count of 107 million. Even after splitting the table into smaller tables (about 38 million rows each) the updates are still very time consuming.
I thank you all for your help.
Doug
The problem is likely a thing called the "tipping point". Every system has one. On my older computer at home, the tipping point comes into play at over 3 million rows (depending on the width of the table). It takes just 3 seconds to update 1 million rows. For 2 million rows it takes only 6 seconds, and for 3 million rows only 9 seconds. All as expected. For 4 million rows, though, it takes well over 2 hours because I've gone past the mysterious (related to the amount of memory available) but ever-present "tipping point". You also have to consider what a mega-update does to the log file, even if you're in the SIMPLE recovery mode.
I recommend breaking up the updates along with a proper index so you can have an expedient WHERE clause to filter out the rows that have already been updated. The index does not need to be permanent.
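Something like this, just as a sketch - I'm assuming URBAN starts out NULL so it can flag the not-yet-updated rows; adjust the WHERE clause to whatever "already updated" test fits your data:

```sql
-- Sketch only: keep each transaction small by updating in batches.
-- An index on URBAN (or a filtered index WHERE URBAN IS NULL) keeps
-- the WHERE clause cheap; it can be dropped when you're done.
DECLARE @BatchSize int = 1000000;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) t
    SET URBAN = CASE WHEN METRO_STATUS = 'Urban' THEN 1 ELSE 0 END
    FROM dbo.Targus_201412_V7_B AS t
    WHERE URBAN IS NULL;      -- only rows not yet touched

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to do

    CHECKPOINT;  -- in SIMPLE recovery, lets the log space be reused
END;
```

Each batch commits on its own, so the log never has to hold the whole 38 million row update at once.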
--Jeff Moden
Change is inevitable... Change for the better is not.
March 6, 2015 at 8:00 am
In my experience you get parallelism when a query has multiple objects where it can use separate threads to read separate objects. In an UPDATE statement you have locking to manage, and a single thread locking at the table or extent level could be more efficient than many threads locking at the page or row level.
One thing worth considering is partitioning. A partitioned table is a good candidate for parallel queries, although I'm not sure about parallel updates. The only place I've seen partitioning used was on a 27 billion row, 2.5TB table, with partitioning used for improving queries and index maintenance. At that size you are I/O constrained, so it doesn't always matter how many threads are working on it. I can't recall whether a single UPDATE statement that involved data in multiple partitions updated multiple partitions in parallel, but I'm guessing it will. It would be fun to try if you have the time.
March 6, 2015 at 9:44 am
mister.magoo (3/5/2015)
Alan.B (3/5/2015)
Mr. Magoo... If you are out there perhaps you could chime in:w00t: you know much more about this than I do...
Wow, I appreciate the kind thought, but I really don't agree in this case - I am but a student 🙂
This technet magazine TIP suggests that UPDATES are always serial, and I cannot find anything to suggest otherwise.
Certain types of statements cannot be processed in parallel unless they contain clauses, however. For example, UPDATE, INSERT, and DELETE are not normally processed in parallel even if the related query meets the criteria. But if the UPDATE or DELETE statements contain a WHERE clause, or an INSERT statement contains a SELECT clause, [highlight="#ffff11"]WHERE and SELECT can be executed in parallel. Changes are applied serially to the database in these cases[/highlight].
It's not documentation, but the source is good.
I always assumed you could get a parallel plan for an UPDATE. What you posted makes sense and would explain why we can't. I noticed that even the trace flag can't get an update to run in parallel.
I mentioned you because I know, from other forum posts, that you have used make_parallel().
-- Itzik Ben-Gan 2001
March 7, 2015 at 2:46 am
jdbrown239 (3/5/2015)
I have the task of running multiple updates to a table with a row count of 107 million.
That sounds plain wrong.
There should not be such task.
If not a secret - what does it have to achieve?
_____________
Code for TallyGenerator
March 9, 2015 at 6:46 am
What is the cost of the query as it currently exists?
Setting your threshold to 5 might work, but only if the cost of that query is over 5. If it's under 1, you can't make it go parallel. If it's over 1, set the cost threshold to 1 and you may see it go parallel.
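For reference, that threshold is a server-wide setting; you can check and change it like this (just a sketch - try it on a non-production instance first, since it affects every query on the server):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- check the current value
EXEC sp_configure 'cost threshold for parallelism';

-- lower it to 1 so cheaper plans become eligible to go parallel
EXEC sp_configure 'cost threshold for parallelism', 1;
RECONFIGURE;
```

You can see the estimated cost of your query in the Estimated Subtree Cost property of the root operator in the actual or estimated execution plan.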
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning