Parallel Inserts

  • Hi

    Still got issues with a long running stored procedure; it runs over night, pulls 10.5m rows. Inserts into a table, from a select (so "insert...select", rather than "select into"), from many tables, grouping on a max. It's complex.

    During the day, it runs fine - maybe 25 minutes. At night it *sometimes* runs fine, but then sometimes takes 4hours.

    Checking this morning there were 230 threads open for this one query. Checking sys.dm_os_tasks t

    and sys.dm_os_waiting_tasks there were no other wait types on that session_id. None at all. Checking activity monitor, most of the existing threads were suspended on the insert.

    There are 24 cores, but NUMA'd. We have maxdop on the server of 8. The maxdop option on the query is 12, just to speed up the select. Index and Stats refreshed daily. We've eight identical tempdb data files, on a separate spindle. Checking those, they are filling up using the round robin correctly

    Would the delay be due to SQL trying to combine so many 'select' threads into one 'insert' thread (as it can't insert in parallel; 2014 can, apparently. Upgrades not available!)

    Should i change the SP to run the select into a temp table (table variable?) with a maxdop of 12, then do the insert into the actual table using a maxdop of 1. Checking the execution plans for a table variable implies the subtree cost comes down from 2.5m to 357k. What's best - temp table or table variable?

    many thanks

    pete

  • Can you post the estimated execution plan as a .sqlplan attachment please? And when you get the opportunity, the actual plan too.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I can. Only problem with that of course is it's completely unsanitized. Even the table/column names can give away a lot of information.

    pete

  • IIRC SQL Sentry Plan Explorer has an "obfuscation" option designed with you in mind, Pete. It would help us a ton if you can do it.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • have two plans. Sanitized (good little product that!)

    Obviously the names should speak for themselves.

    Many thanks

  • Thinking myself, one difference of course is that @tables don't allow parallelism, whereas #tables do

    As the whole issue appears to be an issue of SQL getting confused with many parallelised threads being pushed into a single thread for an insert, it could be i just need to use maxdop 1 on the original script.

    There may be a hit on the select, but we'll gain it back on the insert. Potentially.

    thoughts?

  • I don't see the full plan when I load it here, it gets chopped at this point:

    INNER JOIN Database_107.dbo.tblDatabase_107ExchangeRate AS XR WITH (NOLOCK)

    ON O.Column_39 = XR.XRateCurrID

    AND XR.XRateTypeID = 1

    AND XR.XrateColumn_5 = ScenarioYear.Column_5

    AND XR.Column_211 = S.Column_35

    AND XR.Column_212 = S.Column_37

    -- Get exchange rate from MadeBy of item costing to USD

    INNER JOIN Database_107.dbo.tblDatabase_107ExchangeRate AS XR2 WITH (NOLOCK)

    ON O

    I've placed the two exchange rate joins together to compare them but can't 🙁

    It looks as if the joins for this table (XR and XR2) might be missing a predicate, the row counts go haywire off of their respective operators.

    Also, there are numerous Compute Scalars and the properties sheets are missing the "defined values" row. Can you tell us what nodes 20, 31 and 34 show in this field?

    Conditional joins like this

    INNER JOIN Database_107.dbo.tblDatabase_107Organisation AS O -- 1075 rows

    ON CASE WHEN ORG.Column_485 IN ('SP', 'LRD') THEN S.Column_8 ELSE BOM.Column_8 END = O.Column_8

    -- Retrieve currency from the MadeBy Column_8 of the item in costing table

    -- Get the correct JDE Plant code for the transaction

    INNER JOIN /*(

    SELECT DISTINCT Column_8, Column_661 FROM Database_107Staging.dbo.tblSTGOrganisationXref WHERE Column_1 = 1

    )*/ Table_630 AS OrgXRef -- 130 rows

    -- If the transaction is in Jazz scope then we consider the LRCM

    ON CASE WHEN ORG.Column_485 IN ('SP', 'LRD') THEN S.JazzLRCMColumn_8 ELSE S.Column_7 END = OrgXRef.Column_8

    can really screw things up. You should try splitting each one into two left joins and resolving in your output.

    Finally, I think it would help a great deal if you could run the SELECT statement in isolation and grab the actual plan. Costs are blasted by the table insert at 97%.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • here's the select sqlplan

    I've checked with changing the maxdop to 1 for the existing script

    original (maxdop 1) 666433 (subrow cost)

    original (maxdop 12) 688366

    @table 98225

    So the @table still appears to be the way to go, even if we can't use parallelism.

  • I appreciate conditional joins aren't the best. Apparently it's a legacy of buying a new system that bears no resemblance to the original system, and then trying to run them in parallel for a period of time, whilst joining the data up for the cube...

    Hopefully - long term - the original system can be got rid of, and all will be sweetness and light.

    The short term plan is to alter the scripts, but i really just need to get consistency on the job duration before I can investigate solving it. Politics innit.

  • This is an estimated parallel plan - estimating over 600 million rows output.

    Are your stats up to date?

    Any chance of an actual plan please? - just the SELECT. Ensure your stats are up to date first. Cheers.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply