Excessive run time

  • Hi

     

    I have a number of queries that select data and put the data into a table.

    Data flow from one database and is loaded into another table in another database using a SSIS data flow task.

    I've noticed that on a good day, it works through them all fairly quickly (around 30-40 mins)

    Occasionally, it sticks (CXCONSUMER waits) for a number of hours (up to 20!) on one of the extracts

    It does eventually work through

    I've attached one of the execution plans generated (they are all pretty similar)

    Is it the Parallelism that's causing this?

    If so, what's the best approach to make it more reliable

    Appreciate I could use something like MAXDOP 1 and this does change the plan but should I be applying different indexes?

    I've been dissuaded from throwing MAXDOP at issues

     

    Thanks

     

    - Damian

  • If memory serves, CXCONSUMER wait type is related to parallelism.  So, my approach to fix this is to adjust parallelism.

    Your options are pretty limited on how to fix it (from my understanding):

    1 - reduce MAXDOP

    2 - increase cost threshold for parallelism

    3 - reduce the query cost

    Indexes can help to reduce query cost, but if you are pulling all of the data, indexes are not going to help there and may hurt other query performance.

    All options have tradeoffs.  When I am patching my SQL instances, I like to review all queries that have query hints to see if the patches or upgrades correct the need for query hints.  In some cases, I've found that upgrading to a newer SQL version resulted in a better execution plan which removed the need for the query hint.  Other times, the query hint is a required hint (such as NOLOCK when selecting from a service broker queue).

    Increasing the cost threshold for parallelism affects the entire SQL instance, not just the one database or the one query.  So it may fix your problem but create 100 new ones.

    Reducing query cost is only good if you can reduce it below the cost threshold for parallelism and do so without removing any required data.  Indexes may reduce the cost, but if you have no WHERE clause or ORDER BY clause, they are unlikely to improve anything with your situation.  Plus, adding indexes can hurt the performance of other queries.

    My approach in your situation would be to evaluate if the cost threshold for parallelism is set to an appropriate value (hint - if it is 5, it is likely an incorrect value and you would benefit from bumping it up).  If it seems appropriate, I would evaluate the query.  Does it have parameters in it?  If so, you may benefit from adding query hints (even though I advised against it higher up) of OPTIMIZE FOR.  If your query is filtering the data as it comes in from your SELECT query, I would look at adding an index to work with that query. If this needs to be fixed in a very short period of time, I would use MAXDOP 1.  Another option (that I do NOT recommend) would be to use NOLOCK.  NOLOCK may result in duplicate data or missed data which is why I would not recommend it, but I have seen where it can fix things too.

     

    Now, all of the above being said, 30-40 minutes seems like a long time to pull data from table A to table B.  Are there any optimizations you can take to reduce that time or are you moving a LOT of data or doing a lot of calculations on the data as it moves?

     

    TL;DR version - I would increase cost threshold for parallelism if it is not set to an appropriate value (5 is too low).  If you can't do that, then I would look at optimizing the query (rewrite it to have a lower cost).  If you can't do that, I would investigate indexes presuming you are filtering the data as it comes in.  If the data is unfiltered, I would do MAXDOP 1.  And if none of those options were good and someone higher up than me was pushing me to fix it ASAP and they approved the potential for data inaccuracy, I would use NOLOCK.

    Note - if the data isn't changing at the time this is being run, NOLOCK may be safe.

    Lastly - it looks like you forgot (or couldn't) attach the execution plans.  So the above advice is generalized and not specific to your situation.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thanks

    Strange, it's not letting me attach the plan - 'Sorry, this file type is not permitted for security reasons.'

    Sure I've attached sqlplan files before

     

    This all makes sense

     

    Yes, I suppose the issue I'm trying to prevent is inconsistent very long running queries caused by parallelism

    There isn't really any filtering I can apply here

     

    In short, it basically just (inner) joins one big table (90M ish rows) to any other big table (90M ish rows) i.e. return if join can be made

    Then it extracts certain columns from the first table

     

    I believe (although I could be wrong) that MAXDOP just papers over cracks as this only runs slowly sometimes?

     

    Upgrading is an option as this runs on 2008 - I have 2016 available

    Suppose I could copy it over and see what plan it generates?

     

    As this is copying from a database that's built seperately, I'm reluctant to use NOLOCK as there will be the potential for dirty reads here. Appreciate this was in your note.

     

    Is there anything that can be changed in SSIS?

    - Damian

  • Are you sure the problem is in the extract portion - and not somewhere else?  You stated you are joining 2 tables to get the results - each table has 90+ million rows, and there is no filtering.  Are you performing a full refresh on the destination - basically a truncate and load or is there further processing done after the data is loaded?

    If that select is 'hanging' for up to 20 hours due to a parallel plan - then any processes accessing those tables are going to be blocked.

    If there is no way to filter the rows, is it possible to extract each table separately - and join on the destination?  That would allow for only selecting the key column(s) and the output columns from each table, which should be a lot faster on the extract part of the process.

    If the issue is not really in the extract portion, then there are ways to speed up the load.  If you are using the OLEDB destination - then you can set the batch and commit sizes to reduce the impact and generally speed up the process.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Since it is a JOIN, if that JOIN is on the PK/FK column, it may be already indexed.  If it isn't, you MAY get a boost by indexing on that column or columns and that may fix the parallelism issue where it suddenly runs slow.

    Now if you already have an index on that column or columns, then it likely won't do much.

    As for MAXDOP 1, from my experience, if a query runs slow sometimes and it is due to parallelism when it goes slow AND when it runs slow it is SUBSTANTIALLY slower (ie not going from 40 minutes to 45, but going from 40 minutes to 20 hours), then putting MAXDOP 1 (or adjusting cost threshold for parallelism which is usually the better solution) will remove the long run AND in some cases make your "fast" run faster!

    I would check the cost threshold for parallelism to make sure that it isn't set to 5 as that would indicate to me that it needs to be adjusted and it SHOULD help overall performance of the system, not just your one query.

    Another thing to check that I completely forgot about is the free memory on the server running SSIS.  This is important because SSIS runs in separate memory space from SQL Server.  So if your SSIS package needs 5 GB of memory and your server has 2 GB free, you are going to be paging to disk which is INCREDIBLY slow.  Even if your SQL instance has 50 GB of memory free, SSIS doesn't use SQL Server memory, it uses its own.  If one or more of the threads needs to page to disk, that will hurt performance drastically.

    As for running it on the 2016 instance, I'd say give it a shot.  My opinion, it never hurts to run it on a newer instance.  When you say running it on a 2016 instance, are you talking about SSIS or the TSQL?  If you mean SSIS, it is not likely to make a difference as the query to get the data runs on the SQL Server side, not the SSIS side.

    I am one of those who doesn't like using NOLOCK.  It does have its place or it wouldn't be a keyword in SQL Server; I just try to use it as sparingly as I can as every use case that I have had for it had other better solutions.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • This was removed by the editor as SPAM

  • The indexes could be critical.  Would you be able to post the DDL for the tables and the SQL queries?

    SQL 2016 could be a big help in another way, it would allow you to compress data.

    You may be able to post the query plan as xml rather than a ".sqlplan"?!  But I am not an expert on how to post things to SSC.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Have you  confirmed there are no other processes causing blocking ?

  • To be honest, this sounds like a case of the bad type of parameter sniffing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply