Parallelism, cxpacket, working threads....

  • Hi everybody.

    I'm working with a new customer that is experiencing performance problem on some sql server instances. I analyzed part of workloads running on those instances, and I'm sure that the most part of the problem is because of parallelism.

    The typical workload of these systems is like data warehouse, all instances are on VMWARE virtual machine.

    The underlying infrastructure is really high performing: I can see on more instances, in the very same moment, throughput of 300 MB/s.... wow!

    The problem is not cxpacket wait invents by themselves, analysing a single query I can find that the CXPACKET wait is, in the most cases, higher than the elapsed time.

    I'm sure that statistics are good and updated.

    In order to deeply check for indexing I would have to analyse hundreds of queries and I'm not supposed to do it.

    I can observer a high number of deadlock for intra parallelism query.

    I googled a lot, but I have to say that i did not find many interesting things.

    Do you know queries,scripts to demonstrate that a system is parallelising too much? The customer does not want to modify MAX_DEGREE_OF_PARALLELISM or other parameters. So I have to produce a report they make evidence of the problem. It is "simple" for a single query o for a single batch, but I have to show the same for an entire workload and I can not put in trace the entire database ( In half an hour, with a minimally configured sql server profiler, I get 10 GB trace file...). I can not use the test environments because there are to many resources difference between test and production infrastructures.

    Can someone help me?

    Another question: in your opinion why Microsoft is enabling parallelism by default? Im an experienced Oracle DBA and on Oracle database parallelism is disabled by default. the 90% of programmers that tried it, are not using it. With sql server instance I find a great amount of problems because of wild parallelism. When I create new instances, I put it at a value of 1 (you can override it with maxdop hint)

    =========================================================
    I'm always happy to hear your opinion!
    Senior Oracle DBA
    Sql Server "learner" and... enthusiast!

  • CXPacket is not a problem. CXPacket is a sign that you have queries running in parallel, nothing more. Since as soon as a query goes parallel there's a controller thread that incurs CXPacket waits all through the processing, it's not at all unusual to have CXPacket wait time more than the query's execution time.

    Start by increasing the server's cost threshold for parallelism. 5, the default, is stupidly low. 25 is better, hell anything above 5 is better. Then monitor for expensive queries. That's ones doing a lot of reads, taking a lot of time or using a lot of CPU. Tune those queries. That may be a combination of tuning indexes to support the queries or changing the queries to be more efficient and/or use the indexes better.

    Use a server-side trace for the monitoring. DO NOT ever run the Profiler GUI against a busy production server, it can cause horrible performance degradation. Yes, it can be a lot of data. You can filter out the 0 duration, 0 read, 0 CPU queries that usually cuts the size down. Make sure you're tracing the minimum number of events & columns necessary.

    Are you sure those deadlocks are intra-query parallelism deadlocks? Not just deadlocks involving parallel queries?

    https://www.simple-talk.com/sql/performance/sql-server-deadlocks-by-example/

    Edit: And please, stop disabling parallelism. Server-wide MAXDOP 1 is a poor idea in most cases.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I'm going to pretty much echo and agree with everything that Gail said about parallelism, MAXDOP and CXPACKET.

    CXPACKET is NEVER the problem. You need to look at the other waits on the system and see what the other wait types are. They are the cause of the performance problem.

    The through put that other VMs are getting is basically meaningless unless they are on the same LUN as they could be on flash and you are on spinning disks.

    Or it could have nothing at all to do with disk, and you need to add indexes to the system (I'm betting this is the case). What do the missing index DMVs tell you. When you dumped the plan cache and started looking at the plans did they tell you about problems with indexes, stats, scans, etc?

  • Thanks!

    I really don't always put max dop=1 ;).

    I know that CXPACKET by themselves is not an issue, but I was quite sure that unbalanced execution time between parallel threads was a bad thing.

    I'll try to rise the parallelism threshold but I have hundreds of queries with incredible costs ( more or less, 50/60 have a cost between 10.000 and 15.000). So values like 20, 25, are still ridiculous in this case. I think these queries need to be tuned but at the moment the customer want not spend on this. At the moment I can only try to fully describe the causes of the effects it is experiencing.

    I found that the actually the maxdop parameter is set to 8 and to sql server has been assigned 32 cpu.

    Another fact is that in sys.dm_os_wait_events view of these instances I always find that for the event CXPACKET, signal_wait_time_ms is 25%/35% of wait_time_ms + signal wait time ms. signal_wait:time_ms increases when 4/5 of these big queries are running and the database is involved to perform intensive I/Os.

    Yes, I'm sure I'm not experiencing "normal" deadlocks. Occasionally, at peak moments, some query executed in parallel fails due to intra-query parallelism deadlocks.

    =========================================================
    I'm always happy to hear your opinion!
    Senior Oracle DBA
    Sql Server "learner" and... enthusiast!

  • Giova (7/7/2014)


    I'll try to rise the parallelism threshold but I have hundreds of queries with incredible costs ( more or less, 50/60 have a cost between 10.000 and 15.000). So values like 20, 25, are still ridiculous in this case

    I think you may have missed the point of that change. It isn't to disable parallelism. It's to stop the cheap queries from paralleling inappropriately. Queries of costs in the tens of thousands probably should be running in parallel as it'll be faster.

    I found that the actually the maxdop parameter is set to 8 and to sql server has been assigned 32 cpu.

    That sounds OK.

    Another fact is that in sys.dm_os_wait_events view of these instances I always find that for the event CXPACKET, signal_wait_time_ms is 25%/35% of wait_time_ms + signal wait time ms. signal_wait:time_ms increases when 4/5 of these big queries are running and the database is involved to perform intensive I/Os.

    I assume you mean sys.dm_os_wait_stats.

    Ignore CXPacket. It is not the problem. You need to look at the other waits the system is experiencing.

    Yes, I'm sure I'm not experiencing "normal" deadlocks. Occasionally, at peak moments, some query executed in parallel fails due to intra-query parallelism deadlocks.

    How are you determining that the deadlocks are 'intra-query parallelism'? Those should be extremely rare.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Giova (7/7/2014)


    I think these queries need to be tuned but at the moment the customer want not spend on this. At the moment I can only try to fully describe the causes of the effects it is experiencing.

    About all you can do at that point is tell the customer that tuning will fix it. Pull a single plan from cache throw it in SSMS or Plan Explorer and just toss the graphic in the report with the scans.

    Giova (7/7/2014)


    I found that the actually the maxdop parameter is set to 8 and to sql server has been assigned 32 cpu.

    That's probably fine. I wouldn't push it any higher as you probably only have 8 cores per NUMA node anyway.

    Giova (7/7/2014)


    Another fact is that in sys.dm_os_wait_events view of these instances I always find that for the event CXPACKET, signal_wait_time_ms is 25%/35% of wait_time_ms + signal wait time ms. signal_wait:time_ms increases when 4/5 of these big queries are running and the database is involved to perform intensive I/Os.

    Yes, I'm sure I'm not experiencing "normal" deadlocks. Occasionally, at peak moments, some query executed in parallel fails due to intra-query parallelism deadlocks.

    You could spend all day trying to solve this stuff. Until the indexing problems are solved you have no idea if these numbers are really bad or not. And as Gail said, ignore all the CXPACKET stuff. It doesn't mean anything to you.

  • Hello, thank you for your reply.

    In the last days I spent some times making three of these database more stable and I accomplish this goal.

    What I have done:

    increase the statistics deep for some big tables;

    increased the query threshold for parallelism

    decreased the MAXDOP at instance level

    analysed some queries

    Let me focus on the last point

    I found queries with higher estimated costs and check them for estimated and effective execution plans. Basically I asked for some changes to the development and I added some indexes/asked to remove some others. I asked to add maxdop (1) option to six queries that was really high cpu consuming and that we could not manage in other ways in this short period. This decision has been drive by the fact that those queries, as they are executed at the moment, did not reduce their execution time increasing MAXDOP.

    All in all I achieved the following results:

    OS Counters about disc I/O operations are now at acceptable values

    pageiolatch events are decreased

    data loads perform in acceptable time and have predictable duration

    The third point is what matter.

    I'm very confused about some aspects of SQL Server point of view of a database workload, but I'm going to write a post about it.

    Regards

    Giovanni

    P.S.

    I'm sure about the deadlock for intra query parallelism because of messages in SQL Server error log. It's a sql server 2005 instance :(.

    =========================================================
    I'm always happy to hear your opinion!
    Senior Oracle DBA
    Sql Server "learner" and... enthusiast!

  • Hallo Gail and MRdenny.

    I totally understood your point of view but I have difficult to fully understand it. Reading blog articles and your replies to my questions I can resume in this way what I found:

    Sql server is supposed to execute query in parallel. This is because of two instance parameters, cost and degree. De default values are basically wrong: the one for cost made sense some years ago, the one for degree has always to be changed.

    Microsoft give us guidelines about the degree, based on the number of cores and other particular things (DAC,...)

    Following this guidelines I always have queries supposed to use parallelism. ???. If it's true, is also true that developers have to write queries with parallelism in mind. They have to add indexes on tables, at least on those tables read by big queries candidates for parallelism because of their costs and their operators but that are not performing good using parallel plan. For these queries I have to add indexes in order to reduce their costs or I/O. It's good in general, but what if these queries run in an acceptable time doing an acceptable I/O and are not locking excessively? Their only problem is that they run in the same time with maxdop(1) and with maxdop(n) (obviously in a certain range), if n i 4 or 5 I'm consuming 4 or 5 CPUs to do the work that I can do with only one CPU in the same time...

    In SQL server I MUST take care of them! Immediately. I can write it in different way, I can change other things. But in some cases I have to add indexes to make them faster ONLY BECAUSE on sql server they are supposed to run in parallel.

    I don't think it is wrong, I need to understand if this is the supposed approach.

    For Oracle database, we work in the opposite way (I'm not speaking about exacta machines, but normal Oracle database):

    if I'm asked to tune a query or a workload, I can find queries that perform better in parallel. In this case I let them run in parallel.

    This is because the sql server way to deal with parallelism seems a funny things to me. But I like sql server.

    =========================================================
    I'm always happy to hear your opinion!
    Senior Oracle DBA
    Sql Server "learner" and... enthusiast!

  • Giova (7/12/2014)


    De default values are basically wrong: the one for cost made sense some years ago, the one for degree has always to be changed.

    Cost threshold should always be changed, max degree of parallelism can be left at default in many cases.

    Following this guidelines I always have queries supposed to use parallelism.

    Yes, absolutely.

    If it's true, is also true that developers have to write queries with parallelism in mind.

    Not in general. There are some constructs (functions, table variables) that should be avoided as they prevent SQL from running the queries in parallel, but that's about all.

    They have to add indexes on tables, at least on those tables read by big queries candidates for parallelism because of their costs and their operators but that are not performing good using parallel plan. For these queries I have to add indexes in order to reduce their costs or I/O.

    Indexes should be added to support queries. That has nothing to do with parallelism, it's just about good performance.

    I can write it in different way, I can change other things. But in some cases I have to add indexes to make them faster ONLY BECAUSE on sql server they are supposed to run in parallel.

    Huh? You need to add indexes to support queries. If you don't they'll run slowly as SQL has to scan the entire table. Nothing to do with parallelism.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • nm.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply