February 13, 2017 at 11:43 am
You may want to take a look at what is going on with the schedulers as well since it can potentially show a few different issues. Something like: select scheduler_id, current_tasks_count, runnable_tasks_count,
current_workers_count, active_workers_count,
work_queue_count, pending_disk_io_count, load_factor, failed_to_create_worker
from sys.dm_os_schedulers
where status = 'VISIBLE ONLINE'
And execute it throughout as sustained high values are usually more important than a spike. You'd especially want to do that if runnable task count is high.
When you primarily see cxpackets waits and nothing else is running on the box, that's pretty much what is to be expected for a parallel plan and nothing else executing.
But when running parallel, there is usually one process in there that is not waiting on cxpackets and that would be the one to pay attention to. You probably want to focus on things other than just the cxpacket waits since those are normal.
Have you looked at the disk stats at all? I would probably take a look at those as well.
Sue
February 13, 2017 at 1:02 pm
What's the compatibility level of the database?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 13, 2017 at 1:12 pm
Hi Gail,
The compatibility level of the test server that completes is SQL Server 2008(100). On the new 2016 and 2012 production server where it never completes, it is 110.
Thanks!
Bea Isabelle
February 13, 2017 at 1:51 pm
Isabelle2378 - Monday, February 13, 2017 11:14 AMHi,
I agree on the comments that MAXDOP is not the solution or that CXPACKETS is the problem. I'm trying to figure out why it's behaving this way on my production server but having no problems completing on the test server. This is why I restored the same exact database to another test server, has the same memory, the same MAXDOP settings (default 0) and getting 2 different results. My 2008R2 test server completes in 40 minutes (3GB memory, MAXDOP 0) and my 2016 test server never finishes. Just hangs and the only thing I see is suspended waiting on CXPACKETS. There is no load on this server, it is brand new and no one is using it but me for this testing. So I'm trying to figure out what am I missing? The indexes are all the same, I rebuilt anything that was fragmented over 30%, updated statistics on both databases. It's the same exact query so what else can I look at on this 2016 server to try and see why it won't complete? I was hoping to get some fresh eyes to maybe point out something obvious that I'm just not seeing. Both systems are running collation Latin1_General_BIN. I asked the analyst to check the language setting of the application to see if he see's anything different with the way the 2 queries are running the code. Any other thoughts?
Some smart watches have more than 3GB as standard. Have you checked the minimum and minimum recommended memory requirements of your server versions? Why so little RAM when it's do cheap? We use test servers with 》200x this amount. 3GB is asking for trouble.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
February 13, 2017 at 4:17 pm
Hi,
Our production server has 48GB of memory. The test servers only have 16GB of memory and this has a named instance that has 3GB allotted to it. The test server with only 3GB of memory has no problems with the query completing. We are only using these servers to test it out so memory is not really a concern here. The production server with 48GB is having the issue with it not completing and my 2016 test box that I setup is just for testing.
I have looked at the disks and see latency on the production server:
Drive Volume Mount Point Read Latency Write Latency Overall Latency Avg Bytes/Read Avg Bytes/Write Avg Bytes/Transfer
S: S:\ 17 3 4 448146 57218 64094
K: K:\ 88 28 87 74360 71404 74264
G: G:\ 4 400 201 65104 65584 65343
And here's on the test server that completes the query okay:
Drive Volume Mount Point Read Latency Write Latency Overall Latency Avg Bytes/Read Avg Bytes/Write Avg Bytes/Transfer
W: W:\ 224 1 5 904816 13143 25745
K: K:\ 80 140 82 1004123 40982 962235
Thanks!
Bea Isabelle
February 13, 2017 at 7:12 pm
Sue
February 14, 2017 at 2:29 am
Isabelle2378 - Monday, February 13, 2017 1:12 PMHi Gail,
The compatibility level of the test server that completes is SQL Server 2008(100). On the new 2016 and 2012 production server where it never completes, it is 110.
Shouldn't make a difference with those settings, none of those are using the new cardinality estimator (available from compat mode 120)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 15, 2017 at 3:33 pm
Try this
--I don't know the length of your key
DECLARE @T TABLE (Xkey VARCHAR(100));
It would be helpful if you could give a bit more information, for example, why are you inserting into #Hindex index, then selecting 1 record from it, which is then deleted?
If these are worker tables, any chance you can move any of those temp tables to In-Memory? You will likely see some performance improvements if you have memory to spare.
Wes
(A solid design is always preferable to a creative workaround)
February 15, 2017 at 3:59 pm
Hi,
The query is coming from an application so I have no control over how it is written and I cannot modify it. I am using the same exact query on all 3 servers. Here is details about each server...my main concern is to try and get the query to complete on the Production SQL 2012 server. Our analyst uses the 2008R2 test server and has no problems running the query. I'm only using the 2016 test server because it is duplicating the problem that I'm seeing on the 2012 production server. This way I can test different things out to try and fix the problem and not have anyone else using the server.
Test Server àquery completes OK. No issues here
SQL 2008 R2 SP3 named instance with 3GB memory
Windows Server 2008R2 Standard
Logical CPU Count: 2 Hyperthread Ratio: 1 Physical CPUCount:2
MAXDOP:0
Production Server àquery never completes…CXPACKET waits
SQL 2012 named instance with 10GB memory
Windows Server 2012 R2 Standard
SQL Server detected 2 sockets with 4 cores per socket and 4logical processors per socket, 8 total logical processors; using 8 logicalprocessors based on SQL Server licensing.
MAXDOP was set 0. Query never completed…set MAXDOP: 6 improved CPU usage but query stillnever completes.
Test Server I am using for testing à query never completes…CXPACKETwaits
SQL 2016 named instance with 3GB of memory
Windows Server 2012 R2 Standard
SQL Server detected 2 sockets with 2 cores per socket and 2logical processors per socket, 4 total logical processors; using 4 logicalprocessors based on SQL Server licensing.
MAXDOP: 0
Thanks!
Bea Isabelle
February 16, 2017 at 2:17 am
As I suggested before, I don't think CXPACKET is the root cause. What other wait types do you get for the same process? Please run this while you're waiting (fruitlessly) for the query to complete:
WITH tasks AS (
SELECT *, COUNT(*) OVER (PARTITION BY session_id) AS threadcount
FROM sys.dm_os_waiting_tasks
)
SELECT * FROM tasks
WHERE threadcount > 1
AND session_id IS NOT null
John
February 16, 2017 at 6:55 am
Isabelle2378 - Wednesday, February 15, 2017 3:59 PMThe query is coming from an application so I have no control over how it is written and I cannot modify it.
Can you reduce the number of processors available to SQL Server on the Test server temporarily? That is not a permanent solution but is just something to try. If that eliminates the problem and you can't change the query, then you'll need to go back to the application developer and let them know the query needs to be changed to better support a multi-core server. CXPacket waits are just a symptom of the problem.
Wes
(A solid design is always preferable to a creative workaround)
February 16, 2017 at 8:16 am
For reference, here are the two query plans on PasteThePlan:
Query_that_completed_40_minutes: https://www.brentozar.com/pastetheplan/?id=SJklWSXFl
Query_not_finishing_still_running: https://www.brentozar.com/pastetheplan/?id=S1OmbS7Fl
February 16, 2017 at 1:51 pm
Sorry wrong question
February 16, 2017 at 2:16 pm
Ethan.Hinton - Thursday, February 16, 2017 8:16 AMFor reference, here are the two query plans on PasteThePlan:Query_that_completed_40_minutes: https://www.brentozar.com/pastetheplan/?id=SJklWSXFl
Query_not_finishing_still_running: https://www.brentozar.com/pastetheplan/?id=S1OmbS7Fl
Same as last time - two completely different plans. I think you should resolve this before tackling the performance issue.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
February 16, 2017 at 5:31 pm
Hi,
Based on the fact that it looks like two completely different queries are running for TST and Production, I went ahead and restored the Test database to the 2016 test server and ran that query against it and it completed in 50 minutes. So it's definitely something with the production data and stored procedure that is the problem. The analyst assured me that they were the same but after seeing the query plans that you posted, he looked further and came back that he does see that they are different. Thanks everyone who responded and helped pinpoint the issue. I truly appreciate all the help! 😀 Chris, thanks for posted the plans using PasteThePlan. I've never seen that before and it was very helpful!
Thanks!
Bea Isabelle
Viewing 15 posts - 16 through 29 (of 29 total)
You must be logged in to reply to this topic. Login to reply