Tracing a performance issue

  • Hello,

    We have a SQL agent job that runs daily. It runs a procedure, which for the purposes of this post, I will refer to as 'Procedure A'. That procedure runs other procedures.

    Every few months we get an issue where it appears to have stalled. Task manager is showing the SQL Server process at close to 100% so it doesn't seem to be non SQL things taking server resources.

    I'm struggling to find the cause and would appreciate some suggestions on what to do next time to try and find the cause or what concept I've not understand and need to go and learn about (I have done some research to try and work it out myself but not found the answer).

    I use the following query to try and establish what's happening:

    SELECT er.session_id,es.host_name,es.program_name, login_name, er.status,command, er.blocking_session_id,er.wait_type,er.wait_time, 
    DB_NAME(er.database_id) AS db_name,
    text, query_plan
    FROM sys.dm_exec_sessions es
    INNER join sys.dm_exec_requests er ON es.session_id=er.session_id
    outer APPLY sys.dm_exec_sql_text(er.sql_handle)
    outer APPLY sys.dm_exec_query_plan(er.plan_handle)
    WHERE er.session_id<>@@SPID
    AND host_name IS NOT null

    This is the relevant columns from the results:

    Sessions

    The wait_type for session 52 makes sense, it's waiting for the other sessions.

    Session 77 - That doesn't appear to be waiting for anything but it had been running for hours when it should be minutes.

    I suspect session 77 is the root cause of the issue but I can't prove it. I wonder if it's taking all of the cpu cores but I've not found a way to prove that?

    I thought about the possibility of a bad execution plan (the data changes every day) but if I look at the plan it looks the same as usual so I don't think it's that but I could be wrong.

    The wait_times for sessions 53 and 54 are much higher than they would normally be (the procedures should complete in a few minutes). If my understanding is correct CXpacket and cxconsumer are parallelism. Maybe they are waiting cpu cores but then they would be in status runnable?

    If locks were an issue then I think the sessions would show as being blocked?

    Or possibly my query could be excluding the real cause. I have tried doing things like just looking at the tables individually and not found anything that looks relevant. Or possibly I'm taking completely the wrong approach?

    My current fix is to stop the SQL agent job and restart it. It then completes OK but I'd like to properly understand what's happening.

    Thanks

  • The SELECT INTOs will hold locks.  Those might be causing issues.

    One good way to avoid that is to an initial SELECT TOP (0) ... INTO to create the table, then use INSERT INTO ... SELECT to actually load the table.  You can still get minimal logging, if otherwise available, but you won't hold as many meta locks in SQL Server that way.

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • There are a couple things in your post that may need more clarification:

    as_1234 wrote:

    Task manager is showing the SQL Server process at close to 100% so it doesn't seem to be non SQL things taking server resources.

    Are you looking at in Task Manager by logging onto the server?  Why?  A server is not a desktop.  There are few reasons to RDP into a server as a DBA.  Your RDP session may be contributing to the issue.

    SQL Server process in Task Manager should be close to 100% in memory usage.  That's how its designed to work.  If CPU is close to 100%, is that normal?  You are looking at a small slice of time in Task Manager.

    as_1234 wrote:

    The wait_times for sessions 53 and 54 are much higher than they would normally be (the procedures should complete in a few minutes). 

    Wait times from the sys.dm_exec_requests DMV may not correspond to the actual execution time of a procedure.  This is the definition from MS's documentation: If the request is currently blocked, this column returns the duration in milliseconds, of the current wait. Not nullable.

    as_1234 wrote:

    If my understanding is correct CXpacket and cxconsumer are parallelism. Maybe they are waiting cpu cores but then they would be in status runnable?

    CXPacket and CSConsumer wait types indicate that there is parallelism occurring.  They are not typically a problem.  There is a LOT of bad advice about these wait types.

    as_1234 wrote:

    If locks were an issue then I think the sessions would show as being blocked?

    A lock is normal.  All RDMS need to take locks.  The system will not function without locks. "Block" is not one of the statuses you will get from your query.  These are the only statuses in sys.dm_exec_requests:

    background

    rollback

    running

    runnable

    sleeping

    suspended

    Deadlocks are a potential issue. A lock is absolutely required.

    as_1234 wrote:

    My current fix is to stop the SQL agent job and restart it. It then completes OK but I'd like to properly understand what's happening.

    This is not a fix.  By restarting, you may be losing relevant information to diagnose this issue.

    Have you ruled out settings, other processes, and so forth? Is this a virtual machine?  If so, is the VM configured properly?  It appears that you are digging into the weeds when you need to start at a higher level.

    Have you captured the execution plans, and operational stats, for the various executions of the query and compared them?  Have you noted what other things were executing when this was slow? Query Store will help with this.

    I suggest you look at these things:

    Much more detail on wait types

    https://www.sqlskills.com/help/waits/

    This should be installed on all servers.

    http://whoisactive.com/

    A wealth of information about your system from the DMV queries.

    https://glennsqlperformance.com/

    First Responder kit. What does sp_blitz and sp_blitzcache show?

    https://www.brentozar.com/first-aid/

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply