URGENT Please help: Queries timeout for about 10 minutes several times a day.

  • Hi

    I hope some genius here can help me. For some reason queries that normally run fine (in a fraction of a second) are timing out (30 sec's+) This seems to happen for about 10 minutes until things return to normal. It would only appear to be one table and I believe you can still read from it but just can't use UPDATE/INSERT/DELETE.

    I'm assuming profiler is the way to identify this but I'm not having much success configuring it.

    My best (unverified) guess is that it's a delete/clear option we have on the site because it's the only thing that can take time to complete but it could be something else I've missed. There are there are lots of smaller deletes and updates going on every second. The big delete query is below...

    DELETE FROM tblBigTable WHERE myId=xyz

    In this case myId is the primary key. Would this query block write access to the whole table whilst it completed?

    Help would be very much appreciated!

  • How many rows are affected in this delete? Would it be possible to convert this delete query to delete in batches, in case the number of affected rows is extremely high?

  • The big delete could be thousands or rows. Not sure how I would split it up. Is there some kind of DELETE TOP 1000 command?

  • Also I've been checking through the profiler results and I cannot see any mention of the big delete query but that could be because I've configured profiler incorrectly.

    I am using the default template to and filtering duration for 10000. Would this show queries that timeout or just those that complete after a long time?

  • It sounds like it's probably an issue around blocking. Monitoring trace events might not get you very far. You need to check the processes that are causing the block and that will help you address the issue. There are mulitple blocking monitoring scripts available in the Scripts section here on SSC that will help. Bare minimum, if you start to experience the blocks again, run sp_who2 to see who is blocking the other processes. You can then determine what that process is doing and adjust accordingly.

    Depending on the size of the delete, it will have to lock pages rather than just rows, across a wide swath of the table, which is mich more likely to cause blocking issues. If you're deleting that many rows based on the primary key and the pk is clustered you may also be experiencing issues with performance caused by fragmentation. If you don't have a fragmentation analysis routine built in, you'll want to get on that too.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • david meagor (3/29/2010)


    Also I've been checking through the profiler results and I cannot see any mention of the big delete query but that could be because I've configured profiler incorrectly.

    I am using the default template to and filtering duration for 10000. Would this show queries that timeout or just those that complete after a long time?

    The default event trace template (and you're not running the Profiler gui directly on your under-stress production server, right?) won't capture timeouts. Most of the time we monitor based on execution complete, but a timeout event doesn't complete. If you are going to try to monitor these events from trace events, you'll want to monitor starts as well as completes.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thank you for your help with this. It is very much appreciated!

    I figured the profiler wasn't giving the whole picture but at the moment it's the only way I can see when the problem is occurring. Yes it is running on the production server, I have nothing else to run it on as it's a remote/hosted server.

    I will definitely use the sp_who2 command next time there is an issue.

    I was speaking with someone else who suggested that I should add more keys to the table thought this seemed strange to me as I can't see any normal reason to add them.

    The table currently has a primary and 2 indexes. Typically queries use either a primary key or indexed column and one of the others in the queries. The table is about 20GB and adding more indexes will significantly effect the size (and likely take hours to reindex!)

    ans_id Primary KEY

    session_id KEY

    form_id KEY

    page

    qId

    qsubId

    numericAnswer

    textAnswer

    Examples:

    DELETE FROM tblBigTable WHERE session_id = 123456789 AND page = 2

    DELETE FROM tblBigTable WHERE form_id=123456 AND qu_id=615651

  • Without a more exact definition on the indexes on the table and, preferably, execution plans, it's hard to say whether or not you need indexes or the current ones are adequate. You need to take advice (even mine) with a great deal of caution. Adding an index might help, but the wrong index will hurt, so it's better to know you need one and where you need it prior to just slapping one on.

    BTW, a remote server is even a worse candidate for monitoring with Profiler than a local one would be. Read this blog entry[/url] and, more importantly, the links.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The dmv, sys.dm_exec_requests can also be helpful in analyzing blocking.

  • Hopefully because I'm only looking at a small number of transactions in the profiler (those over 15secs) there won't be a big performance drain.

    Can you recommend any third party software that could monitor for these kinds of blocks and alert me?

    I'm sitting here watching the screen waiting for the problem to reoccur so I can run the sp_who2 command which is just silly. Even better if the software would help diagnose the problem in some way, perhaps by automatically collecting some info when it registers the block/timeouts.

  • One of the events that you can track is the Errors and Warnings:Blocked Process Report in sql profiler.

    By default this event won't be thrown because it's actually a report that needs to be generated, but by executing :

    sp_configure 'show advanced options', 1 ;

    GO

    RECONFIGURE ;

    GO

    sp_configure 'blocked process threshold', <number of seconds as blocking threshold> ;

    GO

    RECONFIGURE ;

    GO

    you can then monitor the events.

    Edit: Supposedly you can also set up an alert that can be emailed, paged or whatever when a block passes your threshold. This will at least get you so you don't have to sit in front of your machine waiting for the problem to come back.

    Another edit: Make sure you turn it off afterward also

    sp_configure 'show advanced options', 1 ;

    GO

    RECONFIGURE ;

    GO

    sp_configure 'blocked process threshold', 0 ;

    GO

    RECONFIGURE ;

    GO

    Tim Januario

  • OK It just went down so I ran the sp_who2 command several times.

    There are 4 queries (all INSERTS and DELETES) listed with BlkBy numbers 100. When I check the SPID 100 process it has a BlkBy number of 61. SPID61 is the SELECT command below.

    If that's right then they are actually waiting for a SELECT command to finish. However I just ran the select query again and it competed in 1 second so now I'm completely confused. Something must have stopped the SELECT query but it didn't show any BlkBy no.

    I had profiler running at the same time picking up anything over 15seconds) and this is what it lists the SPID's and durations as (they both completed in the same second, the SELECT started first as you'd expect.)

    SPID61

    reads: 147526 cpu: 422 duration: 253745

    SELECT tblBigTable.qId, tblBigTable.qu_sub_id, tblBigTable.numericAnswer, tblBigTable.textAnswer, tblBigTable.session_id, tblSession.dateStamp,tblSession.tracking,firstname,lastname,email,cust,lcode FROM tblBigTable ,tblSession LEFT JOIN tblParticipant on tblParticipant.session_id=tblSession.session_id WHERE tblBigTable.session_id=tblSession.session_id AND tblBigTable.form_id = 712587 ORDER BY session_id ASC

    SPID 100.

    Reads: 16 writes: 12: duration: 144233

    DELETE FROM tblBigTable WHERE form_id=732707 AND qId=5910957

  • First, go to the scripts section here at SSC. It's over on the left labeled scripts. search through there and you should find several different well rated scripts that will monitor for blocking. That's really all you need.

    Second, sounds like you spotted the culprit. So the question is, what does the execution plan for that query look like? Just because it runs quick when there's no load doesn't mean it runs quick all the time, but you're hitting multiple processes blocking each other, so it's a pretty serious performance situation.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for all of your help so far. I think the cause has now been found. The error occurred twice more during the night and each time the same query as before had been run and appeared to be holding the rest up.

    I don't really get why a SELECT query would lock out a table. That would seem like a rather fundamental flaw for anyone that needs a long running query on a production server! Is this normal? Do other SELECT queries block tables whilst they complete or is it just the LEFT JOIN part?

    Looking at the execution plan it shows...

    83% doing a KEY LOOKUP on a the LEFT JOIN to tblParticipant (In these cases there were no matching records)

    8% doing KEY LOOKUP (clustered, primary) on tblBigTable

    7% doing CLUSTERED INDEX SEEK on tblSession

    Any ideas what I can do to stop this?

  • david meagor (3/30/2010)


    Thanks for all of your help so far. I think the cause has now been found. The error occurred twice more during the night and each time the same query as before had been run and appeared to be holding the rest up.

    I don't really get why a SELECT query would lock out a table. That would seem like a rather fundamental flaw for anyone that needs a long running query on a production server! Is this normal? Do other SELECT queries block tables whilst they complete or is it just the LEFT JOIN part?

    Looking at the execution plan it shows...

    83% doing a KEY LOOKUP on a the LEFT JOIN to tblParticipant (In these cases there were no matching records)

    8% doing KEY LOOKUP (clustered, primary) on tblBigTable

    7% doing CLUSTERED INDEX SEEK on tblSession

    Any ideas what I can do to stop this?

    The select is what's preventing the exclusive lock needed for the delete from doing it's thing. It's the way SQL Server is wired and, by & large, it's a good mechanism.

    Without knowing more about the structure and the execution plan, I'm not sure what to do to speed things up. When you say key lookup, do you mean that you're seeing the key lookup operator, meaning a bookmark lookup operation is under way? If so, then I'd look to see if I could make whatever index is being used into a covering index or, possibly, reverse it, and make the clustered index key include the needed columns that are being used by the nonclustered index, but that's blind speculation and worth...ooh, not too much.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply