Determining cause of timeouts and transaction log growth

  • Hello,

I've recently run into a problem with SQL Server 2000, and I can't seem to find out what's causing it.

Every once in a while (roughly weekly, but sometimes daily), one database experiences time-outs. The only way to resolve this is to stop/start SQL Server, or to reboot the server itself.

The DB is about 300 MB in size, and the transaction log about 600 MB (normally). On inspection, the log grows to about 10 GB when this problem occurs.

As this is a production server, and the problem is not reproducible on any development server, I'm somewhat limited in the resources I can use to find out what is causing this behaviour.

     

I've tried reading through several backups of the transaction log itself with "ApexSQL Log" (http://www.apexsql.com), but after reading a few thousand records the program crashes (or Windows closes it without warning).

     

Are there ways to determine the cause of this behaviour, or steps I can take to make sure this problem disappears?

     

    TIA.

     

What recovery model are you using? You might consider changing to either Simple or Bulk-Logged if you do not really need the Full recovery model.

Also, at what time of day does the timeout occur? I mean, are you doing any bulk inserts or index creation/reindexing when the problem occurs?

Look closely at the activities that are ballooning your transaction log to such a huge size.
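If it helps, here's a quick sketch of how to check and change the recovery model from Query Analyzer (the database name MyDB is a placeholder; SQL Server 2000 syntax):

```sql
-- Check the current recovery model ('MyDB' is a placeholder database name)
SELECT DATABASEPROPERTYEX('MyDB', 'Recovery')

-- Switch to Bulk-Logged (or SIMPLE) if point-in-time recovery is not required.
-- Note: this changes what can be restored from log backups, so take a full
-- backup after switching.
ALTER DATABASE MyDB SET RECOVERY BULK_LOGGED
```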

    --Kishore

     

When you suspect the problem is occurring, I would start SQL Profiler and capture traces that you can review.

Capture Show Plan All, Execution Plan, and possibly Show Plan Text (all under Performance); SP:StmtCompleted under Stored Procedures; and, if you have any ad hoc T-SQL, SQL:StmtCompleted under TSQL.

Set the trace to write to a file (so you can review it later) and start the trace.  You may have a specific runaway statement.

Also, I would look at locks/blocks to help determine which table (if any) is experiencing a lot of locks (Ctrl+2, or sp_lock).  I would also see what sp_who and sp_who2 report, to help narrow down the scope of the failure.

Once you have your trace, you should be able to look through it and possibly find table scans, etc.
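As a lighter-weight alternative to running Profiler interactively, the same events can be captured with a server-side trace written straight to a file. A rough sketch using SQL Server 2000's sp_trace_* procedures (the file path is an assumption, and the event/column IDs shown should be double-checked against Books Online):

```sql
-- Minimal server-side trace sketch (SQL Server 2000).
-- The target path is an assumption; the SQL Server service account
-- must be able to write there.
DECLARE @TraceID int, @maxfilesize bigint, @on bit
SET @maxfilesize = 100   -- max MB per trace file
SET @on = 1

-- Create the trace; SQL Server appends .trc to the file name
EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\Traces\timeouts', @maxfilesize, NULL

-- Event 45 = SP:StmtCompleted; columns 1 = TextData, 2 = BinaryData,
-- 12 = SPID, 13 = Duration
EXEC sp_trace_setevent @TraceID, 45, 1, @on
EXEC sp_trace_setevent @TraceID, 45, 2, @on
EXEC sp_trace_setevent @TraceID, 45, 12, @on
EXEC sp_trace_setevent @TraceID, 45, 13, @on

-- Start the trace (status 1 = start, 0 = stop, 2 = close and delete definition)
EXEC sp_trace_setstatus @TraceID, 1
SELECT @TraceID AS TraceID   -- keep this ID so the trace can be stopped later
```

Because it writes directly to disk with no GUI overhead, a server-side trace is usually much cheaper to leave running than an interactive Profiler session.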

Oh, and for the record...  I had to do this last week and about a month ago.  A contractor had put in code that said WHERE fieldx <> @var AND fieldx <> 'literal'.  It turned out that once our data got to a certain point, this started table scanning.  Once it was changed to WHERE fieldx = 'differentliteral', the system started using indexes again.



    Good Hunting!

    AJ Ahrens


    webmaster@kritter.net

  • Thanks for the answers!

We're using the Full recovery model. For the moment, I'd rather not change the recovery model until I'm certain what the root of the problem is.

     

Usually, the effects start around 8:15 in the morning. At that time, the DB itself isn't used as heavily as during the day. We don't do any bulk inserts, and I have disabled the (automatic) reindexing that was part of the maintenance plans.

     

I've considered doing traces with SQL Profiler, but unfortunately I can't determine beforehand when the problem is going to occur, and leaving the trace active would slow down performance too much.

    (I don't know whether there's a way to do a trace without the massive performance degradation?)

     

I've also tried using the 'sp_blocker_pss80' stored procedure from Microsoft to gain insight into blocks/locks (can't find the URL right now), but the stored procedure no longer runs once the problem arises.

     

     

You could set your log to the minimum size at which you know it won't grow under normal operations (600 MB?), create an alert to fire when the log fills, and optionally fire your Profiler trace automatically in the response section of the alert.  Obviously this won't catch it at the very beginning, but it should be early in the process.  I tend to agree with AJ that it could be a specific runaway statement.
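One sketch of such an alert using SQL Agent's performance-condition alerts (the database name and threshold are placeholders, and the counter path format is worth verifying against Books Online):

```sql
-- Fire an alert when the log of 'MyDB' passes 80% used (SQL Server 2000 Agent).
-- Names and threshold are placeholders; a response job (e.g. one that starts
-- a trace) can be attached via @job_name or in Enterprise Manager.
EXEC msdb.dbo.sp_add_alert
    @name = N'MyDB log filling up',
    @performance_condition = N'SQLServer:Databases|Percent Log Used|MyDB|>|80'
```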

Also, be sure to run sp_who2 any time you catch the situation occurring.  Look for unusually high values of CPUTime and DiskIO in user connections to your database.  Once found, you can execute DBCC INPUTBUFFER(spid) to see what that connection is doing.
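In practice that looks something like this (spid 55 is just an example value read off the sp_who2 output):

```sql
-- List all connections; look for high CPUTime / DiskIO rows and note their SPID
EXEC sp_who2

-- Then inspect the last batch sent by the suspect connection
-- (replace 55 with the SPID found above)
DBCC INPUTBUFFER(55)
```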

    Steve

Forgot to add to my last post...  You need to add BinaryData to the columns you are capturing for the plans to be saved.



    Good Hunting!

    AJ Ahrens


    webmaster@kritter.net

  • Thank you, again.

     

It looks like the trace parameters you mentioned won't affect performance too greatly, so I'm running the trace for now.

I'll also try sp_who2/DBCC INPUTBUFFER once the problem arises again.

     

So far, the problem hasn't recurred, but I'll post the details once it has.

    (And hopefully with a solution as well)

     
