Need help diagnosing a problem

  • I've recently completed the latest version of a web application (ASP.NET 2.0 and SQL Server 2005), which is now giving me grief.

    It worked perfectly in development and beta testing, but when deployed to the live server it fails.

    It works happily for a few hours, then suddenly comes apart in a mass of "System.Data.SqlClient.SqlException: Timeout expired" exceptions.

    The system is almost exclusively reading data rather than writing, with the exception of an internal table used to log traffic. The ASP.NET code uses an asynchronous thread to call a stored procedure which writes the log entry, but the bulk of the interaction between website and database is ad hoc SQL SELECT statements via ADO.NET.

    The application's error-handling system reports nothing amiss at all until the first timeout, so I'm having trouble isolating a cause.

    I've been over the code and re-examined the beta testing. I've done some new stress-testing on the development version with OpenSTA and I can't duplicate the problem.

    I am pretty certain it's not a connection leak but I've disabled connection pooling anyway to no effect.

    Running sp_who when it fails gives me a list of blocked INSERT processes but no obvious deadlock.

    I've added "WITH (NOLOCK)" to the bulk of the sql commands and it's improved performance but not solved the problem.

    The log table is the largest in the database at about 2 million records right now, but the other tables have no more than a few thousand records each. The site's getting about 10,000 hits per day, but the server is a reasonably hefty quad-core with plenty of RAM and there's buckets of unused bandwidth, so I can't believe the traffic alone would be enough to take it down.

    I'm inclined to think it's a deadlock somewhere, but I'm not sure how to find it, let alone fix it.

    Can anyone recommend a diagnostic approach I can use to isolate the problem? Just a place to start, or a tool to use would be helpful.
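    As a starting point, here is a minimal diagnostic sketch (assuming the SQL Server 2005 dynamic management views are available and the login has VIEW SERVER STATE permission) showing which sessions are blocked, by whom, and what each is running:

    ```sql
    -- Sketch: list blocked sessions and what they are waiting on.
    -- Run this while the timeouts are occurring.
    SELECT
        r.session_id,
        r.blocking_session_id,          -- the session holding the lock
        r.wait_type,
        r.wait_time,                    -- ms spent waiting so far
        r.command,
        t.text AS sql_text
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.blocking_session_id <> 0
    ORDER BY r.wait_time DESC;
    ```

    If everything turns out to be queued behind a single session running the logging procedure, that points at the log table as the bottleneck rather than a deadlock.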

  • Paul

    "exception of an internal table used to log traffic"

    Can you describe this in full detail please, including table structures and SQL?

    Thanks.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Sure.

    The log table looks like this:

    CREATE TABLE [dbo].[EventLog](
        [Id] [int] NOT NULL,
        [EventType] [varchar](10) NULL,
        [UserId] [int] NULL,
        [EventDate] [datetime] NULL,
        [AdditionalInfo] [varchar](200) NULL,
        [MediaId] [int] NULL,
        [TransactionId] [int] NULL,
        [OriginalMediaId] [int] NULL,
        [OtherTableName] [varchar](50) NULL,
        [OtherTableId] [int] NULL,
        [IPAddress] [varchar](15) NULL,
        [IPCountryCode] [varchar](3) NULL,
        [Referrer] [varchar](200) NULL,
        [SuspectedCheating] [bit] NULL,
        CONSTRAINT [PK_EventLog] PRIMARY KEY CLUSTERED ([Id] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
                  ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    and the stored procedure that writes to it looks like this:

    CREATE PROCEDURE [dbo].[InsertEventLog]
        @EventType varchar(10),
        @userid int,
        @MediaId int = NULL,
        @TransactionId int = NULL,
        @OriginalMediaId int = NULL,
        @IPAddress varchar(15) = NULL,
        @IPCountryCode varchar(3) = NULL,
        @OtherTableName varchar(50) = NULL,
        @OtherTableId int = NULL,
        @AdditionalInfo varchar(200) = NULL,
        @referrer varchar(200) = NULL,
        @CheatWindow int = 1
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @SuspectCheating bit,
                @NewId int

        BEGIN TRANSACTION

        --decide if they're cheating
        IF @IPAddress IS NOT NULL AND @IPAddress <> ''
           AND (@EventType = 'WATCH' OR @EventType = 'WATCHAD' OR @EventType = 'CLICKAD')
        BEGIN
            IF EXISTS (SELECT * FROM EventLog WITH (NOLOCK)
                       WHERE EventType = @EventType AND IPAddress = @IPAddress
                         AND EventDate > DATEADD(n, -1 * @CheatWindow, GETDATE())
                         AND MediaId = @MediaId
                         AND (EventType = 'WATCH'
                              OR ((EventType = 'WATCHAD' OR EventType = 'CLICKAD')
                                  AND OriginalMediaId = @OriginalMediaId)))
                SET @SuspectCheating = 1
            ELSE
                SET @SuspectCheating = 0
        END
        ELSE
        BEGIN
            SET @SuspectCheating = 0
        END

        --get next available id
        UPDATE Ids SET lastid = lastid + 1 WHERE tablename = 'EventLog'
        IF @@ROWCOUNT = 0
        BEGIN
            INSERT INTO Ids(tablename, lastid) VALUES('EventLog', 1)
        END
        SELECT @NewId = lastid FROM Ids WHERE tablename = 'EventLog'

        --insert eventlog record
        INSERT INTO EventLog VALUES(@NewId, @EventType, @userid, GETDATE(),
            @AdditionalInfo, @MediaId, @TransactionId, @OriginalMediaId, @OtherTableName,
            @OtherTableId, @IPAddress, @IPCountryCode, @referrer, @SuspectCheating)

        COMMIT
    END

    I'm not using an identity column because of some unrelated issues with another part of the system.

    The first part of the procedure is just used to identify repeat log entries for accounting purposes.

  • I'd like to ask what is different between the development/test environments and the production environment.

  • You'll get a small but noticeable improvement by moving the BEGIN TRANSACTION to between these two lines, so the cheat-detection lookup runs outside the transaction:

    --get next available id
    UPDATE Ids SET lastid=lastid+1 WHERE tablename ='EventLog'

    Also, give EventLog its own id table rather than sharing the Ids table with other tables:

    UPDATE Ids SET lastid=lastid+1 WHERE tablename ='EventLog'

    You would be much better off using identity columns...
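    To make that concrete, here is a minimal sketch of the identity-based alternative (the table name EventLog2 is hypothetical; the remaining columns carry over from the original EventLog unchanged):

    ```sql
    -- Sketch, not a drop-in replacement: the log table rebuilt with IDENTITY.
    CREATE TABLE dbo.EventLog2 (
        Id        int IDENTITY(1,1) NOT NULL
                  CONSTRAINT PK_EventLog2 PRIMARY KEY CLUSTERED,
        EventType varchar(10) NULL,
        UserId    int NULL,
        EventDate datetime NULL
        -- ...the rest of the EventLog columns, unchanged
    );

    -- Inside InsertEventLog the id logic then disappears: no explicit
    -- transaction, no Ids table, and no serialization of inserts.
    -- INSERT INTO dbo.EventLog2 (EventType, UserId, EventDate)
    -- VALUES (@EventType, @userid, GETDATE());
    -- SELECT @NewId = SCOPE_IDENTITY();  -- if the caller still needs the new id
    ```

    SCOPE_IDENTITY() returns the id generated by the current scope, so concurrent callers cannot see each other's values.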

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • In terms of software, none.

    They're both Windows 2003 servers, SQL Server 2005 SP2, etc.

    Hardware is completely different; the development "server" is a high-spec PC with nowhere near the power of the quad-core rack servers used in production.

    There is one significant difference in structure, though. The application consists of 2 related websites linked to the same database, and the production system hosts the 2 sites on 2 different servers with SQL Server on the slightly higher-spec machine. The dev system is entirely on one box.

    There's a gigabit network connection between the 2 production servers, independent of their WAN interfaces.

  • Is that procedure the only procedure doing any inserts? Can you confirm that the inserts that you see that are blocked are all related to this procedure?

    If so, that is what I would expect to see - since you have serialized the inserts into those tables. The way this is written - you can only insert one row at a time, one process at a time.

    I agree with Chris - you would be much better off using identity.

    With identity, you would not need the transaction - and you would not be serialized on the inserts.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Paul (12/19/2008)


    It works happily for a few hours, then suddenly comes apart in a mass of "System.Data.SqlClient.SqlException, Timeout expired" exceptions

    Sounds like the data volumes in prod are causing perf problems. Can you run Profiler (or better, a server-side trace) for maybe an hour to see what the worst-performing queries are? Typically if you pick 3-5 of the worst performing and optimise those, a lot of the problem goes away.

    Post the queries here if you need help optimising.
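    Until a trace is set up, a rough sketch using the SQL Server 2005 plan-cache statistics can surface candidates (note the figures only cover statements still in cache, accumulated since they were compiled):

    ```sql
    -- Sketch: top 5 cached statements by average elapsed time (microseconds).
    SELECT TOP 5
        qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microsec,
        qs.execution_count,
        SUBSTRING(t.text, qs.statement_start_offset / 2 + 1,
            (CASE qs.statement_end_offset
                 WHEN -1 THEN DATALENGTH(t.text)   -- -1 means "to end of batch"
                 ELSE qs.statement_end_offset
             END - qs.statement_start_offset) / 2 + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
    ORDER BY avg_elapsed_microsec DESC;
    ```

    The offsets are in bytes of Unicode text, hence the division by 2 when carving the statement out of the batch.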

    Running sp_who when it fails gives me a list of blocked INSERT processes but no obvious deadlock.

    A deadlock wouldn't cause a timeout. It would result in error 1205. "The process was deadlocked with another and selected as the deadlock victim...."

    I've added "WITH (NOLOCK)" to the bulk of the sql commands and it's improved performance but not solved the problem.

    And maybe caused data accuracy problems. Nolock hides locking/blocking/performance problems, it doesn't fix them.

    UPDATE Ids SET lastid=lastid+1 WHERE tablename ='EventLog'
    if @@rowcount = 0
    begin
        INSERT INTO Ids(tablename, lastid) VALUES('EventLog',1)
    end
    SELECT @NewId=lastid FROM Ids WHERE tablename = 'EventLog'

    This kind of structure is very, very bad for concurrency. You're pretty much ensuring that only one insert can occur in the system at a time.

    Any reason you went with this over identity columns?

    p.s. Can you please fix the lack of wrapping in the code block that you posted?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I'm not concerned about data accuracy in this system, so the NOLOCKs shouldn't be a problem. The main data is updated at infrequent intervals, and it's not critical that it be accurate anyway.

    Hmmm... I hadn't seriously considered that pure performance could cause the problem. I might be able to switch the log table over to an identity column; it isn't under exactly the same rules as the rest of the data. I may even be able to dispense with the id column entirely, although it goes against the grain to lose the primary key. There are a couple of non-unique column indexes on the log table I could consider living without too.

    Honestly, if it is performance then I'm worried. The site gets about 10,000 hits per day right now, but we have some new contracts coming up and we're expecting to see that jump as far as 50,000 or even 100,000 in the new year.

    I'll start with some more detailed performance analysis and see if it helps any.

    Thanks all.

  • Paul (12/20/2008)


    Honestly if it is performance then I'm worried.

    That probably is the root cause. Timeout simply means that the query took longer than the client app was willing to wait. You can increase the timeout on the client side, but the default is 30 sec (for .NET), which is a really long time for a query to take. I doubt many users will be happy waiting 30 sec+ for a query to return.

    Your comment about seeing a list of blocked processes is also a concern. It's probably a result of the pseudo-identity design, and it's going to cause issues on all tables. A well-performing system doesn't have a list of blocked processes. That's a symptom that something is wrong.

    I may even be able to dispense with the id column entirely, although it goes against the grain to lose the primary key. There are a couple of non-unique column indexes on the log table I could consider living without too.

    Why are you considering dropping the PK and the indexes? All tables should have a primary key. The indexes should only be dropped if you're sure they're not being used at all.

    I'll start with some more detailed performance analysis and see if it helps any.

    If you're having this kind of problem now with a 10x usage increase expected, it may be worth your while to get someone in to look purely at performance, especially if you don't have those kind of skills in-house.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Paul (12/20/2008)


    I'm not concerned about data accuracy in this system, so the NOLOCKs shouldn't be a problem. The main data is updated at infrequent intervals, and it's not critical that it be accurate anyway.

    Hmmm... I hadn't seriously considered that pure performance could cause the problem. I might be able to switch the log table over to an identity column; it isn't under exactly the same rules as the rest of the data. I may even be able to dispense with the id column entirely, although it goes against the grain to lose the primary key. There are a couple of non-unique column indexes on the log table I could consider living without too.

    Honestly, if it is performance then I'm worried. The site gets about 10,000 hits per day right now, but we have some new contracts coming up and we're expecting to see that jump as far as 50,000 or even 100,000 in the new year.

    I'll start with some more detailed performance analysis and see if it helps any.

    Thanks all.

    If you're not worried about data accuracy, why perform the read anyway?

    You should be collecting wait stats through perfmon to see why things are running slow. Also, as Gail has already said, you should collect Profiler data for a time, again to see how things perform. You'll see exactly how long the queries are taking.

    As Gail has said, you should have a primary key. However, it doesn't have to be the ID column. It can be two or three other columns taken together to define a unique record. Clustering on that will distribute the data better and you may see much better performance.
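    Alongside perfmon, a minimal sketch of the same wait-stats idea from inside SQL Server 2005 (the counters accumulate from the last service restart, so sample before and after a bad spell):

    ```sql
    -- Sketch: top waits on the instance, filtering out common idle waits.
    SELECT TOP 10
        wait_type,
        wait_time_ms,
        waiting_tasks_count
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN ('SLEEP_TASK', 'BROKER_TASK_STOP',
        'LAZYWRITER_SLEEP', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR')
    ORDER BY wait_time_ms DESC;
    ```

    Heavy LCK_M_* waits would support the blocking theory; heavy PAGEIOLATCH_* waits would point at I/O instead.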

    Execution plans would be nice.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (12/22/2008)


    If you're not worried about data accuracy, why perform the read anyway?

    I meant that the data is updated so infrequently that I don't think the costs of record locking outweigh the performance gain from ignoring it.

  • --get next available id
    UPDATE Ids SET @NewId = lastid = lastid + 1 WHERE tablename = 'EventLog'

    if @@rowcount = 0
    begin
        INSERT INTO Ids(tablename, lastid) VALUES('EventLog', 1)
        SET @NewId = 1  -- first row for this table, so the UPDATE above assigned nothing
    end

    -- SELECT @NewId=lastid FROM Ids WHERE tablename = 'EventLog'


    N 56°04'39.16"
    E 12°55'05.25"

  • Paul (12/22/2008)


    I meant that the data is updated so infrequently that I don't think the costs of record locking outweigh the performance gain from ignoring it.

    Thing is, if using NOLOCK makes things faster, it means there were locks in place that it read through, and hence you're not reading static data. If there were no data modifications, there would be no blocking and NOLOCK wouldn't gain you anything.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • From what I've read, there's an overhead associated with checking for locks that you incur even if there aren't any.

    Adding NOLOCK is supposed to skip that.
