Need help diagnosing a problem

  • I've recently completed the latest version of a web application (ASP.NET 2.0 and SQL Server 2005), which is now giving me grief.

    It worked perfectly in development and beta testing, but when deployed to the live server it fails.

    It works happily for a few hours, then suddenly comes apart in a mass of "System.Data.SqlClient.SqlException: Timeout expired" exceptions.

    The system is almost exclusively reading data rather than writing, with the exception of an internal table used to log traffic. The ASP.NET code uses an asynchronous thread to call a stored procedure which writes the log entry, but the bulk of the interaction between website and database is ad hoc SQL SELECT statements via ADO.NET.

    The application's error-handling system reports nothing amiss at all until the first timeout, so I'm having trouble isolating a cause.

    I've been over the code and re-examined the beta testing. I've done some new stress-testing on the development version with OpenSTA and I can't duplicate the problem.

    I am pretty certain it's not a connection leak but I've disabled connection pooling anyway to no effect.

    Running sp_who when it fails gives me a list of blocked INSERT processes but no obvious deadlock.

    I've added "WITH (NOLOCK)" to the bulk of the sql commands and it's improved performance but not solved the problem.

    The log table is the largest in the database at about 2 million records right now, but the other tables have no more than a few thousand records each. The site's getting about 10,000 hits per day, but the server is a reasonably hefty quad-core with plenty of RAM and there's buckets of unused bandwidth, so I can't believe the traffic alone would be enough to take it down.

    I'm inclined to think it's a deadlock somewhere, but I'm not sure how to find it, let alone fix it.

    Can anyone recommend a diagnostic approach I can use to isolate the problem? Just a place to start, or a tool to use would be helpful.
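    As a starting point, here is a minimal diagnostic sketch (assuming the SQL Server 2005 dynamic management views are available and the login has VIEW SERVER STATE permission) showing which sessions are blocked, by whom, and what each is running:

    ```sql
    -- Sketch: list blocked sessions and what they are waiting on.
    -- Run this while the timeouts are occurring.
    SELECT
        r.session_id,
        r.blocking_session_id,          -- the session holding the lock
        r.wait_type,
        r.wait_time,                    -- ms spent waiting so far
        r.command,
        t.text AS sql_text
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.blocking_session_id <> 0
    ORDER BY r.wait_time DESC;
    ```

    If everything turns out to be queued behind a single session running the logging procedure, that points at the log table as the bottleneck rather than a deadlock.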

  • Paul

    "exception of an internal table used to log traffic"

    Can you describe this in full detail please, including table structures and SQL?

    Thanks.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Sure.

    The log table looks like this:

    CREATE TABLE [dbo].[EventLog](
        [Id] [int] NOT NULL,
        [EventType] [varchar](10) NULL,
        [UserId] [int] NULL,
        [EventDate] [datetime] NULL,
        [AdditionalInfo] [varchar](200) NULL,
        [MediaId] [int] NULL,
        [TransactionId] [int] NULL,
        [OriginalMediaId] [int] NULL,
        [OtherTableName] [varchar](50) NULL,
        [OtherTableId] [int] NULL,
        [IPAddress] [varchar](15) NULL,
        [IPCountryCode] [varchar](3) NULL,
        [Referrer] [varchar](200) NULL,
        [SuspectedCheating] [bit] NULL,
        CONSTRAINT [PK_EventLog] PRIMARY KEY CLUSTERED ([Id] ASC)
            WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
                  ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    and the stored procedure that writes to it looks like this:

    CREATE PROCEDURE [dbo].[InsertEventLog]
        @EventType varchar(10),
        @userid int,
        @MediaId int = NULL,
        @TransactionId int = NULL,
        @OriginalMediaId int = NULL,
        @IPAddress varchar(15) = NULL,
        @IPCountryCode varchar(3) = NULL,
        @OtherTableName varchar(50) = NULL,
        @OtherTableId int = NULL,
        @AdditionalInfo varchar(200) = NULL,
        @referrer varchar(200) = NULL,
        @CheatWindow int = 1
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @SuspectCheating bit,
                @NewId int

        BEGIN TRANSACTION

        --decide if they're cheating
        IF @IPAddress IS NOT NULL AND @IPAddress <> ''
           AND (@EventType = 'WATCH' OR @EventType = 'WATCHAD' OR @EventType = 'CLICKAD')
        BEGIN
            IF EXISTS (SELECT * FROM EventLog WITH (NOLOCK)
                       WHERE EventType = @EventType AND IPAddress = @IPAddress
                         AND EventDate > DATEADD(n, -1 * @CheatWindow, GETDATE())
                         AND MediaId = @MediaId
                         AND (EventType = 'WATCH'
                              OR ((EventType = 'WATCHAD' OR EventType = 'CLICKAD')
                                  AND OriginalMediaId = @OriginalMediaId)))
                SET @SuspectCheating = 1
            ELSE
                SET @SuspectCheating = 0
        END
        ELSE
        BEGIN
            SET @SuspectCheating = 0
        END

        --get next available id
        UPDATE Ids SET lastid = lastid + 1 WHERE tablename = 'EventLog'
        IF @@ROWCOUNT = 0
        BEGIN
            INSERT INTO Ids(tablename, lastid) VALUES('EventLog', 1)
        END
        SELECT @NewId = lastid FROM Ids WHERE tablename = 'EventLog'

        --insert eventlog record
        INSERT INTO EventLog VALUES(@NewId, @EventType, @userid, GETDATE(),
            @AdditionalInfo, @MediaId, @TransactionId, @OriginalMediaId, @OtherTableName,
            @OtherTableId, @IPAddress, @IPCountryCode, @referrer, @SuspectCheating)

        COMMIT
    END

    I'm not using an identity column because of some unrelated issues with another part of the system.

    The first part of the procedure is just used to identify repeat log entries for accounting purposes.

  • I'd like to ask what is different between the development/test environments and the production environment.

  • You'll get a small but noticeable improvement by moving the BEGIN TRANSACTION to between these two lines, so the cheat-detection lookup runs outside the transaction:

    --get next available id
    UPDATE Ids SET lastid=lastid+1 WHERE tablename ='EventLog'

    Also, give EventLog its own id table rather than sharing the Ids table with other tables:

    UPDATE Ids SET lastid=lastid+1 WHERE tablename ='EventLog'

    You would be much better off using identity columns...
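    To make that concrete, here is a minimal sketch of the identity-based alternative (the table name EventLog2 is hypothetical; the remaining columns carry over from the original EventLog unchanged):

    ```sql
    -- Sketch, not a drop-in replacement: the log table rebuilt with IDENTITY.
    CREATE TABLE dbo.EventLog2 (
        Id        int IDENTITY(1,1) NOT NULL
                  CONSTRAINT PK_EventLog2 PRIMARY KEY CLUSTERED,
        EventType varchar(10) NULL,
        UserId    int NULL,
        EventDate datetime NULL
        -- ...the rest of the EventLog columns, unchanged
    );

    -- Inside InsertEventLog the id logic then disappears: no explicit
    -- transaction, no Ids table, and no serialization of inserts.
    -- INSERT INTO dbo.EventLog2 (EventType, UserId, EventDate)
    -- VALUES (@EventType, @userid, GETDATE());
    -- SELECT @NewId = SCOPE_IDENTITY();  -- if the caller still needs the new id
    ```

    SCOPE_IDENTITY() returns the id generated by the current scope, so concurrent callers cannot see each other's values.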

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • In terms of software, none.

    They're both Windows 2003 servers, SQL Server 2005 SP2, etc.

    Hardware is completely different; the development "server" is a high-spec PC with nowhere near the power of the quad-core rack servers used in production.

    There is one significant difference in structure, though. The application consists of 2 related websites linked to the same database, and the production system hosts the 2 sites on 2 different servers with SQL Server on the slightly higher-spec machine. The dev system is entirely on one box.

    There's a gigabit network connection between the 2 production servers, independent of their WAN interfaces.

  • Is that procedure the only procedure doing any inserts? Can you confirm that the inserts that you see that are blocked are all related to this procedure?

    If so, that is what I would expect to see - since you have serialized the inserts into those tables. The way this is written - you can only insert one row at a time, one process at a time.

    I agree with Chris - you would be much better off using identity.

    With identity, you would not need the transaction - and you would not be serialized on the inserts.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Paul (12/19/2008)


    It works happily for a few hours, then suddenly comes apart in a mass of "System.Data.SqlClient.SqlException, Timeout expired" exceptions

    Sounds like the data volumes in prod are causing perf problems. Can you run Profiler (or better, a server-side trace) for maybe an hour to see what the worst-performing queries are? Typically if you pick 3-5 of the worst performing and optimise those, a lot of the problem goes away.

    Post the queries here if you need help optimising.
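    Until a trace is set up, a rough sketch using the SQL Server 2005 plan-cache statistics can surface candidates (note the figures only cover statements still in cache, accumulated since they were compiled):

    ```sql
    -- Sketch: top 5 cached statements by average elapsed time (microseconds).
    SELECT TOP 5
        qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microsec,
        qs.execution_count,
        SUBSTRING(t.text, qs.statement_start_offset / 2 + 1,
            (CASE qs.statement_end_offset
                 WHEN -1 THEN DATALENGTH(t.text)   -- -1 means "to end of batch"
                 ELSE qs.statement_end_offset
             END - qs.statement_start_offset) / 2 + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
    ORDER BY avg_elapsed_microsec DESC;
    ```

    The offsets are in bytes of Unicode text, hence the division by 2 when carving the statement out of the batch.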

    Running sp_who when it fails gives me a list of blocked INSERT processes but no obvious deadlock.

    A deadlock wouldn't cause a timeout. It would result in error 1205. "The process was deadlocked with another and selected as the deadlock victim...."

    I've added "WITH (NOLOCK)" to the bulk of the sql commands and it's improved performance but not solved the problem.

    And maybe caused data accuracy problems. Nolock hides locking/blocking/performance problems, it doesn't fix them.

    UPDATE Ids SET lastid=lastid+1 WHERE tablename ='EventLog'
    if @@rowcount = 0
    begin
        INSERT INTO Ids(tablename, lastid) VALUES('EventLog',1)
    end
    SELECT @NewId=lastid FROM Ids WHERE tablename = 'EventLog'

    This kind of structure is very, very bad for concurrency. You're pretty much ensuring that only one insert can occur in the system at a time.

    Any reason you went with this over identity columns?

    p.s. Can you please fix the lack of wrapping in the code block that you posted?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I'm not concerned about data accuracy in this system, so the NOLOCKs shouldn't be a problem. The main data is updated at infrequent intervals, and it's not critical that it be accurate anyway.

    Hmmm... I hadn't seriously considered that pure performance could cause the problem. I might be able to switch the log table over to an identity column; it isn't under exactly the same rules as the rest of the data. I may even be able to dispense with the id column entirely, although it goes against the grain to lose the primary key. There are a couple of non-unique column indexes on the log table I could consider living without too.

    Honestly, if it is performance then I'm worried. The site gets about 10,000 hits per day right now, but we have some new contracts coming up and we're expecting to see that jump as far as 50,000 or even 100,000 in the new year.

    I'll start with some more detailed performance analysis and see if it helps any.

    Thanks all.

  • Paul (12/20/2008)


    Honestly if it is performance then I'm worried.

    That probably is the root cause. Timeout simply means that the query took longer than the client app was willing to wait. You can increase the timeout on the client side, but the default is 30 sec (for .NET), which is a really long time for a query to take. I doubt many users will be happy waiting 30 sec+ for a query to return.

    Your comment about seeing a list of blocked processes is also a concern. It's probably a result of the pseudo-identity design, and it's going to cause issues on all tables. A well-performing system doesn't have a list of blocked processes. That's a symptom that something is wrong.

    I may even be able to dispense with the id column entirely, although it goes against the grain to lose the primary key. There are a couple of non-unique column indexes on the log table I could consider living without too.

    Why are you considering dropping the PK and the indexes? All tables should have a primary key. The indexes should only be dropped if you're sure they're not being used at all.

    I'll start with some more detailed performance analysis and see if it helps any.

    If you're having this kind of problem now with a 10x usage increase expected, it may be worth your while to get someone in to look purely at performance, especially if you don't have those kind of skills in-house.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Paul (12/20/2008)


    I'm not concerned about data accuracy in this system, so the NOLOCKs shouldn't be a problem. The main data is updated at infrequent intervals, and it's not critical that it be accurate anyway.

    Hmmm... I hadn't seriously considered that pure performance could cause the problem. I might be able to switch the log table over to an identity column; it isn't under exactly the same rules as the rest of the data. I may even be able to dispense with the id column entirely, although it goes against the grain to lose the primary key. There are a couple of non-unique column indexes on the log table I could consider living without too.

    Honestly, if it is performance then I'm worried. The site gets about 10,000 hits per day right now, but we have some new contracts coming up and we're expecting to see that jump as far as 50,000 or even 100,000 in the new year.

    I'll start with some more detailed performance analysis and see if it helps any.

    Thanks all.

    If you're not worried about data accuracy, why perform the read anyway?

    You should be collecting wait stats through perfmon to see why things are running slow. Also, as Gail has already said, you should collect Profiler data for a time, again to see how things perform. You'll see exactly how long the queries are taking.

    As Gail has said, you should have a primary key. However, it doesn't have to be the ID column. It can be two or three other columns taken together to define a unique record. Clustering on that will distribute the data better and you may see much better performance.
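    Alongside perfmon, a minimal sketch of the same wait-stats idea from inside SQL Server 2005 (the counters accumulate from the last service restart, so sample before and after a bad spell):

    ```sql
    -- Sketch: top waits on the instance, filtering out common idle waits.
    SELECT TOP 10
        wait_type,
        wait_time_ms,
        waiting_tasks_count
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN ('SLEEP_TASK', 'BROKER_TASK_STOP',
        'LAZYWRITER_SLEEP', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR')
    ORDER BY wait_time_ms DESC;
    ```

    Heavy LCK_M_* waits would support the blocking theory; heavy PAGEIOLATCH_* waits would point at I/O instead.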

    Execution plans would be nice.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (12/22/2008)


    If you're not worried about data accuracy, why perform the read anyway?

    I meant that the data is updated so infrequently that I don't think the costs of record locking outweigh the performance gain from ignoring it.

  • --get next available id
    UPDATE Ids SET @NewId = lastid = lastid + 1 WHERE tablename = 'EventLog'

    if @@rowcount = 0
    begin
        INSERT INTO Ids(tablename, lastid) VALUES('EventLog', 1)
        SET @NewId = 1  -- first row for this table, so the UPDATE above assigned nothing
    end

    -- SELECT @NewId=lastid FROM Ids WHERE tablename = 'EventLog'


    N 56°04'39.16"
    E 12°55'05.25"

  • Paul (12/22/2008)


    I meant that the data is updated so infrequently that I don't think the costs of record locking outweigh the performance gain from ignoring it.

    Thing is, if using NOLOCK makes things faster, it means there were locks in place that it read through, and hence you're not reading static data. If there were no data modifications, there would be no blocking and NOLOCK wouldn't gain you anything.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • From what I've read, there's an overhead associated with checking for locks that you incur even if there aren't any.

    Adding NOLOCK is supposed to skip that.
