September 22, 2008 at 8:58 am
Hi,
Attached are performance stats for server 1. Server1 is connected to a SAN drive. Performance over the last year or so has been good. 3 weeks ago user machines started hanging. I noticed that the logs and the data files for the databases attached were on the same SAN drive, so i moved the log files to a different drive(local to the server) and kept the data on the SAN. I generated the stats attached. What could be the probem? Is it the SAN drive which is the problem? Do more disks need to be added to the raid set? Am i reading this right?
Please help.
September 22, 2008 at 9:06 am
What does hanging means here, is it the whole application or the part of the application, and if yes, then what part of the day...
if it is at specific time, check if any background processes are running at the same time, monitor the CPU usage.
September 22, 2008 at 9:06 am
You can also check sp_who2 and see if there is any blocking happening!
September 22, 2008 at 9:07 am
Hi
What RAID are u using, what's the SQL Configuration, what's the performance issue you are facing, since then some one can suggest that the counters that you have provided is fine or we need to look into some more counters.
More over does the counters remains same most of the times???
Ouch so many Questions ( ha ha ha)
Cheers
🙂
September 22, 2008 at 9:20 am
What do the following two counters look like?
Physical disk: Avg sec/read
Physical disk: Avg sec/write
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 23, 2008 at 12:44 am
The Avg. disk sec/read is between 0.005 to 0.020. The Avg disk sec/write is between 0.000 and 0.008. When i meant hanging, i was talking of the call centre application that people use when working with the databases on the server. I used sp_lock when the problem started and there were no locks then. There are no locks now. No background processes seem to be running or massive jobs. The application hangs at intermittent times during the day but i know for a fact that from 3.45pm in the afternoon to around 5pm, this problem occurs as some jobs that run during that time are slowed down.
September 23, 2008 at 1:10 am
Those look good.
Can you monitor the stats as well as watching for blocking during the known slow time?
Perhaps also run profiler for a while and see if you can correlate random slowdowns to certain queries running against the server.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 24, 2008 at 3:17 am
I have attached a trace for you to help me with. I did this trace on the 16th of september this month when the problem was happening. I only put the maximum number of rows in the Excel sheet.
To elminate the SAN drive as the problem, I took one database and moved it to 2 local drives on the server(one for the data and the other for the log). Users worked ok for the day or so. The jobs that ran at 4 ran quick. A couple of minutes later, perormance started to go haywire on both the SAN(where the majority of the databases are) and on the data drive of the database that i had moved. I could not identify any blockages.
September 24, 2008 at 7:33 pm
No flame wars please, but in my experience, as a rule of thumb, never never use SAN for databases.
NAS on the other hand is great!
The network stack is not the right place to be putting that kind of I/O.
I have no doubt there are some wonderfully clever people out there who can make SANs work for databases, but my experience over the last 10 years with this kind of kit has been that NAS > Direct attached disk > SAN
September 29, 2008 at 4:21 am
Hi,
Just to add to the problem i raised and which still continues, machines started hanging this morning. I took a caption of the sp_who2 results. The results are attched. Ignore those spids that are blocking others because for me thats not where, in my opinion, the problem is. Thats not were my issue is. I noticed that there yellow highlighted row is consuming a lot of disk time and cpu time. What is checkpointing? Is the process being done on the disk and how can it be avoided in any way while people are workin?
Thanks
Tendayi
September 29, 2008 at 10:24 am
Hi
This sounds Crazy, is that an in House application, if yes was there any changes made in the past week, since i have experienced the same problem, no offence, but mostly developers don't take database load into consideration, is that a multi thread application??
Cheers
🙂
September 29, 2008 at 11:58 am
I would put the profiler on around the time and write this to a table and see what is happening at the exact time.
Is the SQL going down or just connections not reaching the database.
There are some black box logs you can set up too to see what is happening too.
--Trace Files
SELECT * FROM SYS.TRACES
--Set up the BLACK BOX
--Run this first to create the BlackBox Trace
DECLARE @TraceId int
DECLARE @maxfilesize bigint
SET @maxfilesize = 25
EXEC sp_trace_create
@TraceId OUTPUT,
@options = 8,
@tracefile = NULL,
@maxfilesize = @maxfilesize
EXEC sp_trace_setstatus @TraceId, 1
--Run this to create the Stored Procedure
USE master
GO
CREATE PROCEDURE StartBlackBoxTrace
AS
BEGIN
DECLARE @TraceId int
DECLARE @maxfilesize bigint
SET @maxfilesize = 25
EXEC sp_trace_create
@TraceId OUTPUT,
@options = 8,
@tracefile = NULL,
@maxfilesize = @maxfilesize
EXEC sp_trace_setstatus @TraceId, 1
END
GO
--Run ths command to start on server startup
EXEC sp_procoption
'StartBlackBoxTrace', 'STARTUP', 'ON'
--Use this to get the details of the black box
SELECT * FROM fn_trace_gettable('C:\Program Files\Microsoft SQL Server\MSSQL.7\MSSQL\DATA\blackbox.trc', default);
GO
Change above to the path you set this up on.
This can help to some degree to see what going on.
Is there anything in logs .
September 29, 2008 at 12:05 pm
There some large durations here
exec PaymentFrequencyGetAll
and forsa2000000
exec [c3sabc]..sp_procedure_params_rowset N'ContactNumberUpdate',1,NULL,NULL
exec [c3vvm]..sp_procedure_params_rowset N'DebtorDetailGet',1,NULL,NULLC3C3User24443056
This seems like it taking forever to get the data from the SAN to me..............
Has anyone ran a fragmentation on the SAN disks at all to see how badly it is fragmented.
Why are these durations so high. I guess that where i would start.
September 29, 2008 at 7:17 pm
Checkpointing is a process where all the "dirty pages" (changed data) are written to the disk. It is described fairly well in the SQL help.
The main cause of issues around this seem to be Long Running Transactions. When they finally commit they can cause issues with checkpoints. It may be worth reviewing any transactions that are active for more than a few minutes.
September 30, 2008 at 7:31 am
Hi,
I ran the procedure which you sent. It showed me a number of queries that had been run, their start time but with no end time. I also picked up entries where the severity level was 15 and 16. If the entries don't have an end time, what does that mean?
The server with the problem was last defragged 2 weeks ago so that rules out that option.
Thanks
Tendayi
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply