May 7, 2008 at 4:28 pm
Hi,
We are experiencing database write slowdowns in our SQL Server 2000 (Standard) / Win2K3 server environment in Production. There is a corresponding Test instance which seems to be doing much better. So the problem seems to be specific to Production. In looking at the perfmon logs, it seems that the disk write latencies for the log files disk and the database file disk have increased from 6 ms to 24 ms. I am told that perfmon is not very accurate when it comes to reporting SAN utilization, but I am inclined to going by these numbers since that is the only way I can explain the progressive degradation in processing time for my application from 30 hours to 65 hours to 90+ hours now. There have been no application changes since the progressive degradation set in.
Of course IBM SAN support is all for disproving my claim, but honestly, since I administer the application (Informatica) on a Solaris server and the SQL 2000 Database on the Windows server, i can safely attest that there were no changes made to the application during this time. Is there something I could do to conclusively prove that SAN is not performing up to expectation?
Your help/input is greatly appreciated!
Leon
May 7, 2008 at 5:18 pm
What kind of SAN is it? Are there other servers using it also?
If so, then your slowdown could be caused by contention with those other servers for the same physical resources. Alternatively, you could be getting clobbered by some maintenance task (like backup?) on the SAN itself.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 7, 2008 at 7:22 pm
Barry
Thanks! Its a Fast T. I guess it is being used by other servers too. So you are thinking contention could potentially cause this. I will bring this up with the IBM team tomorrow.
Leon
May 7, 2008 at 8:57 pm
I dont understand something with these perfmon numbers here .....
During the actual production scheduled run, disk sec/write were around 27 ms for the database and the write throughput was 180 KBps. When I tried to recreate this for the IBM SAN team to investigate, the throughput remained around 220 however, the disk write latency was back to 6 ms. How is that possible?
I was planning to take this back to the SAN support telling them that I could still see disk latencies in the range of 20+ ms. The disk latencies seem to have returned to normal values, but not the throughput. Any insights???
Leon
May 7, 2008 at 9:11 pm
I guess I overlooked something .. the number of write operations per second .. they have increased from 8 (when this ran per schedule) to 20. does look like contention? doesn't it?
May 7, 2008 at 10:08 pm
leonp (5/7/2008)
I dont understand something with these perfmon numbers here .....During the actual production scheduled run, disk sec/write were around 27 ms for the database and the write throughput was 180 KBps. When I tried to recreate this for the IBM SAN team to investigate, the throughput remained around 220 however, the disk write latency was back to 6 ms. How is that possible?
Leon
Throughput(KB/s) is probably less important that IO Rate(IO/s).
It seems to me that there may be a contending task/job/workload on another server that is scheduled to run at the same time as your production run's schedule.
May 8, 2008 at 7:01 am
I am inclined to believe so too .. Is there any reason why the sql server will be responsible for this slowdown? I want to make sure we (SQL team) are not missing on anything before we start pursuing this with the SAN support
May 8, 2008 at 10:45 am
leonp (5/8/2008)
I am inclined to believe so too .. Is there any reason why the sql server will be responsible for this slowdown? I want to make sure we (SQL team) are not missing on anything before we start pursuing this with the SAN support
The easiest way to differentiate between in-box vs. off-box likely causes is to compare the system performance profiles wrt the Disk for the good runs vs. the bad runs. This should include: total-run-time, total-io to that disk(reads & writes), io/s to that disk(reads & writes), throughput(R & W), Disks Idle%, and IO response time.
The purpose of this is to discriminate between two possibilites: am I(meaning the machine that this executes on) sending more IO's to the disk or is the disk just slower with the same amount of IO? The former indicates an in-box problem, the latter indicates an off-box problem.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 8, 2008 at 11:20 am
leonp (5/8/2008)
I am inclined to believe so too .. Is there any reason why the sql server will be responsible for this slowdown? I want to make sure we (SQL team) are not missing on anything before we start pursuing this with the SAN support
In addition to Barry's comments - most HBA's have their own way to monitor traffic (for example - on the ones we use - you can see the # of queued IO's and see how long they take to clear out). In an ideal scenario, you'd have NO queued IO's, but most of the time, they queue and should dissipate just about instantaneously. Sustained periods with high queued IO's can often mean perf issue @ the SAN.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 8, 2008 at 6:29 pm
One of the findings IBM reported was that the Production server was connected to the SAN through a single attached switch .... (not sure if I am saying this right) ...
The Test server which is performing to expectation (and actually much better than Production) we is dual attached to the SAN
We are requesting a change/correction to see the amount of improvement it yields
Leon
May 11, 2010 at 12:43 pm
Leon,
What the solution for your problem?
We have a similar case where our application/DB performance has degraged after new SAN was installed which has 8X450GB drives vs 16X75GB drives.
Thanks
Harry
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply