Disk IO Performance Degredation after Server Upgrade

Question

Post reply

Disk IO Performance Degredation after Server Upgrade

Greg Roberts-134214

SSCommitted

Points: 1686
More actions
June 20, 2011 at 8:10 am

#389782

We recently upgrade our SQL server to new hardware, and our Storage to a new SAN
Server: HP 580 with 4 hex-core CPU's and 128 GB RAM (112 GB Available).
SAN: HP LeftHand SAN 3-node cluster
The Server and SAN communicate via iSCSI.
There are 2 databases hosted on the server.
DB1: 650GB
DB2: 700GB
Each database has multiple data files in multiple filegroups on multple SAN volumes.
Both databases are using SQL Transparent Data Encryption. The issues existed BEFORE TDE was implemented.
We are seeing very long IO times when performing disk operations, such as filegrowth. In one instance it took 5 minutes to add 10 GB to a data file. This caused applicaiton timeouts and downtime until I could manually extend the file. (In actuality I extended the file, disabled autogrowth, and added a new file)
Today we are seeing disk opperation alerts ( 12 ms Reads, 13 ms Writes, etc.) According to my NetOps team, there is nothing wrong with the storage, server, or network configuration.
Admitedly, there is high memory utilization on the server right now. > 95% OS memory. > 90% SQL Memory. A reboot is in order.
I need to tap into the configuration expertise in the group. Where in SQL Server, Windows, Network, or Storage configuration can I look for potential issues?
Thanks,
Greg
Greg Roberts

Viewing 15 posts - 1 through 15 (of 35 total)

You must be logged in to reply to this topic. Login to reply

rfr.ferrari SSCertifiable Points: 6879 More actions · Answer 1

what upgrade strategy you use, side-by-side or in-place?

[font="Times New Roman"]rfr.ferrari[/font]
DBA - SQL Server 2008
MCITP | MCTS

remember is live or suffer twice!
the period you fastest growing is the most difficult period of your life!

Greg Roberts-134214 SSCommitted Points: 1686 More actions · Answer 2

side by side.

we installed new hardware with a clean os and SQL server install. Restored a previous backup to the server and ran load tests. Everything appeared OK. After the Load tests we stopped SQL Server on the original server and moved the database files to the new hardware. Same volume and disk configuration as we conducted load tests on.

Original Server: HP 380 Performance

2x 4 Core Xeon CPU

64 GB RAM

2x 10/100/1000 NIC (Teamed @ 1GB each)

New Server: HP 580 G6 Performance

4x 6 Core Xeon CPU

128 GB RAM (112 usable)

6 x 10/100/1000 NICs

- 2x @ 1 GB public netowkr

- 4x @ 1 GB storage network

Greg Roberts

456789psw SSCarpal Tunnel Points: 4728 More actions · Answer 3

what has changed on the new SAN....are you running the same number of spindles.....more hopefully?

what about cache on the SAN, is set to favor READ or WRITE? and that number diff then before

and did the CACHE on the new SAN increase or decrease

and what about disk speeds increase or decrease

and last but not least did you run SQLIO on the new SAN before you turned it on? this should have told you what the SAN was capable off.

Greg Roberts-134214 SSCommitted Points: 1686 More actions · Answer 4

456789psw (6/20/2011)
what has changed on the new SAN....are you running the same number of spindles.....more hopefully?

Went from 2-node to 3-node cluster

what about cache on the SAN, is set to favor READ or WRITE? and that number diff then before

don't know

and did the CACHE on the new SAN increase or decrease

increase

and what about disk speeds increase or decrease

increase

and last but not least did you run SQLIO on the new SAN before you turned it on? this should have told you what the SAN was capable off.

no. did not have time to run SQLIO

Greg Roberts

456789psw SSCarpal Tunnel Points: 4728 More actions · Answer 5

Greg,

sorry I missed the 12 and 13 ms part. net team is right should have any issues there.

what was the auto growth set to grow at? if it was a low number and you asked for a big growth then that means it had to stop each time before it grew again..makes things very slow!!!

are you having any other issues that are slow? if so, Have you check your wait types best place to start.

as far as memory, not sure what you mean both are using in the 90 percent tile? I would check your SQL Server: Buffer Manager: Page Life Expectancy make sure you are at least above 300....which is 5 min

Johan Bijnens SSC Guru Points: 135369 More actions · Answer 6

What OS are you on ? ( + sp/hf please )

What's the version info of your sqlinstance ?

Can you elaborate on the topology of your lefthand solution ?

Are the luns of your sqlinstance placed on disks that also host other luns ?

Was there a change in raid level definitions of your luns ?

By enabling DTE on a system that already suffered IO, chances are you've made it worse.

Did you enable instant file initialisation ( windows level grant ! )

You jumped into a dark pit by not running SQLIO.

That thing is produced to predict and prevent IO problems before it hurts you !

IMO Lefthand is more advisable for file system solutions.

Top performance db systems shouldn't be placed on it, but it is a sales argument !

Of course it can host db systems, but how can you control it and make it performing well enough.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Syed Jahanzaib Bin hassan SSChampion Points: 11410 More actions · Answer 7

First of all check the SAN Cache and then RAID Groups,If All disk on the same RAID Group that might be a problem

Page-Life-Expect..

as asked by the @Old Hand

For Disk Delays

===============

select DB_NAME(database_id) DB_NAME, di.file_id,df.name,io_stall_read_ms ,num_of_reads

,cast(io_stall_read_ms/(1.0+num_of_reads) as numeric(10,1)) as 'avg_read_stall_ms'

,io_stall_write_ms,num_of_writes

,cast(io_stall_write_ms/(1.0+num_of_writes) as numeric(10,1)) as 'avg_write_stall_ms'

,io_stall_read_ms + io_stall_write_ms as io_stalls

,num_of_reads + num_of_writes as total_io

,cast((io_stall_read_ms+io_stall_write_ms)/(1.0+num_of_reads + num_of_writes) as numeric(10,1)) as 'avg_io_stall_ms'

from sys.dm_io_virtual_file_stats(null,null) di inner join sys.database_files df on df.file_id = di.file_id

where DB_NAME(database_id) = 'your database name'

order by avg_io_stall_ms desc

IO pendings

===========

select

database_id,

file_id,

io_stall,

io_pending_ms_ticks,

scheduler_address

from sys.dm_io_virtual_file_stats(NULL, NULL)t1,

sys.dm_io_pending_io_requests as t2

where t1.file_handle = t2.io_handle

For Memory Presure

==================

select * from sys.dm_os_performance_counters

where counter_name like 'page life%'

For Queries Utilizations

========================

SELECT ST.TEXT,SP.SPID,WAITTIME,LASTWAITTYPE,CPU,PHYSICAL_IO,STATUS,HOSTNAME,PROGRAM_NAME,CMD,LOGINAME FROM SYS.SYSPROCESSES SP

CROSS APPLY SYS.DM_EXEC_SQL_TEXT(SP.SQL_HANDLE) ST

WHERE STATUS !='SLEEPING'

ORDER BY CPU DESC

For CPU Usage

=============

select

scheduler_id,

current_tasks_count,

runnable_tasks_count

from

sys.dm_os_schedulers

where

scheduler_id < 255

For System Info

=============

Select * from sys.dm_os_sys_info

provide these queries result

My Blog
www.aureus-salah.com

rfr.ferrari SSCertifiable Points: 6879 More actions · Answer 8

The Server and SAN communicate via iSCSI.

Greg,

before change of servers, you communicate using iSCSI?

[font="Times New Roman"]rfr.ferrari[/font]
DBA - SQL Server 2008
MCITP | MCTS

remember is live or suffer twice!
the period you fastest growing is the most difficult period of your life!

Perry Whittle SSC Guru Points: 234013 More actions · Answer 9

please supply more details of the iSCSI setup, hardware or software?

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

Greg Roberts-134214 SSCommitted Points: 1686 More actions · Answer 10

I agree about not running SQLIO. Unfortunately the SAN did not arrive until the day before deployment, and the business refused to let us delay to properly configure and test.

Greg Roberts

456789psw SSCarpal Tunnel Points: 4728 More actions · Answer 11

Greg, is it just the file growth that is giving you issue?

Greg Roberts-134214 SSCommitted Points: 1686 More actions · Answer 12

The SAN is dedicated to this SQL Server. There are no other servers accessing LUNs on the SAN.

As far as SAN config is concerned, I'm a bit in the dark there. I know that it is a 3-node cluster. I don't recall how the array(s) are configured that are hosting the LUNs. My SAN Admin is on vacation,so I can't find out right now.

Another issue I am seeing, but not sure it really is an issue, is the Average disk milliseconds per Read/Write/ or Transfer is hovering in the 10 - 20 range on all SAN volumes.

Regarding the previous server, yes it was running iSCSI. Not my favorite, but not budget for a good fibre channel.

I'll dig into the WAIT stats a bit more and post what I find. Thank you all for your assitance.

Greg Roberts

Greg Roberts-134214 SSCommitted Points: 1686 More actions · Answer 13

456789psw (6/21/2011)
Greg, is it just the file growth that is giving you issue?

That was the big smashed thumb. We have also noticed a degradation in our backup times, even before TDE. We saw thruput drop from an average of 115MB/sec to 100MB/sec. But that only equated to 15 extra minutes to the backup job. I'm pretty convinced there is a SAN config issue, but our storage admin instists it is working correctly. And when he runs LAN Sped Test from the DB server to the SAN volumes, performance is good.

Regarding the growth settings, the files are set to grow at 5GB. This reduces the number of growth events to approximately 1 / week.

Greg Roberts

456789psw SSCarpal Tunnel Points: 4728 More actions · Answer 14

I would agree pointing more and more to the SAN...he may be right it is working correctly...but it may not be working optimally.( it could be set to run faster or maybe there is less Drives and more RPM resulting in less through put)

shoot us out the Wait stats...maybe that will help