October 14, 2011 at 1:36 am
Hi Guys
I constantly receive calls from the end users of poor database performances.
We are running SQL server 2008 R2 Standard Edition (SP1) X64 on Windows server 2003 Standard Edition X64.
The machine has 3 HDD's (these are local drives), one for the OS, one for the datafiles and one for the log files.
16gb Ram
Intel Xeon E5504 @2.00Ghz (8CPU's)
We have 2 databases mirroring to another system with similar specs. (Software is exact)
After reviewing the SQL logs I came upon the following:
SQL Server has encountered 2 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\Databases\SHC_LIVE_1_Data.ndf] in database [SHC_LIVE] (7). The OS file handle is 0x0000000000000810. The offset of the latest long I/O is: 0x00000038630000
BobMgr::GetBuf: Sort Big Output Buffer write not complete after 540 seconds.
SQL Server has encountered 13328 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\Databases\tempdb.mdf] in database [tempdb] (2). The OS file handle is 0x0000000000000514. The offset of the latest long I/O is: 0x0000003f040000
These alerts are coming through quite regularly.
I have also used the below script to determine top 10 waits:
select top 10 *
from sys.dm_os_wait_stats
where wait_type not in --remove common waits to identify worst bottlenecks
(
'KSOURCE_WAKEUP', 'SLEEP_BPOOL_FLUSH', 'BROKER_TASK_STOP',
'XE_TIMER_EVENT', 'XE_DISPATCHER_WAIT', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
'SQLTRACE_BUFFER_FLUSH', 'CLR_AUTO_EVENT', 'BROKER_EVENTHANDLER',
'LAZYWRITER_SLEEP', 'BAD_PAGE_PROCESS', 'BROKER_TRANSMITTER',
'CHECKPOINT_QUEUE', 'DBMIRROR_EVENTS_QUEUE', 'LAZYWRITER_SLEEP',
'ONDEMAND_TASK_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'LOGMGR_QUEUE',
'SLEEP_TASK', 'SQLTRACE_BUFFER_FLUSH', 'CLR_MANUAL_EVENT',
'BROKER_RECEIVE_WAITFOR', 'PREEMPTIVE_OS_GETPROCADDRESS',
'PREEMPTIVE_OS_AUTHENTICATIONOPS', 'BROKER_TO_FLUSH'
)
order by wait_time_ms desc
This returned the following:
DBMIRRORING_CMD1080667767993639494421601019861
SQLTRACE_INCREMENTAL_FLUSH_SLEEP8560983424395638416278
PAGEIOLATCH_SH491151945265152603000011596992
CXPACKET88384943525597541161520823267061
ASYNC_NETWORK_IO81417590149778537242947702846
BACKUPIO119122911189438392949213482
PAGEIOLATCH_EX935260357307398300001180837
LATCH_EX61684818451754702576324270556
ASYNC_IO_COMPLETION981414410143918208844255
BACKUPTHREAD21695435410623269736850
Can someone please help out here please?
Thanks
October 14, 2011 at 2:34 am
Get your windows team and hardware vendor involved. I have seen these issues on SAN and upgrading the firmware resolved the issue.
October 14, 2011 at 2:54 am
These are local drives, not set up in a SAN
Regards
October 14, 2011 at 2:59 am
Hi,
It's seem to be MS SQL Server's I/O operation performance issue. To imporove you MS SQL I/O performance you need to do as per below.
Kindly restructure your MS SQL DB files as per below. Considerting you have C, D & E drive respectively.
* As you told C drivde will be use for OS
* Keep your .mdf & .ldf file on D drive
Changes as per following
1 Create new .ndf file for your DB & keep it on "E" dirve.
2 Create new .ndf file for your Temp DB & keep it on "E" drive.
3 Also provide more Auto incremental space for Temp DB. With "percent %" auto incremental.
4 If there are any specific tables in which you are doing these actvity then create Clustered Index on them
& create these clustered index on "E" drive.
Note: I To find out which Proc takeing time use MS SQL Profiler.
II To find out I/O operation use PERFMON to monitor.
Use below link for more details.
http://blogs.msdn.com/b/sqlserverstorageengine/archive/2006/06/21/642314.aspx
TC,
KK
October 14, 2011 at 3:03 am
Thanks for the update.
I was under the impression that the database files and log files should be on separate drives?
Currently the mdf and ndf are on D and ldf on E
Regards
October 14, 2011 at 3:07 am
Hi,
Keep your log file on seperate drive. As your log file is on Different dirve is right one. You need some more kind of separation of your .mdf file. Which you can achive by adding .ndf file. into your database. And one more thing do not keep your DB related any file on OS drive.
TC,
KK
October 14, 2011 at 3:12 am
Thanks.
But as stated in previous post.
We have the C:\ drive reserved solely for Windows.
The D:\ drive has the database files (mdf and ndf)
The E:\ drive has the logs
I am a bit sceptical to move the log files over to the D:\ drive.
Wouldn't this make the IO worse?
Regards
October 14, 2011 at 3:19 am
Hi,
If it is MS SQL Log file then dont move them to D drive. It will surely hampper your DB performance. Keep it in the "E" dirve.
TC,
KK
October 14, 2011 at 3:20 am
Thank you.
But what do I do with my current situation?
October 14, 2011 at 3:33 am
Hi,
You need to do changes as per following
1 Create new .ndf file for your DB & keep it on "E" dirve.
2 Create new .ndf file for your Temp DB & keep it on "E" drive.
3 Also provide more Auto incremental space for Temp DB. With "percent %" auto incremental.
4 If there are any specific tables in which you are doing these actvity then create Clustered Index on them
& create these clustered index on "E" drive.
Below link will definitely help you to understand how to do it.
TC,
KK
October 14, 2011 at 5:19 am
I have added an extra ndf for the temp database.
The user database already has a ndf.
Regards
October 14, 2011 at 5:47 am
Hi,
Where User DB .nef file is located.
If it is located in D drive (where your DB .mdf file is already there) then move it to the E drive.
If it is already in E drive then check out the performance now. Is it improving.
Also as you told that you have introduced .ndf file for Temp DB. Hope while doing that you have given Auto Incremental Size in "%" / "percent" to 25% to 30%.
If you have not did this then do it, as same is also important.
TC,
KK
October 14, 2011 at 6:16 am
I have made the incremental percentage 10% (as default)
I don't necessarily think this is an issue with the IO,
I am currently busy monitoring the following Counters in Perfmon:
Avg. Disk sec/Read - 0.005
Avg. Disk sec/Write - 0.036
Avg. Disk Queue length - 0.095
% Disk Time - 3.104
Could this not be a network issue between mir01 and mir02 (seeing that the DB mirroring was 1st in the top 10 waits?)
October 14, 2011 at 6:36 am
Please start performance counters for a period of time. If you doubt on mirroring, can you check "log send queue" and "transaction delay" counters on perfmon.
dm_os_wait_stats is a cumulative figure and you would need to capture the information for a period of time to identify the waittype.
If you monitor the sys.dm_exec_requests, you would know the frequent waittype and the resource.
But before making changes to database files location, try to tune the queries to avoid disk IOs and if further tuning is not possible....trace sys.dm_io_virtual_file_stats for a period of time (they are cumulative figures) to identify the high IO intensive files and move them to separate disk.
October 14, 2011 at 6:56 am
Thank you
I will monitor these counters on Perfmon for next week and have a look.
Regards
Viewing 15 posts - 1 through 15 (of 39 total)
You must be logged in to reply to this topic. Login to reply