July 24, 2007 at 10:53 pm
Hi everyone,
At the moment we're setting up a SQL server with the following hardware:
DL580 G4, 4 dual core CPU, 16 GB RAM, P800 with 25 disks in a MSA70.
The 25 disks are split into 5 RAID 5 drives.
Each drive has a 64K stripe size and is formatted in Windows with 64K allocation units.
There are 3 DBs which are used quite a bit, and 3 which are used a bit less.
These DBs are also being replicated.
In order to improve performance, several developers decided to split up the DBs.
Now it looks like the following:
Application binaries (%APP%) | C:\ | ||||||
DB1 | DB2 | DB3 | Distribution | DB4 | DB5 | DB6 | |
K:\ | D/P | L | I | D/P | D/P | L | I |
L:\ | I | D/P | L | L | I | D/P | L |
M:\ | L | I | D/P | I | L | I | D/P |
S:\ | Backup | ||||||
T:\ | Replication Folder & Application Data |
I = Index, D = Data (user-defined data filegroup), L = Log and P=Primary file (mdf)
The idea is to physically split up the I/O heavy tables and indexes.
Anyways, the server is not yet productive, however, it successfully completed our I/O heavy migration scripts twice. Once these split changes were made, the server has frozen several times and it always has several event 833 messages in the log when this happens:
QL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [K:\remos2_UserIndex.ndf] in database [tsn_remos2] (7). The OS file handle is 0x00000910. The offset of the latest long I/O is: 0x000000d6490000
Once the server froze during the splitup process and once it froze soon thereafter.
Any ideas what the problem could be?
Thanks for your help,
Brandon
July 24, 2007 at 11:16 pm
It should be a disk problem. The disk hardening was not done properly. When you say that server frooze when the split process was done why did you then go ahead and implement. You should have contacted your system admins to rectify that before that.
Cheers,
Sugeshkumar Rajendran
SQL Server MVP
http://sugeshkr.blogspot.com
July 24, 2007 at 11:54 pm
Hi Sugesh,
Since the server is not yet productive, after it froze the first time we deleted the DBs, restored them again, ran our upgrade scripts again and then did the DB file split again, this time successfully. Soon thereafter it froze again.
What do you mean with "disk hardening was not done properly"? What do we need to change?
Thanks for your help,
Brandon
July 25, 2007 at 1:06 am
833 is indicative of a overloaded IO subsystem or a misbehaving IO filter driver (like an anti virus)
Are the drives local, NAS or SAN?
Run performance monitor on the server during peak activity and watch the following counters
Physical Disk\%Disk free time (should be above 90%)
Physical disk\Average disk queue length (meaningless for SAN, for local storage should not be above 2 per disk spindle)
Physical Disk\sec/write (should be under 10 ms)
Physical Disk\sec/read (should be under 10 ms)
If you're consistently outside recomended values, then you may well be overloading the disks. What speed are the disks?
Sugesh: "Disk Hardening"????? Generally that refers to the writing of data to disk (as in, the records have been hardened to disk)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 25, 2007 at 2:04 am
As Gail shaw says you need to monitor the disk IO performance of the server and see if there is heavy load on it.
Gail: I meant that the disk was not properly formatted/RAIDED as part of initial server hardening.
Cheers,
Sugeshkumar Rajendran
SQL Server MVP
http://sugeshkr.blogspot.com
August 8, 2007 at 8:59 am
I wonder, would it be possible to change at least one of the drives to RAID 1 or 10? Would it improve performance if log files of most loaded databases were put on such RAID? RAID 5 is not very good at writing data I heard.
...and your only reply is slàinte mhath
August 9, 2007 at 6:49 am
My philosophy on the whole setup is that the allocation of files was just wrong. While you may have hardware issues, you're asking each array to hold widely different types of data instead of grouping data types.
We host several databases, some big, some little on the same SQL Server instance. I have always subscribed to the idea that you group all the log files on one array and all the data files on another. The log files are mostly sequential writes, and the data files are random reads and writes. The raid controller has a lot of memory and can help do the actual writing in chunks. It doesn't care that the actual writing is to different databases. I don't see the point of having one array with a mix of logs and data.
For specifics, we write logs to a Raid 1 two disk array, and data to a Raid 5 multi disk array with a dedicated raid controller. TempDB goes on another, non raid disk, and backups go on yet another raid array (actually with our latest installs onto another server's raid array across a gigabit network card.
Student of SQL and Golf, Master of Neither
August 9, 2007 at 9:13 am
1) How many disk controllers do you have? I hope a bunch of them with a lot of cache on each.
2) As others have mentioned, RAID5 has poor write performance and should be avoided for tempdb and a tlogs. Also, in making 5 RAID5 arrays you are losing 5 active spindles, resulting in lower total available throughput.
3) I would consider something like the following:
6-8 disks in RAID10 for tlogs and tempdb.
depending on how much index space you need, 4-8 disks in RAID5 (or even 10 if space ok)
remainder in RAID5 for data
best would be to have a controller for each
send backups off to separate storage system over dedicated gb link. You shouldn't store backups on same system with data.
If you don't need all the storage space, consider a number of RAID 10 arrays instead of 5 given above.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 10, 2007 at 7:42 am
The solution I've used was data files (.mdb and clustered indexes) on 1 array, indexes on another array, logs on another array, all were RAID 10. Worked very well. That split (from everything on a single array before) made a large improvement in performance.
To determine what IO operation is taking so long (more than 15 seconds), you need to track it down through Profiler and find out what it's doing. It could be as simple as a bad disk or a messed up fibre channel, or as complex as read/write conflicts because you have logs and indexes and data on the same drives and all three are concurrent.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 24, 2007 at 5:23 am
Hi everyone,
Thanks for your ideas.
I agree with several of you, that we should not mix the logs and indexes and data files from all the DBs on the same drives.
At the moment I have all the DBs on one 10 disk RAID 5 drive and 3 RAID 1 drives for the logs. It seems to be working....
I realize that RAID 10 drives would be better but at the moment new hardware is not an option.
If we continue to have issues then I might change the recovery interval from the default settings to 5 minutes or so.
The thing is that the same hardware (the server itself is even slower and has less RAM, the I/O system is the same) has been running the same system without any problems. Well, I mean the system still has an I/O bottleneck, however the performance adequatley meets user needs. On this system we did not try to split the data files, indexes and logs as I mentioned in the original post, so I hope this is the culprit.
I'll try to keep you all posted.
Regards,
Brandon
November 9, 2007 at 3:05 am
Here is an update for everyone.
After removing the split/mix of data, log and index files from 5 DBs between 5 RAID5 drives the system has been running stable for several months.
Now the system is installed as follows:
All DBs on one large RAID5 drive.
Replication snapshots and backups on another RAID5 drive.
Logs split up among 3 RAID1 drives.
Thanks again for your help!
Brandon
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply