Introduction
We have all heard of RAID and have likely discussed it in our jobs as SQL Server DBAs, developers, and architects. However, many of us have trouble truly understanding RAID: how it works, what the various RAID levels are, and how they really impact SQL Server performance.
This document attempts to alleviate this lack of knowledge in our circle.
Editor's Note: There is an interesting discussion for this article that clarifies and highlights a few things. Please review the comments.
Disk Architecture
A disk, today, is very similar to a 45rpm record in the 70s (remember those?). It is simply a circular medium (a platter) with concentric rings (tracks) that contain our data in segments known as sectors.
Just like a record player, a disk drive has an arm that holds a needle (in our case a "head") that actually accesses the data. Unlike a record, which is read-only, a disk can be both read and written.
In order to read from or write to a specific sector, the disk has to spin to a point in which the head can reach the sector and then the actuator arm that holds the head must move to the correct location so that the head lies directly over the sector being accessed.
This process is the basic function of input/output operations (IO).
IOPS
IOPS is a term that gets thrown around a lot. Again, it's a term that many people don't truly understand.
Most people understand that the acronym stands for Input Output Operations per Second. But actually translating that definition into real meaning is a little more difficult for some folks.
In basic terms, IOPS measures how many input or output requests (reads or writes) a device can service per second; the flip side of the same metric is the average time required to service a single request.
It is important to know (and I previously did not) that this metric assumes the servicing of a zero-byte transfer. It does this simply for statistical purposes, to normalize the metric from disk to disk, because disk sectors can be formatted to different sizes.
Physical Disk Limitations
A disk has physical limitations that cap the number of IOPS it can service. This limitation is a function of the disk's seek time and its rotational latency.
Seek time is the amount of time it takes, on average, to move the actuator arm to the correct position so that the head can access the required sector of data.
Rotational latency is the length of time (usually in milliseconds) required to spin the disk to the correct location so that the head can access the desired data.
The formula to determine how long it takes to perform a single IOP is as follows:
1 IOP Time To Completion = (Seek Time + Rotational Latency)
So, given this formula, we can quite easily calculate the maximum number of IOPS for a given disk.
The maximum number of IOPS is typically what we are interested in, and the calculation is as follows:
IOPS = 1 / (Seek Time + Rotational Latency), with the times expressed in seconds
Take the following Disk:
Statistics* (FAST DISK) | Value |
Rotation Speed (RPM) | 15,000 |
Avg Seek Time | 2.9 ms |
Avg Rotational Latency | 1.83 ms |
*HP 300GB 15k SAS drive (~$200)
We can use our formula to calculate the IOPS:
IOPS = 1/(2.9ms + 1.83ms)
= 1/(4.73ms)
= 1/(0.00473)
= 211 IOPS possible
So we see that this disk, a 15,000 RPM SAS drive, can crank out 211 IOPS (not too shabby).
If we look at a more "value priced" disk, we can see how big a difference there is in maximum IOPS (this is what we pay for).
Statistics* (SLOW DISK) | Value |
Rotation Speed (RPM) | 7,200 |
Avg Seek Time | 10 ms |
Avg Rotational Latency | 2.1 ms |
*HP 300GB 7200 SATA drive (~$100)
This disk's maximum IOPS capacity is as follows:
IOPS = 1/(10ms + 2.1ms)
= 1/(12.1ms)
= 1/(0.0121)
= 82 IOPS possible
This disk, a 7200 RPM SATA drive, is limited to 82 IOPS at most.
That is a big difference in performance between these two disks, and it explains why there is such a huge difference in price from one disk to another (you really DO get what you pay for).
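The two IOPS calculations above can be sketched as a small helper. The drive specs are the ones quoted in the tables; times are in milliseconds.

```python
def max_iops(avg_seek_ms: float, avg_rotational_latency_ms: float) -> int:
    """Theoretical maximum IOPS for a single disk: 1 / (seek + rotational latency)."""
    service_time_s = (avg_seek_ms + avg_rotational_latency_ms) / 1000.0
    return int(1 / service_time_s)

fast = max_iops(2.9, 1.83)   # 15k SAS drive
slow = max_iops(10.0, 2.1)   # 7200 SATA drive

print(fast)  # 211
print(slow)  # 82
```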
Furthermore, documented data suggests that as you push a disk near its IOPS limit, it will begin to queue requests, in effect causing latency (a very evil word in the world of SQL Server; avoid it like the plague).
Most documentation that I came across suggests that disks perform optimally at or below 80% of their maximum IOPS capacity. So our speedy disk that can handle 211 IOPS at most can optimally serve about 168 IOPS before it starts to show signs of excessive queuing.
So now that I know how many IOPS a single disk can serve, the next thing to determine is how many IOPS we need to satisfy our production SQL Server instance.
I determined this number simply by monitoring our production environment with PerfMon, using the PhysicalDisk: Disk Transfers/sec counter.
The numbers were:
Drive | Avg IOPS | Max IOPS |
Data & Indexes | 2,313 | 16,164 |
Logs | 81.5 | 1,127 |
TempDB | 141 | 2,838 |
Based on the fact that our fast disk above can only effectively handle 168 IOPS, the conclusion can be made quite quickly that a single disk is not going to be able to provide the IOPS necessary to service these requests.
So the solution is clear: we need more disks, and we need some mechanism to spread the workload across them.
If we had 10 of our 300GB 15k SAS Drives, we could store 3TB of Data AND we could service up to 1,680 IOPS (168 * 10).
Likewise, if we had 100 of our 300GB 15k SAS drives, we could store 30TB of Data AND we could handle as many as 16,800 IOPS.
If we resorted to our cheaper disks, we would need at least 205 of them (16,800 / 82, sizing them at their maximum rated IOPS) to handle 16,800 IOPS. This actually ends up costing MORE than 100 of the faster disks ($20,500 vs. $20,000). Ironic, eh?
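The sizing arithmetic above can be sketched as follows. Note that the fast disks are sized at their 80%-of-maximum "optimal" rate (about 168 IOPS each), while the cheap disks are sized at their maximum rating of 82, matching the figures in the text.

```python
import math

def disks_needed(required_iops: float, per_disk_iops: float) -> int:
    """Smallest whole number of disks that can serve the required IOPS."""
    return math.ceil(required_iops / per_disk_iops)

target = 16_800               # peak IOPS observed for Data & Indexes
fast_effective = 211 * 0.8    # fast disk at the 80% rule ~= 168 IOPS
cheap_max = 82                # cheap disk at its maximum rating

n_fast = disks_needed(target, fast_effective)   # 100 disks
n_cheap = disks_needed(target, cheap_max)       # 205 disks

# Approximate prices from the spec tables: $200 fast, $100 cheap.
print(n_fast * 200, n_cheap * 100)  # 20000 20500
```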
The Need for RAID
So, we determine that we need a bunch of disks to service our performance and/or storage capacity requirements. Now we need a way to spread the workload across these disks. The primary way of doing this is with RAID.
RAID stands for "Redundant Array of Inexpensive Disks". RAID provides a mechanism by which we can string together an "array" of disks and expose them as a single logical drive known as a "LUN".
Depending on how you configure your array, RAID can provide "Fault Tolerance" which is a way to keep from losing data in the case that 1 or more of the disks in the array fails.
Furthermore, since we are stringing multiple disks together, we alleviate the single disk's limitation on how many IOPS it can handle; the more disks (or "spindles") we add to the array, the more IOPS we get. It's really as simple as that (well... sorta).
RAID Levels
Your RAID arrays serve two purposes: 1) increased IOPS performance and 2) fault tolerance to protect against data loss. A high level of fault tolerance may reduce the overall performance of the array; likewise, a higher degree of performance can reduce the array's fault tolerance.
The act of fine-tuning these functions requires configuring what's known as the "RAID Level". RAID levels are a set of industry-standard configurations for RAID arrays. The most common are RAID 0, RAID 1, RAID 5, RAID 1+0, and RAID 0+1.
Depending on the RAID level you choose, you pay what is known as a RAID penalty. Some RAID levels write the data multiple times in order to survive a disk failure in the array (fault tolerance), and the need to write data twice incurs performance overhead. Furthermore, since those RAID levels are writing the data multiple times, you also pay a penalty in disk storage: the duplicated data requires more physical disks to store it. One byte of data actually consumes two bytes of storage, so on a 300GB drive you can really only store 150GB of raw data. These RAID penalties can dramatically impact the cost of your RAID solution (disks are not free).
In order to understand the impact of RAID on your system, and the potential benefits, it's important that you are familiar with each of these common RAID Levels how they work, and the penalties incurred with each.
RAID 0
The first, and most basic, RAID level is RAID 0. RAID 0 addresses the IO limitations of a single disk by "striping" IO operations across an array of multiple disks. If an IO request is made to write 100MB of data to the array, the RAID controller will spread that 100MB across the array.
This effectively spreads the load across the disks in your array and, at the same time, reduces the rotational latency penalty because the disks don't have to spin as much to handle the IO requests.
Although RAID 0 does dramatically improve IO performance and throughput, it does not provide any degree of fault tolerance. That is, if any single disk in the array fails, the entire array is shot and all the data on it is lost.
Because RAID 0 does not provide any degree of fault tolerance, most people do not consider it a viable RAID level for production systems.
It's important to note that in a RAID 0 stripe set, every disk in the array is used to store data. So, if you have ten 300GB disks in a RAID 0 array, you effectively have 3TB of usable storage. This means we pay no RAID penalty in lost disk space.
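As a concrete illustration of striping, here is a minimal sketch of how a RAID 0 controller might map a logical byte offset onto the disks in the array. The 64KB stripe-unit size and simple round-robin layout are assumptions for illustration; real controllers vary and let you configure the stripe size.

```python
STRIPE_UNIT = 64 * 1024   # bytes per stripe unit (assumed for illustration)
N_DISKS = 10              # the ten-disk array from the example above

def locate(logical_offset: int) -> tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset within that disk)."""
    unit = logical_offset // STRIPE_UNIT        # which stripe unit overall
    disk = unit % N_DISKS                       # round-robin across the disks
    stripe_row = unit // N_DISKS                # full rows of units before this one
    disk_offset = stripe_row * STRIPE_UNIT + (logical_offset % STRIPE_UNIT)
    return disk, disk_offset

print(locate(0))              # (0, 0)
print(locate(64 * 1024))      # (1, 0)  -- the next unit lands on the next disk
```

Sequential IO naturally fans out across all ten spindles, which is exactly where the IOPS multiplication comes from.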
RAID 1
RAID 1 is referred to as "mirroring". In a mirror set, fault tolerance is provided by duplicating your disks: each disk in a mirror set has a duplicate to which its data is written. RAID 1 writes each piece of data to two separate disks, so if any individual disk fails, its mirror takes over processing and the user never knows a failure occurred.
RAID 1 does have a performance penalty for writes: each write IOP has to be performed twice (every piece of data is written to two disks). Reads, however, can be sped up, because the controller can read from both disks in a mirror set simultaneously to serve a large read request.
RAID 5
RAID 5 is referred to as "Striping With Parity". It provides the performance advantages of Striping (RAID 0), and then adds a degree of fault tolerance by using what's called Parity. Parity is data that is calculated and stored so that if a disk in the array fails, the parity data can be used to rebuild the data that was in the lost disk.
Although parity is great for fault tolerance, it is expensive from a performance standpoint: RAID 5 has a write penalty of 4x. That is, for every write IOP requested, RAID 5 must perform four IOPS.
The process that results in such a high penalty is as follows:
- Read the original data (1 IOP)
- Read the current parity data (1 IOP)
- Compare the original data with the new write request
- Calculate a new parity value based on the data differences
- Write the new data (1 IOP)
- Write the new parity (1 IOP)
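The read-modify-write cycle above can be sketched with XOR parity, which is how RAID 5 computes parity in practice: new parity = old parity XOR old data XOR new data. The single-byte "blocks" and four-disk stripe here are toy assumptions for illustration.

```python
def update_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Recompute stripe parity after overwriting one data block (2 reads + 2 writes)."""
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))

# Three data blocks plus parity = a toy 4-disk RAID 5 stripe:
d = [b"\x0f", b"\xf0", b"\x33"]
parity = bytes(a ^ b ^ c for a, b, c in zip(*d))   # 0x0f ^ 0xf0 ^ 0x33 = 0xcc

# Overwrite block 1 -- the two reads and two writes are the 4-IOP penalty:
new = b"\xaa"
parity = update_parity(d[1], new, parity)
d[1] = new

# Parity still lets us rebuild any single lost block, e.g. block 1:
lost = bytes(a ^ c ^ p for a, c, p in zip(d[0], d[2], parity))
assert lost == new   # block 1 recovered from the surviving disks
```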
RAID 1+0
RAID 1+0 is exactly what it says it is. It's a combination of RAID 0 (Striping) and RAID 1 (Mirroring). Typically this RAID Level is referred to as Striped Mirrors.
RAID 1+0 is screaming fast for writes due to striping across multiple disks and due to the fact that there is no parity data involved as there is with RAID 5.
There is a write penalty in that each value must also be written to the mirror disk (for fault tolerance), but still, the performance is incredibly fast.
The storage penalty for RAID 1+0 is the same as that for RAID 1 (mirroring): only half of the array's raw capacity counts toward the effective storage space available.
RAID 0+1
RAID 0+1 is similar to RAID 1+0 in that it combines RAID 0 (striping) and RAID 1 (mirroring). This nested RAID level is referred to as mirrored stripes. Although RAID 1+0 and RAID 0+1 aren't literally identical (RAID 1+0 stripes across a set of mirrored pairs, while RAID 0+1 mirrors two entire stripe sets, which changes how well the array tolerates multiple disk failures), the differences between the two are relatively academic. For the sake of this paper, we will assume that they are the same.
The RAID Penalties are literally identical for RAID 1+0 and RAID 0+1.
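The storage penalties described for these levels can be summarized in a small sketch (assuming identical disks throughout; RAID 5's loss of one disk's worth of capacity to parity is the standard rule, though it isn't spelled out above):

```python
def usable_gb(level: str, n_disks: int, disk_gb: int) -> int:
    """Effective storage for an array of identical disks at a given RAID level."""
    if level == "0":
        return n_disks * disk_gb              # every disk holds data
    if level in ("1", "1+0", "0+1"):
        return n_disks * disk_gb // 2         # half lost to mirroring
    if level == "5":
        return (n_disks - 1) * disk_gb        # one disk's worth lost to parity
    raise ValueError(f"unknown RAID level: {level}")

print(usable_gb("0", 10, 300))    # 3000 -> the full 3TB from the RAID 0 example
print(usable_gb("1+0", 10, 300))  # 1500 -> half, per the mirroring penalty
print(usable_gb("5", 10, 300))    # 2700 -> one disk lost to parity
```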
Other RAID Levels (2,3,4,6,DP, etc)
There are other non-standard RAID levels, but you don't hear of them very often. RAID 2, 3, 4, 6, and RAID DP are all similar to RAID 5 in that they implement striping with some sort of parity data to provide fault tolerance. The main difference between these "flavors of RAID 5" is in how they write the parity data: some reserve one disk in the array to store parity, some stripe the parity data across the disks, and so on. You can research these as needed, but for my purposes I just treat them as "RAID 5".
One non-standard RAID level that is worth a minor discussion is RAID DP. DP stands for "Dual Parity". It is similar to RAID 5 except that it writes the parity data twice. This is VERY expensive for write performance, increasing the write penalty to 6 (six write IOPS required for each single write request).
RAID Comparisons
Choosing a RAID level is not that simple; there are multiple things to consider: cost, performance, and capacity, among others.
This table summarizes the pros and cons of each of the standard RAID levels:
RAID Level | Fault Tolerance | Read Performance | Write Performance | RAID Write Penalty | Cost |
0 | None | Good | Excellent | 1 | Excellent |
1 | Good | Good | Good | 2 | Fair |
5 | Fair | Good | Poor | 4 | Good |
1+0 | Excellent | Excellent | Excellent | 2 | Poor |
DP | Good | Good | Terrible | 6 | Good |
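A common sizing rule combines the write penalties from this table with a workload's read/write mix to estimate the front-end IOPS an array can actually deliver: effective IOPS = raw IOPS × read% + raw IOPS × write% ÷ penalty. A sketch, using an assumed 70/30 read/write mix (the mix and the 16,800 raw IOPS figure are illustrative):

```python
# Write penalties from the comparison table above.
WRITE_PENALTY = {"0": 1, "1": 2, "5": 4, "1+0": 2, "DP": 6}

def effective_iops(raw_iops: float, read_pct: float, level: str) -> float:
    """Front-end IOPS an array can deliver: reads cost 1 IOP, writes cost `penalty` IOPS."""
    write_pct = 1.0 - read_pct
    return raw_iops * read_pct + raw_iops * write_pct / WRITE_PENALTY[level]

# 100 fast disks at ~168 IOPS each = 16,800 raw IOPS:
raw = 16_800
print(round(effective_iops(raw, 0.70, "1+0")))  # 14280
print(round(effective_iops(raw, 0.70, "5")))    # 13020
```

The gap between RAID 1+0 and RAID 5 widens as the workload becomes more write-heavy, which is why the recommendations below favor RAID 1+0 for write-intensive files like logs and TempDB.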
SQL Storage Recommendations
SQL Server File | RAID LEVEL |
Operating System and SQL Binaries | RAID 1 for Fault Tolerance |
Data and Indexes | RAID 1+0 (RAID 5 is OK if on a budget) |
Logs | RAID 1+0 |
TempDB | RAID 1+0 |
Backups | RAID 5 |
Other Considerations
As if you didn't already have enough to think about, there are a few other things to consider when planning your IO subsystem, your SQL file layout, and your RAID levels.
RAID Controllers
RAID can be implemented two ways: with software or with hardware.
In a software RAID configuration, the operating system handles managing the RAID Levels, balancing the IO load amongst your disks, etc.
In a hardware RAID configuration, there is a physical piece of hardware that serves as your RAID Controller.
Typically, the hardware RAID solutions are much more robust, flexible, and powerful. The more money you spend on the RAID controller, the more configuration options you typically get.
For example, some RAID Controllers only offer a single RAID Level (RAID 5 for example). The more expensive RAID Controllers offer caching. Caching can be configured to cache Read activity, Write Activity, or in some cases both. The nicer RAID Controllers allow you to configure the Cache percentage to be shared between reads and writes.
Caching is very important for SQL Server, especially for writes. Remember, regardless of RAID level, there is no RAID penalty for reads; all RAID levels are fast for reading data. Writes are where we incur performance penalties. You can use the RAID cache to buffer all your writes in memory on the RAID controller, which can improve write performance dramatically. Typically, the RAID controllers that allow write caching have a battery-backed cache, so that if power goes out in the middle of a write, the battery backup ensures the data in cache makes it to disk before total failure.
Remember, SQL Server itself is very good at caching reads. So spending the controller's valuable cache on reads really doesn't make a lot of sense in the world of SQL Server.
Virtualization
One other consideration you'll want to think about is virtualization. Whether we like it or not, we are moving to a virtualized world. Production SQL instances running in a VMware environment are becoming more and more mainstream.
Virtualization also impacts our RAID and IO discussions. Depending on the virtualization product you use, there may be multiple other considerations when choosing your RAID level, for example, how the VM works with shared storage.
Be careful to work very closely with your disk vendor and/or your VM folks to ensure you get the proper RAID and IO configuration for your virtualized environment.
Summary
Obviously there's a lot of information here. I can't overstate how important the RAID level is for your SQL Server's performance and for its ability to survive one or more disk failures.
I'm hoping that this article helps you understand the performance impact that RAID has on your SQL Server environment. As a Database Administrator or Data Architect, you must understand the implications of your RAID configuration. Performance and fault tolerance depend on it.
References
MSDN: http://msdn.microsoft.com/en-us/library/ms190764.aspx
TechNet: http://technet.microsoft.com/en-us/library/cc966534.aspx
Ed Whalen, PerfTuning.com