Detailed RAID explanation and question.

  • I have had many discussions with folks about how to properly set up storage for use on a SQL Server. I thought I knew some of the basics of RAID, but there seems to be so much disagreement, I’m not sure what is accurate, or who knows what is accurate, or where to find accurate information.

    So I would like to present what I thought was accurate, but I would really appreciate your feedback.

    The general scenario is: Set up a server for use as a SQL Server 2008 R2 machine, and I am not entirely sure the nature of the data/storage, so I will assume many simultaneous users, many simultaneous read/write transactions, and not unlimited budget.

    So here is what I thought I knew about basic hard drive-based storage.

    One Drive -

    If you have a Windows Server 2008 R2 machine with just one hard drive in it (this is for illustrative purposes only!), and you are running SQL on it, there are at least 3 different things happening on this hard drive at all times. The OS is reading and writing to the drive, SQL is reading/writing to its Data file, and SQL is reading/writing to its Log file. This is terrible for performance, since all operations happen serially, through one data channel (one controller), to the one hard drive, with one set of read/write heads. If 3 writes try to happen simultaneously (OS, SQL Data, SQL Log) then part of the OS data will be written, then the head moves and writes part of the Data, then the heads move and write part of the Log, then back to the OS, over and over, with the heads frantically moving back and forth. The motion of the mechanical device, the hard disk drive read/write heads, is probably one of the slowest parts of any computer.

    One Drive Partitioned - (I have had many people suggest something like this with a RAID, so first look at that one hard drive.)

    You can create 3 partitions on a single hard drive, which present to the OS as C: a D: and E: volumes. If you then have your OS on C:, your SQL Data file on D:, and your SQL Transaction Log file on E:, and 3 writes try to happen simultaneously, the read/write heads have to move frantically back and forth between these partitions on this one hard drive. I cannot see how there would be any performance gain by partitioning a single hard drive for use by SQL Server.

    RAID 0 - Two (or more) hard drives, in a Stripe Set, with no parity.

    (Again, this wouldn’t be used in production, this is to illustrate some points. Also, I am just going to focus on data WRITES, to illustrate the most important points).

    A RAID 0 stripe set increases performance, because each write is first split up by the controller, then part of the data is sent to one drive, and the other part is sent to the second drive (in a 2 drive RAID 0). Though not really accurate, you can think of it as your writes are twice as fast since all data is split in half then written, simultaneously, to two hard disks. Although the read/write heads on the two different drives are not moving exactly in unison, the almost are -- they are always each writing about half of the data that has been sent to the RAID.

    No matter how many hard drives make up a RAID 0, it still presents to the OS as one volume, one big C: drive. You cannot write 3 different pieces of data to a RAID 0 simultaneously. Each write will be faster than to a single drive. But just like with a single drive, if you try to write 3 things simultaneously, part of the OS data is written, then the heads move and write a part of the Data file, then the heads move and write part of the Transaction Log file, etc.

    Partitioning a RAID 0 affords no benefit. Creating 3 partitions on this RAID 0 will present 3 volumes to the OS, such as a C: volume a D: volume and an E: volume, but when a write is sent to any “volume” it is still being sent to the same RAID 0. If you attempt three simultaneous writes to the three different partitions, you the read/write heads have to move back and forth to the C: then to the D: then to the E: partitions all the time. Data cannot be written to 3 places on this one RAID 0 at the same time.

    RAID 1 - Just two hard drives, in a Mirror. The reason to do this is to provide fault tolerance - if (when) one hard drive fails, you lose nothing.

    The controller sends each data write, exactly the same, to each individual hard drive. The read/write heads of the two different hard disk drives are essentially locked together -- when data is being written to one hard drive, it is also simultaneously being written to the other.

    The drives behave just like the single hard drive. So just like with a single hard drive, if 3 writes try to happen simultaneously, the read/write heads have to move to do the OS write, then the heads move to do the Data write, then they move to do the Log write. They just so happen to be moving basically identically on another drive, the mirror drive, as well. There is no performance improvement at all.

    In this situation, if you wanted to increase performance (yet eliminate fault tolerance), you would just use your 2 hard drives individually. If two writes tried to happen at the same time, no problem. The read/write heads on one drive in no way affect the read/write heads on the other. Both writes can happen simultaneously.

    [Three Individual Drives] If you could scare up a third drive for your SQL server, then you could have your OS on one drive, your Data file on a second drive, and the Transaction Log file on a third, and all three can be written to independently, all the time. This would increase performance a great deal.

    RAID 5 - Three (or more) hard drives set up by hardware RAID controller, as a stripe set, with parity.

    Simple form: Part of the data is written to drive 1, the other part of the data is written to drive 2 (so far, just like a RAID 0 stripe set), but also a calculated value (parity) is written to drive 3. [Actually these three things are spread out amongst the three drives.] If any drive is lost, the missing values/data can be re-created from the other two, so if a drive is lost, it is replaced and the RAID controller rebuilds all the data/values on it.

    Since every write to a RAID 5 requires the CPU on the RAID controller to perform a calculation of parity, write performance is directly related to the performance of the controller, and is definitely not recommended for use by Microsoft for storing your SQL Transaction Log file, which is written to all the time. See #5 here http://technet.microsoft.com/en-us/library/cc966534.aspx And also in that same point it shows that it may be OK for storing your SQL Data file, but RAID 1+0 is better (read on).

    RAID “10” -- actually RAID 1+0 or RAID 0+1

    This setup requires a minimum of 4 hard disk drives, and is either two mirrors that are then striped (1+0), or two stripes that are then mirrored (0+1).

    RAID 1+0 is also called a stripe of mirrors. Create RAID 1 mirrors first, then create a RAID 0 stripe set of them.

    RAID 0+1 is also called a mirror of stripes. Create RAID 0 stripe sets first, then create a RAID 1 mirror of them.

    See these articles for details, diagrams, etc.

    http://decipherinfosys.wordpress.com/2008/01/15/difference-between-raid-01-vs-raid-10/

    http://www.aput.net/~jheiss/raid10/

    http://www.pcguide.com/ref/hdd/perf/raid/levels/multLevel01-c.html

    FINALLY we are getting to the questions I have, directly related to SQL Server performance.

    Say you have 6 hard disk drives in your server and you set up a RAID 0+1 (mirror of stripes). First, using the software to configure your RAID controller, you would create a RAID 0 stripe set using 3 hard drives. Then you create the second RAID 0 stripe sets using the other 3 hard drives. Then in your RAID controller’s software you create a mirror of those two stripe sets. You created two stripe sets (0) and mirrored them (1), hence the 0+1 designation.

    As we saw above, in a stripe set, the data is split between these 3 disk drives and written simultaneously to them. So it is almost as if the 3 read/write heads of the 3 hard disk drives are all locked together, performing one write of one piece of data, that has been split between these 3 drives.

    But in this RAID 0+1, this stripe set is mirrored to an identical stripe set. The mirror is causing the exact same thing to happen over on those other 3 hard disk drives. So as far as I can tell, basically all 6 read/write heads on all 6 hard disk drives are moving together in unison.

    And that means if you try to write 3 pieces of data, simultaneously, to your RAID 0+1, first part of the OS data is written, then part of the SQL Data file is written, then part of the SQL Transaction Log file is written, etc.

    This is just exactly the same behavior as having one single hard disk drive! Yes, in this RAID 0+1 the writes are faster, and there is fault tolerance, but the read/write heads are frantically moving back and forth, basically in unison, on every one of the 6 hard disk drives.

    Another way to look at this (as far as I know), is that the OS, Windows Server 2008 R2, will see this as one volume. A big C: drive. All you can do is put your OS, and your SQL Data file, and your SQL Transaction Log file onto this one volume.

    Here is my question: if you were to instead configure your 6 hard disk drives as three individual RAID 1 mirrors (with no stripe set, no RAID 1+0 or 0+1) would it be faster? (again, SQL Server 2008 R2 rather high-transaction setup)

    In this setup you get the full benefit of RAID fault tolerance - if one drive dies you quickly replace it and lose nothing. You get no benefit from striping, because you are not using striping.

    But in this configuration you have a mirror for your OS, a separate mirror for your SQL Data file, and a third separate mirror for your SQL Transaction Log file. This is essential the same as the [Three Individual Drives] discussed above, we just added mirrors to each one.

    So on a production server, Windows Server is always writing to its OS drives, and SQL Server is always writing to its Data file, and also to its Log file. In a production environment with a high volume of read/write operations happening, these three operations are always happening, or trying to happen, simultaneously.

    With a 6-drive RAID 0+1, these operations cannot, in fact, happen simultaneously.

    With 6 drives set up as 3 mirrors, each operation can happen simultaneously and independently. Would this be faster? Better?

    Missing Details.

    I know many details are missing above, but I really wanted to make sure I understand the basics. I have had many conversations with “seasoned IT professionals” about this, and I can assure you I’ve received many, MANY conflicting opinions on all of this!

    I know the hardware controller is critical, and whether or not it can support the various RAID levels, etc. And I don’t know the details -- can a typical enterprise RAID controller support 3 separate independent mirrors, for example?

    I know many real enterprise-class operations use SANs, with many mysterious high-end features. But I’ve also seen them configured in a way that seems to be extremely inefficient -- such as creating a RAID 5 within a SAN and using that for your SQL Server Transaction Log file, just as Microsoft recommends you do NOT.

    Finally, is there any place where all these basics are spelled out clearly? A google search leads to even more sketchy opinions, and high-end storage vendors would be happy to have you pay to take some of their classes -- is there anything in between?

  • OS is reading and writing to the drive, SQL is reading/writing to its Data file, and SQL is reading/writing to its Log file.

    Add one more, the tempdb is churning on large sorts and the like as well.

    The rest of your drive understanding is spot on afaik. Partitioning physical spindles into logical drives gains you nothing in terms of head/data access speed. It's merely a logical organizer.

    And that means if you try to write 3 pieces of data, simultaneously, to your RAID 0+1, first part of the OS data is written, then part of the SQL Data file is written, then part of the SQL Transaction Log file is written, etc.

    This is just exactly the same behavior as having one single hard disk drive! Yes, in this RAID 0+1 the writes are faster, and there is fault tolerance, but the read/write heads are frantically moving back and forth, basically in unison, on every one of the 6 hard disk drives.

    To an extent. Usually in a controller or SAN setup you don't QUITE have this spasmadic methodology, but the OS does get involved some. However, this usually gets involved at the Swap File level. most of your OS should be in memory at this point. Also, most SANs have local disk cache'ing. So, while you've dropped off your write to the database and the SAN has reported successful write, it may be on the SAN's memory, still waiting to actually go to disk. I've had some data corruption due to that, not pretty.

    Here is my question: if you were to instead configure your 6 hard disk drives as three individual RAID 1 mirrors (with no stripe set, no RAID 1+0 or 0+1) would it be faster? (again, SQL Server 2008 R2 rather high-transaction setup)

    To quote the ever loved chant of SSC: "It Depends". The reason it depends is on where the majority of your work is done. If 90% of your work is log transactions, then no. The six heads would most likely be faster, in spite of the head movement, because it has 6 physical spindles to work from. However, if your work is roughly even distributed, yes, you should see faster throughput from the physical data layer.

    But in this configuration you have a mirror for your OS, a separate mirror for your SQL Data file, and a third separate mirror for your SQL Transaction Log file. This is essential the same as the [Three Individual Drives] discussed above, we just added mirrors to each one.

    Yup... and that little bit alone can save you many... many... headaches.

    I know many details are missing above, but I really wanted to make sure I understand the basics. I have had many conversations with “seasoned IT professionals” about this, and I can assure you I’ve received many, MANY conflicting opinions on all of this!

    And usually it's between network guys who understand the equipment and software/engine folks who understand the data access mechanisms... and they're both right, to a point. I just think DB should win. 🙂

    I know the hardware controller is critical, and whether or not it can support the various RAID levels, etc. And I don’t know the details -- can a typical enterprise RAID controller support 3 separate independent mirrors, for example?

    If I remember correctly (and someone please correct me if I'm wrong), most RAID controllers support one and only one physical array, no matter how it's logically partitioned. These are one of the costs of doing separate physical arrays for SQL Server properly. I've heard rumor some can handle 2-4 physicals, but I haven't gotten directly into SAN building in a long time.

    I know many real enterprise-class operations use SANs, with many mysterious high-end features. But I’ve also seen them configured in a way that seems to be extremely inefficient -- such as creating a RAID 5 within a SAN and using that for your SQL Server Transaction Log file, just as Microsoft recommends you do NOT.

    Part of that is because the spindle count is so high that it helps make up for it, and reads from a RAID 5 used to be faster. The controllers are better now. The real reason they pack in RAID 5 is because physical slots to space used = RAID 5 is more cost effective. Sometimes you just have to live with that unless you can talk management that it's worth 2-3x the cost to do it the right way, when the RAID 5 is 'good enough' and the network engineers would like their bonus this year, so they're cutting costs, and arguing against it.

    Finally, is there any place where all these basics are spelled out clearly? A google search leads to even more sketchy opinions, and high-end storage vendors would be happy to have you pay to take some of their classes -- is there anything in between?

    No. The only thing you'll get 'in between' is a bunch of opinions (like mine) from experience in specific environments and systems. Or you go to the speed phreak sites who have done heavy duty end to end hardware testing and look over reams of research and testing to determine what's 'good enough' for you.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Criag - you definitely win the prize for the fastest response to a ridiculously long question!

    And thank you for the detailed responses -- I know, the answer is always, "it depends", along with the less-elegant "Go test it yourself on your best-possible-guess of what a real workload will look like, with not-quite-finished-code, on not-quite-the-same hardware." :hehe:

    Thanks again.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply