March 2, 2017 at 8:41 am
I am in the process of figuring out the best and most reliable server configuration for a new server that will host a SQL Server 2014 Hyper-V VM. What I mean exactly is whether implementing SSDs is a good idea from a long-term usage perspective. I need to ensure this server remains operational 24x7 for at least 6 to 7 years. I do not want to worry about drives that have reached their write capacity and could fail at any moment.
Let me explain our current environment:
I have a physical Dell PE R510 with 32GB of RAM, (2) separate 800GB RAID10 arrays composed of (8) 450GB 15K RPM drives, and (1) RAID1 array composed of 146GB 15K RPM drives for the Windows Server 2008 R2 OS.
SQL Server 2008 R2 hosts a critical financial database, which is currently 37GB in size. We have one active audit database and six audit history archive databases. I've been using the Simple recovery model with no issues, so our log files are relatively small. The data files (250GB in total) reside on the first RAID10, and the log files (15GB) and tempdb databases (20GB) reside on the second RAID10.
Thus far, I have only had one 450GB 15K RPM hard disk failure that needed to be replaced over the 6 years of constant usage.
SQL backups are completed nightly and sent to an RD1000 locally attached storage device. We also have a Quorum onQ device that serves as an HA and DRaaS appliance.
New Server Implementation:
I plan on purchasing a Dell R730 configured with the following hardware: 96GB of RAM, (8) 900GB 15K RPM conventional drives in (2) RAID10 arrays, (4) 480GB Solid State Drive SAS Read Intensive MLC 12Gbps drives, dual power supplies, 6 NICs, etc. I will dedicate 64GB to the SQL instance so that it always has that amount of RAM at any given moment. I will either take the safe route and install Windows Server 2012 R2 Standard, or possibly deal with some compatibility issues by installing Windows Server 2016 Standard as the Hyper-V hypervisor. This Hyper-V server will contain two VMs (a SQL Server 2014 Standard server and a .NET/file storage server). The .NET/storage instance will have 16GB of dedicated RAM and about 1TB of storage. There is plenty of room to last six years of growth/usage.
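For pinning that memory to the SQL instance, I'm assuming something along these lines inside the SQL VM (a rough sketch only; the values are in MB and would need adjusting so the guest OS keeps some headroom):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- floor and cap the instance at 64GB (65536 MB) so it holds that amount once warmed up
EXEC sp_configure 'min server memory (MB)', 65536;
EXEC sp_configure 'max server memory (MB)', 65536;
RECONFIGURE;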
My initial thought was to create a 250GB boot volume for the SQL Server on the first RAID10 with conventional disks, and then place the data files, log files, and tempdb database all on the RAID10 SSD array. I could even just use the SSD RAID10 for the boot volume and all the SQL components (everything exclusively on the SSD array). My concern is with all the write activity that takes place with SQL Server (reindexing, tempdb, and log file writes) and also the Windows OS. I do not want to deal with having to purchase and swap out SSD drives every 3 or so years to ensure they do not burn out and create issues with SQL Server availability. I understand the newer enterprise SSD technology employs a write algorithm to spread out the disk writes, but is that enough for extended usage over a six or possibly seven year time frame?
So my second approach was to create a 250GB SQL Server boot volume, place the SQL data files on the first RAID10 with conventional disks, and place the log files on the second RAID10 with conventional disks. The tempdb database would be placed on the SSD RAID10 array. It has been mentioned that tempdb writes data all the time, and placing this database on the fastest drives will improve SQL performance. Also, if the RAID10 SSD array were to fail for any reason, it would "not" affect the integrity of the SQL databases. Meaning I could alter the tempdb location to one of the other two RAID10 arrays (see the sketch below) and we're back in business within minutes without needing to restore, etc.
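Something like this is what I have in mind for that scenario, just a sketch with hypothetical paths; tempdb would be rebuilt at the new location the next time the service starts:

-- point the tempdb files at a folder on one of the conventional RAID10 arrays (paths are examples only)
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'E:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'E:\TempDB\templog.ldf');
-- then restart the SQL Server service; tempdb is recreated at the new location on startup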
My third choice is simply the same as the second choice mentioned above, but purchasing 128GB of RAM and dedicating 96GB to the SQL instance, so that most everything is processed in memory, minimizing the need to go to disk.
As mentioned before, I simply want everything set up from the beginning with the intention of not needing any hardware changes or contending with unforeseen disk issues. Even with a RAID10 SSD array, I question its reliability simply because SSD failures usually occur one right after another. Meaning there's a good possibility I could have 2 or more SSD drives fail at the same time. I can accept one or two failures on separate occasions.
I have read that Write Intensive MLC SAS SSDs are more reliable than the read-intensive or mixed-use drives, so those could be a possible alternative. I am simply looking for assurance that these drives will hold up over a 6 year period. Simultaneous disk failures are what I am worried about.
Any input/suggestions?
March 2, 2017 at 12:13 pm
First, you have to worry about all drives failing. Period. If you think you can get 6-7 years with 10 drives, you're gambling. You might, you might not.
Second, most enterprise SSD storage, and even many consumer-grade drives, are lasting longer and longer. That doesn't mean they all do, but on average, they withstand enough writes to work in a database server. There are plenty of people using flash for SQL Server databases, and they have been for years without issues. Many use it for tempdb, which experiences very heavy read/write activity compared to many user databases.
The best way to plan here is to estimate the reads/writes you'll do over time. Measure the existing server and then look at the drive specs.
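Something like this against the current server gives a baseline; it's only a sketch, and the numbers are cumulative since the last restart, so divide by uptime to get a daily rate before comparing to the endurance spec on the drives you're considering:

-- cumulative I/O per database file since the instance last started
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_writes,
       vfs.num_of_bytes_written / 1048576 AS mb_written,
       vfs.num_of_bytes_read / 1048576 AS mb_read
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY vfs.num_of_bytes_written DESC;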
March 2, 2017 at 1:41 pm
I understand that not all disks will operate without issue over a 6 to 7 year period regardless of the drive type (HDD or SSD). That's why we're employing RAID10 SSD arrays to mitigate that. My existing SQL 2008 server has been in operation using conventional HDDs for over 6 years with just one drive failure in total. Disk redundancy alleviated any major array failures and/or DB corruption issues.
I am concerned that even with a RAID10 SSD array, multiple disks will/may fail simultaneously and also slow down over time. SSDs are not graceful when they fail and normally do so without warning. I've heard that when SSDs all run together within an array, if one fails then others will fail immediately afterwards. If this is the case, that will doom the array should two disks fail in the same RAID10 mirror pair. I spoke with a Dell rep today and asked about this, and he stated that as long as the SSD drives are not from the same batch, that should not be an issue. So I asked how I can confirm they are from different batches before purchase, and he stated that you cannot. Seems kind of risky. He also mentioned they are starting to sell more servers with SSDs now (about 25%), and they do not know the reliability or failure rate at this point.
I will be using the SSD array exclusively for the tempdb database, so that should minimize any DB damage should multiple drives fail simultaneously within the same array. My thought was that if the tempdb database were damaged, I could easily define a new location for the tempdb database without incurring DB corruption or issues. I've not experienced a tempdb failure, so that is why I ask. I just want to confirm that if that happens, I can simply do what I suggested above, or whether database restores will be necessary.
When it comes to SQL Server, I am very cautious since any failures or downtime are a major issue. I will do my best to ensure we do everything reasonably possible so that minimal failures occur during the expected life of the server.
March 2, 2017 at 2:34 pm
Interesting post!
It seems strange to run in SIMPLE mode and risk losing a full day of work. Is that because your database is mostly just computing some answers for users and not really doing OLTP? It implies up-time is more important than daily work being saved.
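A quick way to check what you're running, and what switching would involve (the database name and path in the commented lines are just placeholders):

-- list the recovery model for every database
SELECT name, recovery_model_desc FROM sys.databases;
-- FULL recovery plus regular log backups is what limits data loss to minutes instead of a day, e.g.:
-- ALTER DATABASE YourFinancialDB SET RECOVERY FULL;
-- BACKUP LOG YourFinancialDB TO DISK = 'X:\Backups\YourFinancialDB.trn';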
It seems like a high performance 15k drive would have a much shorter life than a slower drive.
If you're trying to promise 24x7 up-time, isn't fail-over the way to go instead of a single host?
SSDs can report on remaining life at any time. Is your host going to be on Mars or somewhere you can't replace a drive if things get shaky in a few years?
If RAIDs are questionable with SSDs then here is a novel solution I've never heard suggested elsewhere. Have two databases and use log shipping (or whatever works) so that a second database on the same instance is on different drives and ready to go in minutes if a failure occurs. Just do a couple renames. Don't use RAIDs but depend on your manual fail-over process.
And one more thing, if you have lots of RAM then very, very few reads will ever need to be done on the drives. Can tempdb run in a ramdrive? I don't think so, because SQL Server is smart enough to know what is physical and what isn't.
March 2, 2017 at 3:12 pm
Referencing your statement "SSDs can report on remaining life at any time. Is your host going to be on Mars or somewhere you can't replace a drive if things get shaky in a few years?".
Can you explain how SSDs can report on remaining life at any time? I am assuming it's a utility provided by the SSD drive manufacturer, or, if Dell is selling them with the servers, they provide this capability? Just curious how this works, since this will be my first experience installing SSDs on a production SQL server.
Referencing the hosting-on-Mars response: replacing a drive that fails is not a big deal and is to be expected. My concern is with multiple SSD drive failures at the same time. There is mention on the Internet that when SSDs run concurrently, as in a RAID array, one drive failure may be accompanied by other drives failing at the same time within the same array. I take it this sort of compounded failure may be common in real-life scenarios. If this is true, then it can be scary, since SSD failures occur without any advance warning. Regular HDDs usually provide predictive failure warnings, and thus far, I have not personally experienced more than one concurrent drive failure within a RAID5 or RAID10 array.
Also, I would like to add that I opted for Read Intensive MLC drives rather than Mixed Use or Write Intensive MLC SSD drives. Will overall performance and drive life expectancy differ from one to another, specifically when dealing with an array hosting a tempdb database, or a SQL database for that matter?
I simply want to ask as many questions as possible before I commit to a server purchase, to ensure I understand the repercussions of selecting one over another.
March 2, 2017 at 3:24 pm
What RAID controller are you going to buy?
The way in which you are describing things leads me to believe that you are not isolating drives onto different raid CHANNELS, only different logical drives.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
March 2, 2017 at 3:32 pm
Most hard drives support S.M.A.R.T. technology, which monitors parameters of a drive and predicts when a failure may occur. Wikipedia or Tom's Hardware are good sites for information on how it works. There are many free apps. Some SSDs come with one, I think, as part of the driver. It seems like I went into the drive properties on my home laptop to see the status, but each program may work differently.
SSD cells support roughly 10,000 overwrites (program/erase cycles) before getting unreliable. Since a transaction log may be small in SIMPLE mode, I'm wondering if it will force the drive to wear out in that one spot. With a regular file system, an SSD can determine where to allocate space and spread writes out evenly.
March 2, 2017 at 3:40 pm
Below are the server specs. The only change I need to make is to increase the RAM to 96GB. I presume (2) Intel Xeon 10-core processors should be sufficient.

PowerEdge R730 Server (qty 1)
PE R730/xd Motherboard MLK (qty 1)
No Trusted Platform Module (qty 1)
Chassis with up to 16 x 2.5" Hard Drives (qty 1)
PowerEdge R730 Shipping (qty 1)
Intel Xeon E5-2630 v4 2.2GHz, 25M Cache, 8.0 GT/s QPI, Turbo, HT, 10C/20T (85W), Max Mem 2133MHz (qty 2)
DIMM Blanks for System with 2 Processors (qty 1)
Standard Heatsink for PowerEdge R730/R730xd (qty 2)
Fresh Air Cooling, PowerEdge R730 (qty 1)
2400MT/s RDIMMs (qty 1)
Performance Optimized (qty 1)
Unconfigured RAID for H330/H730/H730P (1-16 HDDs or SSDs) (qty 1)
PERC H730 Integrated RAID Controller, 1GB Cache (qty 1)
Broadcom 5720 QP 1Gb Network Daughter Card (qty 1)
VFlash, 16GB SD Card for iDRAC Enterprise (qty 1)
iDRAC8 Enterprise, integrated Dell Remote Access Controller, Enterprise (qty 1)
DVD+/-RW, SATA, Internal (qty 1)
Bezel (qty 1)
ReadyRails Sliding Rails With Cable Management Arm (qty 1)
Performance BIOS Settings (qty 1)
Dual, Hot-plug, Redundant Power Supply (1+1), 750W (qty 1)
Electronic System Documentation and OpenManage DVD Kit, PowerEdge R730/xd (qty 1)
No Operating System (qty 1)
No Media Required (qty 1)
R730/xd PCIe Riser 2, Center (qty 1)
R730 PCIe Riser 3, Left (qty 1)
R730/xd PCIe Riser 1, Right (qty 1)
Thank you for choosing Dell ProSupport Plus (qty 1)
Dell Hardware Limited Warranty Plus On-Site Service Extended Year (qty 1)
Dell Hardware Limited Warranty Plus On-Site Service (qty 1)
ProSupport Plus: Mission Critical 4-Hour 7x24 On-Site Service with Emergency Dispatch, 3 Year (qty 1)
ProSupport Plus: 7x24 HW/SW Tech Support and Assistance, 5 Year (qty 1)
ProSupport Plus: Mission Critical 4-Hour 7x24 On-Site Service with Emergency Dispatch, EXT to 2 Year (qty 1)
On-Site Installation Declined (qty 1)
Declined Remote Consulting Service (qty 1)
16GB RDIMM, 2400MT/s, Dual Rank, x8 Data Width (qty 6)
480GB Solid State Drive SAS Mix Use MLC 12Gbps 2.5in Hot-plug Drive, PX04SV (qty 4)
300GB 10K RPM SAS 12Gbps 2.5in Hot-plug Hard Drive (qty 2)
900GB 15K RPM SAS 12Gbps 512n 2.5in Hot-plug Hard Drive (qty 8)
Broadcom 5719 QP 1Gb Network Interface Card (qty 1)
Broadcom 5720 DP 1Gb Network Interface Card (qty 1)
NEMA 5-15P to C13 Wall Plug, 125 Volt, 15 AMP, 10 Feet (qty 2)
March 2, 2017 at 4:35 pm
I went back and forth with Argenis Fernandez of Pure Storage. He was the one that said you want to try and calculate the rough amount you write and compare it to the specs. Flash works differently, and manufacturers have implemented firmware differently. Most do have issues if you write to the same spot (same block) over and over. That causes burnout. Early SSDs acted like HDDs and did that. We've evolved in the last 4-5 years to have most drives over-provisioned (so maybe 20-30% of capacity isn't available). This is to allow updates to blocks to be written to new blocks and the old ones garbage collected. Some (like Pure) try to treat things as always-append, so they minimize rewrites to a block.
In any case, whether you have SSDs or HDDs, if they're from the same batch, you can have issues. In the past, when I worked with early SANs and HDDs, we bought from three manufacturers and mixed them in trays or servers. We did have issues where one would fail and another would fail before the array could rebuild. The same thing applies here. Ideally, keep an extra drive in there to use for rebuilds.
Some SSDs report SMART information. I've heard others hard code SMART data at 100%. Beware, and research, or get your hardware person to do this.
There are enterprise grade SSDs and they should last a long time.
My experience with SSDs: early ones just stopped working. Just like someone pulled the power on a server. I never could get them to restart. Lately, I've had a couple of mSATA ones report errors (read and/or write) and I've gotten nervous. I've replaced them, and gone back a few times, and they've acted like a dying HDD. Some stuff readable, some not.
I haven't worked with the enterprise Fusion-io type drives (or other vendor equivalents), but I also haven't heard many complaints about lifetime.
Some links:
http://www.anandtech.com/show/2614/4
http://unix.stackexchange.com/questions/106678/how-to-check-the-life-left-in-ssd-or-the-mediums-wear-level
March 2, 2017 at 7:04 pm
you can use ram for tempdb
look at http://www.superspeed.com/servers/ramdisk.php
Based on your current setup I would get more RAM on the server and use part of it for the above.
and also look at PCIe NVMe cards, as these will be faster than the SSDs you're considering. Good for tempdb.
Regarding durability of SSDs - what really matters is the amount of writes your tempdb currently has, and the expected increase, if any. As said before, you do need to monitor your own system over a period of time to get this info.
And then project the expected volume of writes over 5-10 years and see how it compares to the endurance ratings of the drive classes.
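As a rough sketch of what I mean, something like this against your current instance projects tempdb writes over a 6 year life; the projection is only an estimate, and the number to compare it against is the TBW / drive-writes-per-day rating on the spec sheet of whichever SSD class you pick:

-- tempdb bytes written per day since the last restart, projected over 6 years
DECLARE @start datetime;
SELECT @start = sqlserver_start_time FROM sys.dm_os_sys_info;
DECLARE @uptime_days decimal(18,4) = DATEDIFF(SECOND, @start, GETDATE()) / 86400.0;
SELECT SUM(num_of_bytes_written) / 1073741824.0 / @uptime_days AS tempdb_gb_written_per_day,
       SUM(num_of_bytes_written) / 1073741824.0 / @uptime_days * 365 * 6 / 1024 AS projected_tb_over_6_years
FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), NULL);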
March 3, 2017 at 7:19 am
frederico_fonseca - Thursday, March 2, 2017 7:04 PM:
you can use ram for tempdb - look at http://www.superspeed.com/servers/ramdisk.php
Based on your current setup I would get more RAM on the server and use part of it for the above.
and also look at PCIe NVMe cards, as these will be faster than the SSDs you're considering. Good for tempdb.
Regarding durability of SSDs - what really matters is the amount of writes your tempdb currently has, and the expected increase, if any. As said before, you do need to monitor your own system over a period of time to get this info.
And then project the expected volume of writes over 5-10 years and see how it compares to the endurance ratings of the drive classes.
Your SuperSpeed link does not mention tempdb on a RamDisk. I still have strong doubts sql server will allow any database to be run on a non-persistent medium such as regular SIMMs. RamDisks have not been allowed in the past for me with sql server. TempDB is recreated on startup so it seems Microsoft would allow this to go on a ramdisk. Until I hear someone is really, actually doing this, I won't believe it. Also, I'd love to hear of performance comparisons between tempdb on SSD vs. RamDisk.
March 3, 2017 at 8:47 am
I reviewed the ramdisk link, and SuperCache seems to be a more reliable solution. Also, under Server Solutions - Database, it does mention tempdb, but not specific to any of their product solutions. My tempdb is almost 20GB in size. I would not implement deferred writes, and I would think the data files must reside on a non-volatile disk to avoid any type of data loss. We have battery backups, but every so often I have experienced an improper server shutdown from an extended power outage or recklessness. For instance, a couple of months ago, I mistakenly pulled the power cables out of the SQL server while working on another server in the same rack. I really goofed that one, but there was no harm and it booted fine without incident.
Personally, I am not fond of software overlays that compensate for hardware components. For instance, I experienced a simultaneous disk failure with a software RAID5 array many years ago. Since then, I have stayed away from similar solutions. For a production server, I would consider it, but there would have to be many testimonials that it's rock solid.
Referencing the PCIe NVMe cards statement, Dell does offer a "Dell 1.6TB, NVMe, Mixed Use Express Flash, HHHL Card, PM1725, DIB [$2,025.64]". I do not know much about it, but it appears this card works with SSDs to provide faster throughput. What I mean is, it appears this card will add cost on top of all the SSD drives in the RAID array. If I am mistaken, please let me know.
Also, I have not heard any feedback on whether Mixed Use MLC SSDs are more reliable than Read Intensive MLC SSD drives. Since we are speaking of a write-intensive tempdb, I would think the Mixed Use MLC would be preferred.
480GB Solid State Drive SAS Mix Use MLC 12Gbps 2.5in Hot-plug Drive, PX04SV [$748.82]
March 3, 2017 at 10:48 am
Bill Talada - Friday, March 3, 2017 7:19 AM:
Your SuperSpeed link does not mention tempdb on a RamDisk. I still have strong doubts sql server will allow any database to be run on a non-persistent medium such as regular SIMMs. RamDisks have not been allowed in the past for me with sql server. TempDB is recreated on startup so it seems Microsoft would allow this to go on a ramdisk. Until I hear someone is really, actually doing this, I won't believe it. Also, I'd love to hear of performance comparisons between tempdb on SSD vs. RamDisk.
No it does not specifically mention tempdb - but it does mention SQL Server as a whole.
Given that it is shown as a clustered resource, it even goes further than other options on the market.
As for believing it - they have an evaluation option, so just get it and try it yourself.