April 24, 2012 at 8:07 pm
Hi Santhosh,
sounds like an ugly problem.
Private Message me and I'll see if I can help you out.
GAJ
Gregory A Jackson MBA, CSM
April 25, 2012 at 11:18 am
Thank you for the research and sharing the results. Good info.
April 26, 2012 at 9:24 am
I'm really surprised no one mentioned [url=http://www.openfiler.com/]OpenFiler[/url]. Some of us don't have the luxury of shelling out hundreds of thousands or millions of dollars for a solution, especially if that solution is going to be used in a dev/testing environment.
We have a dev setup with SQL Server on bare metal and in VMs. Both instances are using iSCSI targets as local disks, and the performance over regular 1 Gbps Ethernet is great for our purposes.
OpenFiler also offers paid support if you need a little more assurance and features (HA, failover) in the enterprise space. Great tool.
April 30, 2012 at 6:00 pm
kfarlee (4/24/2012)
While much of this article is accurate, I need to correct a few misconceptions. I am a PM working on the storage engine for SQL Server, specifically on access to storage devices. Over the past couple of years, I've been working with the Windows File Server team to see if there was anything we could do about the poor performance of SQL running over SMB.
As it turns out, there was. It seems that SQL had never paid attention to NAS because nobody used it, because the performance sucked. Windows didn't pay attention to workloads like SQL because, again, nobody used it, so it wasn't on their radar. We got the File Server team using TPC-C as one of their performance benchmarks for the file server, and some interesting things came to light.
The first was a bug which has been in the client side of the protocol stack since Windows 95! It effectively serializes all write traffic. When we started out, SMB was at ~20% of the performance of direct-attached storage (as measured by TPC-C). By the time we were done, the performance over SMB was at 97% of direct-attached.
The great thing about this is that it is a client-side fix, which has been released as a Windows QFE to Windows Server 2008 R2, and I believe 2008 as well. Because it's client-side, you get the benefit regardless of what NAS device is on the other end of the wire.
Going beyond that, the File Server team has incorporated the RDMA protocol, in which network traffic goes directly from the wire into user memory without being copied to kernel space. This dramatically reduces the CPU consumed by network processing, which can otherwise suck up most of your CPU, even in iSCSI environments. Now the CPU used by the TCP stack is comparable to Fibre Channel.
There is also a collection of significant enhancements to the protocol, moving it from version 1.0 to 2.2.
The Windows architects have published all of these changes, and have been sitting with all of the major NAS vendors, to make sure that they can also take advantage of these improvements.
Bottom line is that just within the past year or so, the landscape has significantly shifted, so that now it is possible to put together a NAS-based configuration which performs on par with a SAN.
Kevin Farlee
SQL Server Storage Engine PM
I was about to post this same information after reading this, so I'm happy that you chimed in here, Kevin. I've been benchmarking SQL Server on SMB recently due to the support for SMB as a shared storage solution for failover clustering in SQL Server 2012 (which means no need for an expensive SAN to get good HA), and I've been getting excellent results (a minimal sketch of putting a database on an SMB share is below). 🙂
Keep up the good work on the NAS support for SQL, it is really redefining the storage market space for databases.
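For anyone who hasn't tried it yet, this is roughly all it takes once the share exists and the SQL Server service account has full control on it. The server and share names below are made up for illustration only:
[code]
-- Minimal sketch: database files on an SMB (UNC) path, supported natively in SQL Server 2012+
-- \\fileserver\sqldata is a hypothetical SMB share the SQL Server service account can write to
CREATE DATABASE SmbDemo
ON PRIMARY ( NAME = SmbDemo_data, FILENAME = '\\fileserver\sqldata\SmbDemo.mdf' )
LOG ON     ( NAME = SmbDemo_log,  FILENAME = '\\fileserver\sqldata\SmbDemo_log.ldf' );
[/code]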
Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
My Blog | Twitter | MVP Profile
Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs
July 24, 2015 at 3:22 am
Interesting... we have a NetApp FAS2240-4, if I remember the model number correctly (with SATA disks).
It is connected over Ethernet to VMware hosts.
We tested the performance with iSCSI and it was awful.
Then we switched to NFS and it's more or less OK now.
Does that contradict your summary?
July 24, 2015 at 5:30 am
Scott D. Jacobson (4/26/2012)
I'm really surprised no one mentioned [url=http://www.openfiler.com/]OpenFiler[/url]. Some of us don't have the luxury of shelling out hundreds of thousands or millions of dollars for a solution, especially if that solution is going to be used in a dev/testing environment. We have a dev setup with SQL Server on bare metal and in VMs. Both instances are using iSCSI targets as local disks, and the performance over regular 1 Gbps Ethernet is great for our purposes.
OpenFiler also offers paid support if you need a little more assurance and features (HA, failover) in the enterprise space. Great tool.
That's because OpenFiler is just an implementation of the iSCSI protocol. You still need to architect a proper hardware solution with either NAS or SAN connectivity to build out a full storage system.
July 24, 2015 at 6:25 am
I'm sorry, but this article is dreadful and despite the claims of research, the author still doesn't have a good understanding of a SAN (or NAS) environment.
First, talk to your IT guys, because a NetApp 3160? That's OLD. End of Availability for that model was in 2012, which means it was launched ca. 2008. A lot has changed in that time, as with anything in technology. If you're having performance issues, that's where I'd start. A NetApp 8020 HA pair with an adequate configuration can be had for far less than $80k US. Time to upgrade.
Then there's the reference to NAS as NFS. While I suppose it was meant generically, NFS (Network File System) typically refers to the file-sharing protocol used by Unix/Linux-based servers. While Windows can use and support NFS, no one in their right mind would use that protocol on Windows. What they *would* use is CIFS or SMB.
Which is why, when I came to this article hoping to read about some data related to SMB3 and Continuous Availability, I was let down. A serious exploration of this topic would have included a discussion of the fairly recently available NAS protocol from Microsoft (SMB3) and the pros and cons of using it with the Continuous Availability technology. (NetApp, by the way, supports this in recent versions of their firmware and hardware.)
July 24, 2015 at 6:38 am
While I don't disagree that, at a general level, a basic NAS is slower than a basic SAN, in reality there is a lot more to it than that, and it totally depends on how the storage systems are connected into the data center.
SQL Server is definitely not block-based either; it's file-based (Source: http://www.brentozar.com/archive/2009/12/sql-university-the-basics-of-storage/ and https://msdn.microsoft.com/en-us/library/ms189563.aspx). Any DBA worth their paycheck could tell you that (hello, data is stored in .mdf and .ndf files and transaction logs are stored in .ldf files).
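To see that concretely, here's a minimal sketch that lists the data and log files behind every database on an instance; it uses nothing beyond the standard sys.master_files catalog view:
[code]
-- Data files (ROWS = .mdf/.ndf) vs. transaction log files (LOG = .ldf) for every database
SELECT  DB_NAME(database_id) AS database_name,
        type_desc,            -- ROWS or LOG
        name                  AS logical_name,
        physical_name
FROM    sys.master_files
ORDER BY database_name, type_desc;
[/code]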
Basic NAS and SAN systems use SAS disk technology. Higher-end SAN systems will use FC disks. So at the disk-technology level, your typical NAS and SAN will have the same level of disk I/O performance given the same spindle count. Only when you go to FC disks in a SAN do you pick up a performance jump.
From a network connectivity standpoint, you'll generally see both NAS and SAN devices connected by 1 Gb copper uplinks to the server room switch. So unless you upgrade to 10 Gb copper connections, fiber, or FC connections (which I would argue makes the SAN a DAS, direct-attached storage, because you then need an HBA in the server, similar to the old SCSI card and RAID array devices we used to use), you are going to have a bottleneck in server network connectivity to the data on either storage system.
From a storage protocol perspective, both modern-day NAS and SAN systems use iSCSI to communicate data to the storage system, so again, both types of storage systems are on par there. You can use NFS with a NAS, but it is rare to do so today unless the NAS is being used as a stand-alone file server connecting devices with different operating systems (e.g., connecting Windows, Mac, and Linux systems to common network file storage).
Where SANs pick up the majority of their performance advantage over NAS devices is in the interaction between the metadata controllers and the storage devices. SANs lay a virtual RAID system over the physical disks, and the metadata controllers index where data exists in the logical storage blocks. This works like an index on a database table (that's a layman's description of how a SAN typically works). You also typically have multiple metadata controllers in a SAN system, which provides redundancy and prevents single points of failure. Redundancy of access to SAN data is really what makes SANs important to an organization, beyond performance.
Also, the costs quoted are way off unless you are talking about hundreds of TB or PB of storage. A 32 TB SAN unit could be built for as little as $1,500 (Source: http://thehomeserverblog.com/esxi-storage/building-a-homemade-san-on-the-cheap-32tb-for-1500). Not that you would use that for a production SQL environment, but it's an example of how far SAN costs have come down.
July 24, 2015 at 6:51 am
agtcovert (7/24/2015)
I'm sorry, but this article is dreadful and despite the claims of research, the author still doesn't have a good understanding of a SAN (or NAS) environment. First, talk to your IT guys, because a NetApp 3160? That's OLD. End of Availability for that model was in 2012, which means it was launched ca. 2008. A lot has changed in that time, as with anything in technology. If you're having performance issues, that's where I'd start. A NetApp 8020 HA pair with an adequate configuration can be had for far less than $80k US. Time to upgrade...
I think we got caught up in another regurgitated SQLServerCentral.com posting. Looks like the original date was 2012.
July 24, 2015 at 9:49 am
This article was written years ago and was just republished.
G~
Gregory A Jackson MBA, CSM
July 24, 2015 at 1:46 pm
You can use the SQL Server dynamic management function sys.dm_io_virtual_file_stats to see I/O stall time. Around 7 ms is average, anything above 20 ms is considered poor, and above 50 ms is considered critical. Using NetApp we are getting average I/O stall times of over 2 seconds (2,245 ms :angry:) for specific data files, including tempdb. And this is averaged over millions of I/Os reading average block sizes of 150 KB.
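If anyone wants to check their own numbers, here is a minimal sketch of that kind of query. It assumes nothing beyond the standard sys.dm_io_virtual_file_stats function and the sys.master_files catalog view; the 20 ms / 50 ms thresholds are just the rules of thumb above:
[code]
-- Average I/O stall per read/write for every database file (sketch, not a tuned script)
SELECT  DB_NAME(vfs.database_id)                 AS database_name,
        mf.physical_name,
        vfs.num_of_reads + vfs.num_of_writes     AS total_io,
        vfs.io_stall                             AS total_stall_ms,
        CASE WHEN vfs.num_of_reads + vfs.num_of_writes = 0 THEN 0
             ELSE vfs.io_stall / (vfs.num_of_reads + vfs.num_of_writes)
        END                                      AS avg_stall_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id
ORDER BY avg_stall_ms DESC;
[/code]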
I'm wondering if the reason we are seeing an average > 1 sec stall time for physical disk I/O has to do with block marshaling. We are executing large ETL jobs and BI queries, which pump out a lot of 1 MB sequential reads. NetApp breaks a 1 MB block down into multiple 4 KB blocks. When a 4 KB block is updated it keeps both before and after versions of the block, causing the blocks to be scattered all over kingdom come. When a 1 MB physical read occurs, NetApp has to look up the location of every 4 KB block, then go out and read every block and assemble them back into a 1 MB block before it can respond. In this case the chain is only as strong as its weakest (slowest) link. The NetApp processor doing this may be swamped with these operations (saturation hockey-stick :crazy:), especially if many asynchronous 1 MB PIO requests are sent, which is what SQL Server does (asynchronous = multiple PIOs are sent in rapid succession before waiting for each one to respond). That's my working hypothesis.
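A rough sanity check on that hypothesis (my own back-of-envelope numbers, not anything NetApp has confirmed): 1 MB / 4 KB = 256 blocks per read. If fragmentation turns even a quarter of those into random seeks on SATA at roughly 8 ms each, that's 64 x 8 ms ≈ 0.5 s, and if all 256 are effectively random and serialized, 256 x 8 ms ≈ 2 s, which is right in the range of the stall times quoted above. Treat it as an order-of-magnitude check only; the per-seek latency and the degree of scattering are assumptions.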
In any case, NetApp is the winner if you are checking off features in a comparison matrix, with the exception of performance. It is OK on OLTP performance. But it is absolutely junk on batch & OLAP performance and it should not be considered.
If a NetApp employee wants to pick a fight with me over the facts, they should come down to my company and tune our SAN. If they can fix our performance problems, I'd be happy to state a retraction and the solution. So far we have lived with this performance problem for years.
Here is another possible explanation, but it’s over my head.
We had our meeting with NetApp yesterday and went over the Professional Services findings. Some things they listed are tasks we've been addressing since the slovol issues began: aligning misaligned VMs, adding disks to aggregates (or, in our case, moving VMs to larger aggregates with faster disks). But one thing they confirmed, which was brought to my attention via an off-toasters email discussion hours before (I give that individual much thanks!!), was BURT 393877, "inefficient pre-fetching of metadata blocks delays WAFL Consistency Point."
Data ONTAP's WAFL filesystem periodically commits user-modified data to the back-end storage media (disk or otherwise) to achieve a Consistency Point (CP). Although a Consistency Point typically takes only a few seconds, a constraint has been designed into the software that all operations needed for a single Consistency Point must be completed within 10 minutes. If a CP has not been completed before a 600-second timer expires, a "WAFL hung" panic is declared, and a core dump is produced to permit diagnosis of the excessive CP delay. During the processing for a CP, some disk blocks are newly brought into use, as fresh data is stored in the active filesystem, whereas some blocks may be released from use. (Although a block which is no longer needed in the active filesystem may remain in use in one or more snapshots, until all the snapshots which use it are deleted.) But any changes in block usage must be reflected in the accounting information kept in the volume metadata. To make changes in the block accounting, Data ONTAP must read metadata blocks from disk, bringing them into the storage controller's physical memory. Because the freeing of blocks often occurs in a random ordering, the workload of updating the metadata for block frees can be much higher than for updating the metadata to reflect blocks just being brought into use.
For greatest processing efficiency, Data ONTAP makes an effort to pre-fetch blocks of metadata which are likely to be needed for a given Consistency Point. However, in some releases of Data ONTAP, the pre-fetching of metadata is done in an inefficient way, and therefore the processing for the Consistency Point may run slower than it should. This effect can be most pronounced for certain workloads (especially overwrite workloads) in which many blocks may be freed in unpredictable sequences. And the problem may be compounded if other tasks being performed by Data ONTAP attempt intensive use of the storage controller's memory. The competition for memory may cause metadata blocks to be evicted before the Consistency Point is finished with them, leading to buffer thrashing and a heavy disk-read load.
In aggravated cases, the Consistency Point may be slowed so much that it cannot be completed in 10 minutes, thus triggering a "WAFL hung" event.
The BURT doesn't list any specific workarounds, as, apparently, there are many depending on your environment and what's causing it. For us, they wanted to take each FAS3160 controller down to the boot prompt and make an environment change. They didn't say what this change was because it would have to be undone once a version of Data ONTAP is released that fixes the issue.
On a different topic (Hadoop & shared-nothing DBMSs, not SQL Server):
“Let me say this unequivocally: You absolutely should not use a SAN or NAS with Hadoop.” link
“Using RAID on Hadoop slave machines is not recommended, because Hadoop orchestrates data redundancy across all the slave nodes.”
“Instead of relying on a SAN for massive storage and reliability then moving it to a collection of blades for processing, Hadoop handles large data volumes and reliability in the software tier. Since each machine in a Hadoop cluster stores as well as processes data, those machines need to be configured to satisfy both data storage and processing requirements.”
link, link
July 25, 2015 at 11:06 am
I see this article has been re-posted just now, after 3 years of dramatic technological progress.
I wonder if it still has value. Also, it would be great if one could compare it with modern in-memory computing, or even better, compare MSSQL's capabilities to Oracle's Database In-Memory.
Thanks in advance!
July 28, 2015 at 7:30 am
Thanks for the education.