Disk (SCSI) Controllers - Parallel Disk I/O

  • Hey

    For SQL Server VMs I use the max number of SCSI controllers supported by the relevant hypervisor. But for the first time in a loooong time I am looking at a physical implementation.

    The most logical thing I can think of is to have multiple disk controllers and place each SQL disk on a dedicated controller, but that will require a beefy server with enough PCI slots.

    How are other people handling this?

    Or does Windows see each port on a multiport card as multiple controllers?

    Or am I overthinking it?

    The use case is a large clinical patient record system, so there will be multiple high-use databases (each of which I would also aim to separate out onto its own dedicated disk).
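
    To illustrate the kind of layout I'm aiming for - purely a sketch, with the database name, drive letters and paths as placeholders - each heavy database would get its own data drive, with its log on a separate drive:

        -- Sketch only: database name, drive letters and paths are placeholders
        CREATE DATABASE PatientRecords
        ON PRIMARY
            (NAME = PatientRecords_data, FILENAME = 'E:\SQLData\PatientRecords.mdf')
        LOG ON
            (NAME = PatientRecords_log,  FILENAME = 'F:\SQLLogs\PatientRecords.ldf');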

    Thanks

  • I think it really depends. Do you know that your disk is going to be the bottleneck? Your approach would make sure the disk is not the bottleneck, but if the disk isn't currently a bottleneck, then you may be spending money on something that would be hardly utilized. It also depends on what disks you buy. If you buy a whole bunch of 5400 RPM HDDs and put them each on their own PCIe lane, you'll get the fastest speed possible out of those disks, but they'll still be slow. If you put the money into a bunch of M.2 drives instead, you are probably going to hit bottlenecks with your CPU or network before the disks. I say "probably" here as I don't know your systems. You MAY have a 10 Gb NIC going out to all your clients and be fine on the NIC front, or you may need more than that. You MAY not do any calculations on the data and just be doing a lot of inserts and selects (without joins, calculations, or functions), in which case the CPU won't be used much. I don't know your system.
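
    If you want a quick way to sanity-check that, a wait-stats query like the sketch below (assuming you can query the instance; the figures are cumulative since the last service restart) will show whether I/O waits such as PAGEIOLATCH_* and WRITELOG are actually near the top of the list:

        -- Sketch: top I/O-related waits since the last service restart
        SELECT TOP (10)
               wait_type,
               waiting_tasks_count,
               wait_time_ms,
               wait_time_ms * 1.0 / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
        FROM sys.dm_os_wait_stats
        WHERE wait_type LIKE 'PAGEIOLATCH%'
           OR wait_type IN ('WRITELOG', 'IO_COMPLETION')
        ORDER BY wait_time_ms DESC;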

    With our systems, we do get some bottlenecks from the disk, but for the most part the bottleneck is CPU, memory, or NIC - our servers are slow and our disks are fast. Our disks are all SAN disks, which works nicely because the disks can fail over along with the host, so we have minimal downtime in the event of a failover. CPU, memory, and NIC, though, are "good enough" for our business. We don't NEED the fastest possible hardware. We get by with what we have.

    Another nice thing about SAN is that your server doesn't need to care about the disk. Is it 1 disk? Is it multiple disks in RAID? Is it slow disk? Is it fast disk? The server doesn't care. The SAN does, but the server doesn't. AND if we discover that slow disk is not acceptable, we can migrate to fast disk and the server doesn't notice. It's just moved to faster disk while in use and it just works. We can also schedule the move between fast and slow disk if we know there are "slow times" for the disk. Mind you, if the data volume is too large, then migrating from fast to slow or slow to fast may not be possible in the slow time window.

    Another advantage to SAN is that if the server dies due to hardware failure, the risk of the disks being damaged along with it is much lower. In my old desktop, the motherboard died on me (not certain exactly what went on it, but something did), and when it died it caused a power issue that fried everything hooked up to a power cable as well. The PSU itself was fine, but the failure killed the motherboard, optical drive, floppy drive, and disk drives (SSDs and HDDs). Thankfully I had everything important backed up, but it was still a pain in the butt. This was a personal machine, so SAN would have been overkill, but for a business I would definitely invest in a SAN for on-premise storage.

    Also, if the data is important, you will want to invest in a good backup solution (on site and off site). Moving from cloud to on-prem has a lot of up-front cost and a lot of work and planning to do. I would make sure all of your ducks are in a row before you start the project and be prepared for downtime: migrating the servers will require downtime, moving things from cloud to on-prem MAY impact systems you are not thinking of or may require re-configuration of multiple machines/applications, and it is possible that some system (application, desktop, server) gets missed in the re-configuration and you end up with unexpected downtime at go live.
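
    On the backup point, even the native tools go a long way as a starting position - something like the sketch below (database name and path are placeholders), with a copy shipped off site and regular restore tests on top of it:

        -- Sketch only: database name and backup path are placeholders
        BACKUP DATABASE YourDatabase
            TO DISK = N'G:\Backups\YourDatabase_full.bak'
            WITH CHECKSUM, COMPRESSION, STATS = 10;

        -- Confirm the backup file is readable before relying on it
        RESTORE VERIFYONLY
            FROM DISK = N'G:\Backups\YourDatabase_full.bak'
            WITH CHECKSUM;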

    My advice though about what you "need" - Nobody on this forum will be able to tell you. What you will need to do is investigate the metrics on your current setup, find the bottlenecks and make sure that your new systems meet or exceed the setup you had previously. And a fun thing is that depending on your SQL workload, you will want to investigate both single core performance AND multi-core performance. For MOST SQL workloads, the single core performance is more important than the multi-core performance. This is NOT the case for ALL workloads, but it is for most in my experience.
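
    For the disk side of that investigation, per-file latency from the DMVs is a good starting metric - a sketch (filters and thresholds are up to you; the figures are cumulative since the last restart):

        -- Sketch: average read/write latency per database file
        SELECT DB_NAME(vfs.database_id) AS database_name,
               mf.physical_name,
               vfs.num_of_reads,
               vfs.io_stall_read_ms  * 1.0 / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
               vfs.num_of_writes,
               vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
        FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
        JOIN sys.master_files AS mf
            ON mf.database_id = vfs.database_id
           AND mf.file_id     = vfs.file_id
        ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;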

    The OTHER fun thing is that your performance issues MAY not even be hardware related - you could have inefficient code (loops or cursors, for example). It is easy to throw hardware at performance problems, but if the issue is with the queries, hardware can only do so much. If you remove loops where possible (which is almost everywhere, though there are edge cases where they are needed), you may find that performance improves drastically and you don't need new hardware yet.
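
    To show the kind of rewrite I mean (a contrived example - the table and column names are made up):

        -- Row-by-row cursor version (slow: one UPDATE per row)
        DECLARE @OrderID int;
        DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
            SELECT OrderID FROM dbo.Orders WHERE Status = 'New';
        OPEN cur;
        FETCH NEXT FROM cur INTO @OrderID;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            UPDATE dbo.Orders
            SET Status = 'Processed', ProcessedDate = SYSDATETIME()
            WHERE OrderID = @OrderID;
            FETCH NEXT FROM cur INTO @OrderID;
        END;
        CLOSE cur;
        DEALLOCATE cur;

        -- Equivalent set-based version: one statement, usually far faster
        UPDATE dbo.Orders
        SET Status = 'Processed', ProcessedDate = SYSDATETIME()
        WHERE Status = 'New';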

    Another thing to note - a server with enough slots isn't the only constraint. EVEN if you have 10 PCIe slots, the CPU only provides a limited number of PCIe lanes. For example, Intel's 12th-generation CPUs expose 16 PCIe 5.0 lanes from the CPU, which can run as one x16 slot (~63 GB/s) or as two x8 slots (~32 GB/s each) (source - https://www.techreviewer.com/tech-answers/how-fast-is-pcie-5/); PCIe 5.0 works out to roughly 4 GB/s per lane, which is where those numbers come from. So the more PCIe devices you connect, the fewer lanes (and the less bandwidth) each one gets. You need to balance that against how fast you need your disks to be. Plus, the disks themselves have limits on how fast they can go, so even if your SCSI controller is running as fast as possible, if your disks can't keep up, the controller is going to sit idle a lot of the time.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
