I Feel the Need for Speed - Introduction
My server isn't fast enough. Actually it's neither fast enough, nor big enough. Well, let's be totally open here. My server is a smoking, high end, rather expensive piece of hardware that I admire.
My users, however, have a different opinion.
They think it needs a little work: a little faster, a little snappier. Not much, just a little. They also want to dump more stuff onto the server, and that has me worried. Who wouldn't be, if you were tight on space, causing the occasional nightly backup to fail, and your phone was ringing regularly with performance questions? And you knew the data and load were definitely growing.
Unless you're a BOFH, you'd be worried too. But what do you do?
Most people I know buy a bigger server. Go from a single CPU to dual CPUs, or dual CPUs to a 4 way box. Go from 1GB to 2GB of RAM, 100GB to 200GB of disk. The easy way out, right? Easy to point to a large PO with impressive specs and tell management that you sized this server to exceed the performance of the last one.
And most likely the performance will increase. Moore's law and hardware advances can usually mask performance problems and by expanding the size of all dimensions (CPU, memory, disk), anyone can usually gain performance and look like a genius.
At a price. The price of the new hardware.
So what's the big deal? Well, a base SQL Server machine (hardware only), single CPU, 1GB RAM, 3 drives in RAID 5, is around $3700 from DELL. I know you can go cheaper or more expensive, but this is a basic configuration I've seen used over and over by many companies for a SQL Server database machine. Usually a simple Accounting, CRM, etc. database backend for some software. Now suppose we want to pump this up to the next level. Let's move like this:
Server | CPU | Memory | Disk | Price |
Original | single | 1GB | 3 HDD, RAID 5 | ~US$2800 |
Upgrade | dual | 2GB | 5 HDD, RAID 5, double capacity | ~US$4200 |
Note: These are estimates only based on retail prices from Dell.com
Not a huge increase. In fact, if we upgraded just one of these components to the second configuration, we'd probably see a performance improvement, let alone all three (CPU, memory, and disk).
If we had issues on the first box, that is. If not, then we might not see any changes at all. Of course, if we weren't having issues, we probably wouldn't be upgrading, unless your purse strings are looser than mine. :)
Now suppose it's a year later and we need to upgrade again. Move to a 4 way. What does that do?
Server | CPU | Memory | Disk | Price |
Original | single | 1GB | 3 HDD, RAID 5 | US$2800 |
Upgrade | dual | 2GB | 5 HDD, RAID 5, double capacity | US$4200 |
Dreaming | quad | 4GB | 5 HDD, RAID 5, double capacity of original | US$16,500 |
Wheweeee, that's a big number. At a few of the smaller companies I worked for, I wouldn't have had the stones to go ask for that one. Unless I really needed it and I knew I needed it AND I was 100% sure that performance would scream. Because otherwise I'd have lost a lot of credibility, and possibly my job. A $16,000 Wintel server might be a bet on your job at a lot of places. And this is a basic config. I could easily push this to $25,000 with dual network cards, service, UPS, etc.
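To put that jump in perspective, here is a minimal sketch of the arithmetic, using only the retail estimates from the tables above. The function name `step_increases` is just something made up for illustration.

```python
# Hardware-only retail estimates from the tables above (US dollars).
configs = [
    ("Original", 2800),
    ("Upgrade", 4200),
    ("Dreaming", 16500),
]

def step_increases(configs):
    """Return (from_name, to_name, extra_dollars, percent_increase) for each step."""
    steps = []
    for (prev_name, prev_price), (name, price) in zip(configs, configs[1:]):
        extra = price - prev_price
        pct = 100.0 * extra / prev_price
        steps.append((prev_name, name, extra, round(pct, 1)))
    return steps

for prev_name, name, extra, pct in step_increases(configs):
    print(f"{prev_name} -> {name}: +${extra:,} ({pct}% more)")
```

The first step costs about half again as much as the original box. The second step costs nearly three times more than the dual, which is why it is a much harder conversation to have.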
However, now I work at a big company. Big budget. 3 yr life cycle for servers. Large, in-house data center. What's the big deal? Why not just upgrade my server and buy a bigger one? Why not just jump when I can and not worry about it? In many cases I do, because of the 3 year life cycle and the fact that loads usually increase. Over the last couple of years, though, most of my SQL Servers weren't stressed that badly, and we could have run another year on the hardware or even ordered a slightly smaller server with a faster CPU.
However I have one server that is a small problem. It's already a quad 500MHz, 4GB of RAM and loaded to the max with disk. It houses a 600GB database that needs to perform. It gets pretty heavily used around the world, so performance is always an issue. Fortunately we have a crack team working on tweaking the application for users and they do a great job, but there comes a point when we have no choice but to upgrade.
So the other day, some guy from the group that pounds this server comes by my desk and wants to know how we upgrade it. He's putting together a budget and wants some ideas about what we can do to improve performance.
It's at this point that I begin speaking slowly, and you should too in this situation, knowing that performance on an RDBMS is a nebulous thing, especially as you get more users and less control over the actual application. I start to mention that we need to look over the metrics on the server, identify the potential bottlenecks, and then make a recommendation. He's OK with that, as long as I get him a recommendation by the end of the week.
And it's Wednesday.
Fortunately I do have metrics available and start to look over them. The 3 month average is wonderful and looks like this:
CPU % Utilization (total): 13%
Memory Cache Hit Ratio: 97%
Disk Q: 2
Disk Time: 24%
All very good numbers. So why do they complain about performance? You have to dig deeper than this. One of the things that we look at is the usage at peak times, because an average can hide it. While people all around the world (and hence in many time zones) use this server, the load is much, much heavier in the US. As such, if I screen the averages for US workdays only, I get a slightly different picture.
CPU % Utilization (total): 31%
Memory Cache Hit Ratio: 92%
Disk Q: 2.7
Disk Time: 28%
Still not bad. However, if there's one thing I know about this system, and it's the type of thing you need to know about your own systems, it's that there are peak times of the month when this server is heavily used. Checking some of those times with spot values shows that the disk usage can run over 50% and the CPU will go above 80%.
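A small sketch of why the three views differ. The samples below are hypothetical numbers invented for illustration, not the readings from my server, and `summarize` is just a name I picked.

```python
from statistics import mean

# Hypothetical CPU samples (percent). NOT the real server's data.
# Each tuple is (taken_during_us_workday, cpu_percent).
samples = [
    (False, 5), (False, 8), (False, 6), (False, 9),
    (True, 25), (True, 30), (True, 85), (True, 28),
]

def summarize(samples):
    """Compare the overall average, the US workday average, and the peak."""
    all_values = [cpu for _, cpu in samples]
    us_values = [cpu for is_us, cpu in samples if is_us]
    return {
        "overall_avg": mean(all_values),
        "us_workday_avg": mean(us_values),
        "peak": max(all_values),
    }

for name, value in summarize(samples).items():
    print(f"{name}: {value}%")
```

With these made-up samples the overall average is 24.5%, the US workday average is 42%, and the peak is 85%. Same data, three very different stories, which is why a long-term average alone should not drive the sizing decision.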
One of the developers came up with this estimate for a new server, which I got to see when I presented mine (further below) at a meeting.
Server | CPU | Memory | Disk | Price |
Developer | 8 way, 2GHz | 16GB | SAN Storage, 2TB (we currently have about 800GB in local storage) | US$who knows |
I glanced over this and was initially excited, since it would be cool to work on something like this. However, it's this kind of waste, in my opinion, that affects our profitability and potentially my bonus. Well, this one purchase isn't likely to, but compounded across many purchases it does, and I still have a cheap, little-guy approach, so it bugs me. I pointed out a few things on the quote that didn't make sense.
First, I'm not sure I agree with the 16 way, especially if a 4 way at a much lower speed works OK most of the time. The numbers do show some peaks, but I'd bet that a 4 way, 2GHz would perform much better. In fact, I'd propose we go with that, but buy an 8 way box so we can expand if we need to. Metrics show that most of the time, CPUs are not the issue, so I'm not sure I'd buy to surpass all the peaks.
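Here is a rough back-of-the-envelope sketch of that bet. It assumes the work scales linearly with total clock cycles (CPU count times MHz), which is a crude assumption that ignores architecture, cache, memory, and disk effects, so treat the output as a ballpark only. The function name `scaled_utilization` is hypothetical.

```python
def scaled_utilization(util_pct, old_cpus, old_mhz, new_cpus, new_mhz):
    """Estimate CPU utilization on new hardware.

    ASSUMPTION: load scales linearly with total clock capacity
    (cpus * MHz). Real servers rarely scale this cleanly.
    """
    old_capacity = old_cpus * old_mhz
    new_capacity = new_cpus * new_mhz
    return util_pct * old_capacity / new_capacity

# Current box: quad 500MHz. Proposed: 4 x 2GHz.
# 31% is the US workday average; 80% is the floor of the month-end spot peaks.
for label, util in [("US workday average", 31), ("month-end peak", 80)]:
    estimate = scaled_utilization(util, 4, 500, 4, 2000)
    print(f"{label}: {util}% now, roughly {estimate}% on 4 x 2GHz")
```

Under that assumption, an 80% peak on the current box lands at about 20% on four 2GHz CPUs, which leaves plenty of headroom before the empty sockets are needed.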
Second, the memory seems like overkill. The memory in this recommendation is dwarfed by the current DB size, but so is the memory in our current architecture, and we still have a high cache hit ratio. I think that this is again a case of buying big just because we can and have the budget. Getting a system above 4GB is still going to be an interesting learning experience. I think more memory could help, but I wouldn't bet on it, especially at the price it costs. I'd again propose a smaller number, but allow room to grow.
Lastly, there is the disk space request, which I have no basis for arguing about. I'm loath to run out of space, and since I'm not sure of the changes that are planned and their impact, I have to accept this as a good number. I'm not thrilled about having all my data, temp, and logs on a shared SAN, but everyone from the vendors to the hardware guys says it will work, so I'll defer to them (until I prove it does or doesn't).
Looking over this, I proposed two separate items, a low and a high configuration. I want the business to have some skin in this game and make some decisions themselves. I lean toward the high end, but my low end is something that I think should work.
Server | CPU | Memory | Disk | Price |
Low End | 8 way, 4 x 2GHz, 4 empty | 6GB, room to grow to 8GB | SAN Storage, 2TB | US$who cares |
High End | 16 way, 4 x 2GHz, 12 empty | 8GB, room to grow to 16GB | SAN Storage, 2TB | US$something I couldn't buy |
Conclusions
Unfortunately I'll never know if my design would have scaled and worked. The project got cancelled for a number of reasons and we never upgraded the server. However, I wanted to show you the thought process I went through when looking at upgrades. Hopefully it will help you to think about a few things when you are faced with a similar situation.
Steve Jones
©dkRanch.net November 2003