April 27, 2009 at 6:51 pm
Paul White (4/27/2009)
andrew.hatfield (4/27/2009)
By definition those success stories are based on experience... or marketing deals :laugh:
Sometimes you just can't win 🙂
Paul White (4/27/2009)
andrew.hatfield (4/27/2009)
Looking at your example, and highlighting that it is very light on details, I wouldn't consolidate those instances on a bare-metal server as described, let alone virtually. In your example, you are actually reducing the resources available to each SQL instance - memory, CPU and IO. I wouldn't host more than 4 SQL instances on any instance of Windows - virtual or physical.
It is a hypothetical example, but a reasonable one. If you do the maths again you will see that 8 instances x 4GB = 32GB (the same), and each instance will have a share of eight CPUs instead of two dedicated cores. I have run four to eight Enterprise Edition instances on Itanium and x64 in a four-way clustered configuration, and it works very well indeed. I'm glad you see that virtualization would be a poor choice.
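For reference, the per-instance memory cap in a consolidation like that is just sp_configure; a minimal sketch, using the hypothetical 4GB figure from the example:

-- Minimal sketch: cap one instance at 4GB so eight consolidated
-- instances share 32GB without starving each other. The 4096 MB
-- value is the hypothetical figure from the example, not a recommendation.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 4096;
RECONFIGURE;

Run against each of the eight instances in turn.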
andrew.hatfield (4/27/2009)
If you tried this, then I'm not surprised your experience with virtualisation has been unsuccessful. Not me - I'm not qualified, but I did assist the guys who were: a 12-core machine with 32GB RAM running ESX, with multiple SQL Servers hosted in separate W2K8 VMs. It performed like a slug and made very poor use of what was a decent box.
This is a hard discussion to have, because the actual specification of the hardware makes a big difference. Are they 8 single-core processors, 4 dual-core processors, or 2 quad-core processors? Are they Intel, AMD, or Itanium? The architecture does make a difference, especially with processor scheduling and caching.
Was the sluggish performance down to IO, memory, cache, or CPU? A big problem we've seen in many environments virtualised by first-timers is splitting the OS volumes into separate VMDKs but then keeping them all in one Datastore (LUN). Splitting your data VMDKs across Datastores, as you would split volumes on a physical machine, is the only way to get the same number of IO threads - in VMware, each Datastore has only one IO thread.
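If you want to check the IO side from inside SQL Server first, a rough sketch (SQL Server 2005 and later) is to compare per-file IO stalls:

-- Rough sketch: per-file IO stall times show whether storage latency,
-- rather than CPU or memory, is the bottleneck.
SELECT  DB_NAME(vfs.database_id) AS database_name,
        mf.physical_name,
        vfs.num_of_reads,  vfs.io_stall_read_ms,
        vfs.num_of_writes, vfs.io_stall_write_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id = vfs.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;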
Your HBA configuration is also important - are you queuing per LUN or Array? What is the QueueDepth?
What about your SAN? Especially cache and whether it is dedicated or shared.
Paul White (4/27/2009)
andrew.hatfield (4/27/2009)
Where I am currently working, a government organisation with approximately 10,000 staff across the most decentralised state in Australia, we are going through a SQL and hardware refresh of a number of systems. This includes upgrading SQL as well as virtualising where appropriate. For those systems that are being virtualised, the current approach is to scale out. We will only host one (1) SQL instance on any VM across many VMs. This includes existing environments that are currently clustered using MSCS. As I have said a number of times in this thread, you can't just throw stuff on servers, hope for the best and complain when it doesn't work. You need to review the current and future requirements, design an appropriate solution, test it, review and improve, and then deploy if suitable. I work at a place not too far from you that serves 50 million web pages a day, with the busiest SQL Servers in the country. We do use virtualization - for testing only :laugh:
Cheers,
Paul
(Yes, I saw you're from NZ. We currently have an ad series from the Commonwealth Bank where an American ad agency is pitching the benefits of said bank. "You too could retire to She Bang e Bang" - with the world map panning from Australia to NZ; "This could be your Florida")
Virtualisation is fantastic for testing, especially when you implement proper lifecycle separation. Development environments do not need the same level of performance as production - only functional equivalence. Volume-test environments do require the same performance as production.
As I said, not all environments are suitable for virtualisation - but those that are save money.
--
Andrew Hatfield
April 27, 2009 at 7:08 pm
andrew.hatfield (4/27/2009)
Sometimes you just can't win 🙂
True. It has been a fun discussion. We clearly disagree and I guess that's ok.
andrew.hatfield (4/27/2009)
This is a hard discussion to have, because the actual specification of the hardware makes a big difference. Are they 8 single-core processors, 4 dual-core processors, or 2 quad-core processors? Are they Intel, AMD, or Itanium? The architecture does make a difference, especially with processor scheduling and caching.
Let's say they are quad-core versions of the same CPUs in the original boxes, at a slightly higher clock speed.
andrew.hatfield (4/27/2009)
Was the sluggish performance down to IO, memory, cache, or CPU? A big problem we've seen in many environments virtualised by first-timers is splitting the OS volumes into separate VMDKs but then keeping them all in one Datastore (LUN). Splitting your data VMDKs across Datastores, as you would split volumes on a physical machine, is the only way to get the same number of IO threads - in VMware, each Datastore has only one IO thread.
As far as I recall, the main problem was CPU and memory. Dedicating memory and CPU to an idle VM was wasteful when another was struggling for resources.
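The usual quick check for CPU pressure of that kind is the signal-wait ratio; a minimal sketch against sys.dm_os_wait_stats (SQL Server 2005 and later):

-- Minimal sketch: signal waits measure time spent waiting for a CPU
-- after a resource wait ends; a high percentage suggests CPU pressure.
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms)
            AS DECIMAL(5, 2)) AS signal_wait_pct
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0;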
andrew.hatfield (4/27/2009)
Your HBA configuration is also important - are you queuing per LUN or Array? What is the QueueDepth?
Per LUN and 32.
andrew.hatfield (4/27/2009)
What about your SAN? Especially cache and whether it is dedicated or shared.
Shared EMC DMX-4 with several hundred 15K FC disks.
andrew.hatfield (4/27/2009)
"This could be your Florida"
Ah, then you will have seen the 'Bit That Broke Off' clip too? :c)
***
I suspect that neither of us really has the time to pursue this here. I think we've presented two sides of a continuing debate, which is good.
Cheers,
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
April 27, 2009 at 8:54 pm
Paul White (4/27/2009)
andrew.hatfield (4/27/2009)
Generally, you virtualise a number of systems in a cluster / farm. So a number of systems would consolidate from one physical server each down to 2 - 4 physical servers in total - an increase in physical utilisation, plus the other cost savings that come with it (power, cooling, data centre footprint, cabling, network and storage ports). You also lose hardware and physical location redundancy. The original farm or cluster servers would have had to be under-utilised to stand consolidation onto fewer boxes. Presumably those boxes would need to be more powerful (= more expensive).
I don't see how. If you are running an Active/Passive cluster, then you have 50% of your resources free by not using the second node. In an Active/Active cluster, you shouldn't be allocating more than 50% of the resources on any one node; otherwise, when a failure occurs, you won't be able to run the failed-over instances. With a three-node cluster the rule becomes 67% / 33%.
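That headroom rule generalises: in an N-node cluster that must survive the loss of any one node, each node's steady-state utilisation should stay below

u_max = (N - 1) / N

which gives 50% for two nodes, roughly 67% for three, and 75% for four.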
Paul White (4/27/2009)
Do Google run a grid of commodity servers or several VM'd boxes I wonder? 😉
I think comparing anyone to Google is unrealistic. They also run their own in-house-developed grid cluster software.
Paul White (4/27/2009)
andrew.hatfield (4/27/2009)
You can take your installation and put it on any of the virtual servers without having to worry about drivers and chipsets. Cool for testing. Any decently set-up production environment uses cloned machines anyway.
Yes, but for every hardware platform you deploy, you need to update your image. Even in a virtualised environment you still deploy from an image - it's just that you only need the one. Regardless of the hardware providing the virtualisation platform, the guest OS always sees the same virtual hardware. So you can move between Intel and AMD without issue. You can move between IBM, HP and Dell without issue. You can upgrade between platform generations without issue.
Even for your virtualisation platform you create an image - in the case of VMware, it's just a simple text script.
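For illustration - and assuming the "text script" meant here is the VM's .vmx definition file - a hypothetical sketch, with all names and sizes made up:

# Hypothetical .vmx sketch - plain key/value text defining a guest VM.
displayName = "SQL01"
guestOS = "winnetenterprise"
memsize = "4096"
numvcpus = "2"
scsi0.present = "TRUE"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "SQL01-os.vmdk"
scsi0:1.present = "TRUE"
scsi0:1.fileName = "SQL01-data.vmdk"
ethernet0.present = "TRUE"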
Paul White (4/27/2009)
andrew.hatfield (4/27/2009)
When you refresh your virtualisation platform, you don't need to worry about reinstalling your production systems. With vMotion and Storage vMotion, you don't have outages when you need to perform scheduled maintenance of physical hardware. We find that dynamically load-balanced farms and physically separated data centers work better.
DRS - VMware's Distributed Resource Scheduler - will do this for you in a virtual world. You can set resource pools, limits and reservations, and it will load-balance for you with no interruption to service.
I'm not sure how a physical deployment makes physically separated data centres any easier. In a virtual world, you can move VMs around your network without interruption - obviously, high-speed links are required, as is shared storage.
You can still run NLBs for application load balancing.
Paul White (4/27/2009)
andrew.hatfield (4/27/2009)
With Fault Tolerance (coming in vSphere 4), you get service continuity if a VM or host goes down, thanks to constant memory replication across the virtualisation farm. Sounds like a virtualization-specific issue. PolyServe Matrix Server is part of our real high-availability solution.
I am not familiar with this product, so I can't comment on how it compares.
Paul White (4/27/2009)
andrew.hatfield (4/27/2009)
However, in saying that, the overall TCO reduces significantly when you roll out virtualisation across your enterprise. That just sounds like advertising, to be honest.
Paul
It's more like maths. (Current Cost) - (New Cost) = (Reduced Cost)
You commented in a later post that you remember the performance issues being CPU-related. Without seeing what was going on, it's difficult to comment authoritatively. Was the SQL profile OLTP, OLAP, or ETL? Could you have benefited from scaling out the SQL workload?
As you've also commented, I don't think we'll convince each other. As always, choose the best tool for the job - it's not one-size-fits-all. Some database workloads are suitable candidates for virtualisation; some aren't. Just like any other system.
Thanks for the discussion.
--
Andrew Hatfield
April 27, 2009 at 9:21 pm
I can't resist it.
You lose physical/hardware redundancy because water leaking onto a box running four VMs downs four servers.
In Active/Active you can run at close to full tilt, so long as your failover script adjusts resources appropriately. It seems reasonable to have less capacity in the event of something serious enough to down a data centre.
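To make that concrete, a hypothetical sketch of the sort of adjustment such a script might make when two instances end up sharing a node (the MB figures are illustrative only, not from any real setup):

-- Hypothetical failover adjustment: shrink the memory cap while two
-- instances share one node. 8192 MB is an illustrative figure.
EXEC sp_configure 'max server memory (MB)', 8192;
RECONFIGURE;
-- After failback, restore the full allocation, e.g.:
-- EXEC sp_configure 'max server memory (MB)', 16384;
-- RECONFIGURE;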
The comparison to Google was a little facetious, I admit, but the point about ultimate performance is valid.
The hardware platforms are identical. The only images required are for 32-bit and 64-bit. The only time new ones are needed is when we move OS, like to Win2K8.
Things like DRS sound fine - but they are just replicating what we already have, know, and love. There does not seem to be any benefit - just a new learning curve and probably extra staff.
PolyServe Matrix Server is a shared-everything clustering solution.
I agree that Current Cost - New Cost = Reduced Cost. I just don't agree that Current Cost > New Cost, once you take account of everything involved. Maybe at some sites, with some requirements - who knows.
The VMs I referred to were dedicated to SQL Server, running an OLTP load, in a farm. The VMs were direct replacements for existing physical servers; four physical servers remained for comparison with the VMs. The VMs were embarrassingly slow. I don't think that is entirely unexpected - adding VM overhead on the same hardware was never going to be quicker, after all.
Hey and you're welcome.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
June 2, 2009 at 2:52 pm
I'm actually quite interested to know more about virtualizing SQL Server for OLAP use. I've been looking through quite a number of white papers, and they mostly cover OLTP (because that's what most people use SQL Server for). I'm trying to figure out how well it works when you load a 1.2TB fact table into a set of pass-through VHDs.
Someone above mentioned having to plan around peak times for processing cubes, and I'd certainly like to hear more about the setup for that environment.
For the sake of commenting on one of the above statements: I think it's a great benefit for DBAs to know as much about infrastructure as possible. Logical and physical design is very important to how an application performs. DBAs also don't get the opportunity to 'fix' a third-party product's schema, so getting the most out of the back end and tuning indexes are some of the only ways to make a badly designed application run well. In a perfect world you could get the vendor to fix issues, or the users to change to a different system, but in reality you're more likely to just have to deal with the issues.