December 31, 2015 at 8:58 am
We have a failover cluster that was set up prior to my hire, which I don't think we would be able to rebuild from scratch (a consulting company originally set it up). Both nodes are running in a VM environment with nightly snapshots, and the database also has its normal database-level backups running.
I'm relatively new to VMs, but my organization has seen great reliability to date on all of our database instances running as 1 instance per VM. I'm a fan of simplicity, and here's my question: There's an organizational push to move all databases into this cluster as they're upgraded to MSSQL 2014 (everyone is always a fan of uptime, right?), but the applications don't necessarily need to be up at all times. My gut is telling me to keep everything in a 1-instance-per-VM setup if some downtime is acceptable. I'm not allergic to new tech, but I'm not sure yet how much tougher it is to manage a cluster when problems arise. Does anyone have some advice on the difficulties surrounding their clustering environments? I'm concerned about pushing more databases into a cluster that I might not be able to support.
--=Chuck
January 4, 2016 at 5:34 am
chuck.forbes (12/31/2015)
We have a failover cluster that was set up prior to my hire, which I don't think we would be able to rebuild from scratch (a consulting company originally set it up).
Why do you think you'd need to rebuild it from scratch?
chuck.forbes (12/31/2015)
Both nodes are running in a VM environment with nightly snapshots, and the database also has its normal database-level backups running. I'm relatively new to VMs, but my organization has seen great reliability to date on all of our database instances running as 1 instance per VM.
1 instance per VM is a little bit of a waste of resources and potentially licences too (depending on how your host servers are licensed).
chuck.forbes (12/31/2015)
I'm a fan of simplicity, and here's my question: There's an organizational push to move all databases into this cluster as they're upgraded to MSSQL 2014 (everyone is always a fan of uptime, right?), but the applications don't necessarily need to be up at all times. My gut is telling me to keep everything in a 1-instance-per-VM setup if some downtime is acceptable. I'm not allergic to new tech, but I'm not sure yet how much tougher it is to manage a cluster when problems arise. Does anyone have some advice on the difficulties surrounding their clustering environments? I'm concerned about pushing more databases into a cluster that I might not be able to support.
--=Chuck
Why cluster inside the virtual layer? Yes, it can be done, but guest clustering does have its nuances.
Would it not be better to have the SQL Servers standalone and let the hypervisor take care of the VM redundancy (if that's possible)?
What was the driver behind clustering the guest machines?
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
January 4, 2016 at 9:15 am
Why would we need to rebuild it from scratch? The main reason I'd expect to do it is when it's time to upgrade from MSSQL 2014. If it's impractical to upgrade in place, then I would want a new clustered instance available to move databases into.
We're clustering inside the virtual layer because we're a pretty small shop, and ease of administration (at the server level) is a major driver. That, plus the consultants who built the failover cluster had a ton of experience with similar setups, and were confident that it wasn't going to cause any major issues.
--=cf
January 6, 2016 at 11:44 pm
I'll give you my take on it.
We use mostly 1 instance per VM as well. You are licensing the ESX host cores, not the virtual machine, so there is no point in stacking multiple instances on one VM.
The only clustered SQL instances are physical.
We also use AlwaysOn Availability Groups on some VMs.
Problems with cluster - SAN had a firmware upgrade, and the IO network drivers had some obscure incompatibilities with the new firmware. On failover, the IO path could not switch to the other clustered instance, we couldn't fail back, and the cluster went down.
The ESX hosts running our VMs use a different driver stack and continued to work.
Problems with ESX VMs - SQL VM, Windows OS somehow became corrupt, blue screen on boot, not recoverable in safe mode, problem with Windows system state backups, OS had to be rebuilt and SQL installed and updated before dbs could be reattached.
Problems with AO AGs - failover due to slow IO one night; the app login password had been changed at some point but only on one server, so the app could not log in after failover. You also need to be careful with backup jobs and maintenance jobs, actually all jobs. Also required many patches and plenty of troubleshooting to find the cause of random failovers.
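The login mismatch above is really a drift problem: something was changed on one replica and not the other. Here is a minimal sketch of a drift check in Python, assuming you have already exported each replica's SQL logins as name -> (SID, password hash) pairs (for example from sys.sql_logins, which exposes name, sid and password_hash). The function name, snapshot format and sample data are all made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: login name -> (sid, password_hash) as hex strings.
LoginSnapshot = Dict[str, Tuple[str, str]]


def find_login_drift(primary: LoginSnapshot, secondary: LoginSnapshot) -> List[str]:
    """Return human-readable differences between two replicas' logins."""
    problems = []
    for name in sorted(set(primary) | set(secondary)):
        if name not in secondary:
            problems.append(f"{name}: missing on secondary")
        elif name not in primary:
            problems.append(f"{name}: missing on primary")
        else:
            p_sid, p_hash = primary[name]
            s_sid, s_hash = secondary[name]
            if p_sid != s_sid:
                problems.append(f"{name}: SID differs")
            if p_hash != s_hash:
                problems.append(f"{name}: password hash differs")
    return problems


# Made-up sample data, not taken from any real server.
primary = {"app_login": ("0x01", "0xAAAA"), "report_login": ("0x02", "0xBBBB")}
secondary = {"app_login": ("0x01", "0xCCCC")}

for line in find_login_drift(primary, secondary):
    print(line)
```

Note that the hashes are salted, so a differing hash doesn't prove the passwords differ. Matching SIDs and hashes do show the login was copied across intact, which is the state you want before a failover.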
While I'm here:
Problems with replication - publisher is an upgraded 2000 db that still uses the TEXT datatype. The application inserting data had a buffer overflow issue and inserted non-printable Unicode characters into a text column, which caused some weird bug in replication. Another issue is people connecting to the wrong server and running procs, causing replication to go out of sync.
Problems with mirroring - none! Failover for patching, .NET app handles it fine. Guess I'll never understand what the big deal is about it being a pain to admin.
January 7, 2016 at 10:30 am
I guess my main concern with the cluster is that if it does have an issue, it might take longer for me to resolve the problem, and in cases where both nodes are affected, it destroys the high availability aspect that the solution touts. I'm just not sure how stable/unstable clustering really is, and I don't think I'll have the time to deep-dive into it unless there's an issue.
And I don't see us getting rid of instances running in non-clustered environments, so I'll always be gaining experience there. Plus, those installations are just simpler as a whole.
I suppose that, worst-case scenario, if the cluster bombs in a way that I can't recover from, we could just spin up a single instance and restore into it. Hmm...
--=Chuck
January 8, 2016 at 7:07 am
chuck.forbes (1/7/2016)
I guess my main concern with the cluster is that if it does have an issue, it might take longer for me to resolve the problem, and in cases where both nodes are affected, it destroys the high availability aspect that the solution touts. I'm just not sure how stable/unstable clustering really is, and I don't think I'll have the time to deep-dive into it unless there's an issue.
And I don't see us getting rid of instances running in non-clustered environments, so I'll always be gaining experience there. Plus, those installations are just simpler as a whole.
I suppose that, worst-case scenario, if the cluster bombs in a way that I can't recover from, we could just spin up a single instance and restore into it. Hmm...
--=Chuck
Which hypervisor are you using?
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉