November 22, 2011 at 10:40 pm
GilaMonster (11/22/2011)
Chris Metzger (11/22/2011)
Am I just being blase about it or are some ppl just overly paranoid?
Both.
Am I correct in saying you've never dealt with a disaster scenario?
Sure - I recently had a 48 hour outage due to a SAN failure because the platform wasn't designed or configured correctly (I didn't build it; I just took it over right before the outage). But anytime I've been the one who designed and built something I've never had issues that I couldn't fix quickly (like within a couple of hours max). While I understand being a protectionist and making sure to cover all bases, I guess I'm just surprised at the number of ppl who expect things to just blow up and never come back (i.e., we all plan for DR but who really expects it to actually be needed? If the datacenter blows up or gets hit by a nuclear missile then I have bigger issues than just failing over to my DR site). I wonder if it makes a diff because I'm responsible for the entire platform's design, construction, and administration and not just the SQL Servers (so there are other concerns and issues I'm dealing with, and I believe the overall platform's design will easily suffer multiple failures without entirely going offline).
But like someone else said this thread was started to discuss shrinking so I'll leave it at that.
November 22, 2011 at 11:47 pm
Chris Metzger (11/22/2011)
Jim Murphy (11/22/2011)
Ah, I see.
If disk space permits, it is always a great idea to retain more. Moving those backups off to tape/NAS throughout the day is also a good thing (including protecting against controller failures, etc.). With backups moved to separate storage, recent backups are never lost, but recovery is delayed by an hour or three if older data needs to be restored.
Good point Jeff. Thanks for the clarification Gail.
Jim (and anyone else): there will NEVER be a controller failure. The environment is a true hardware cloud. The platform is: VMware vCloud with a Clariion SAN (all SAS). What happens is if a host fails (they are all blade chassis clustered together) then vCloud sees that and moves the guests from the host to another host automatically - the most you'll see is a burp of about 5 minutes while the move takes place and the VMs come back online (as if they were just shutoff or something). The SAN is one of those giant HA SANs so if a disk array fails another one will take over without issue and with almost no interruption (it has redundancy built into it with multiple disk sets/LUNs/controllers/etc).
So the HW failing is no longer going to be an issue on the new platform currently under construction. The existing platform does have all of those issues because there are multiple single points of failure - hence why I'm going to get away from it.
@Jeffrey: Yeah my maintenance plans do that for me instead of using something external/3rd party (for now - I'm looking at new 3rd party tools to switch to instead of using the native ones). And yes I'll be keeping more when I move this thing but for now I have what I have and don't have any other choice. It helps that we have a week of warm backups I can go to if necessary (done at the VM level) but it's, again, rare to need to do so.
It sounds like quite an investment. Would you like to share the details with us?
November 23, 2011 at 12:23 am
@dev: Sure! It definitely is a multi-million dollar investment. I have the real details around here somewhere but the short of it is a true EMC Cloud environment with multiple points of redundancy for all datacenter services. The datacenter is located in a geographically stable region of the US (East Coast) with a DR site on the West Coast. The platform is VMware vCloud which allows clustering of HW resources for a logical private network. The network I've built has logical separation at the "physical" network layer - so each vApp (or vLAN) operates independently of the primary ORG (for both logical and security reasons). The network will be as automated as possible for network services including intrusion monitoring (using SEP12 and SSE for IIS), update management (WSUS for now - SCCM or SMS or something that allows more automation), SQL monitoring, IIS monitoring, and application updates (for our commercially available healthcare app). The SAN is one of those giant EMC Clariion devices (I think it's the CX4-960 which has a total capacity of 950TB) and there is room to add another as the datacenter grows. As I said above the DC is a Tier III, SAS70 Type II, DoD-compliant datacenter. The DC belongs to our parent and we'll pay for space there so we can get out of our old co-lo and out of the poorly designed replacement into/onto the new platform - will be paying around $17k/mo for the platform as it exists now (70 VMs covering 54 clients and 150+ physical facilities). If someone is interested I can give you our website where they obviously have the glammed-up stuff.
November 23, 2011 at 12:29 am
I pretend I understood most of it... :hehe:
The information I needed, I found it ('multi-million dollar investment' and 'paying around $17k/mo for the platform'). Thanks for sharing it.
November 23, 2011 at 6:51 am
I heard a story recently about an EMC SAN that had a controller card fail. The spare card took over the workload without a problem and with no data loss.
Then the engineer came in with a replacement card and pulled out the live card by mistake. All data got toasted. See http://www.informationweek.com/news/government/state-local/227100694 for details. Never assume you will never get a failure.
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
December 13, 2011 at 8:45 pm
Another good suggestion. I work for RemoteDBAExperts; we support all product lines with a group of great DBAs for each product line, and we also provide 24/7 proactive monitoring. This could be something to look into to take a lot off your plate and help keep an eye on things. If you wanna learn more, pm me or go to our company site http://www.remotedbaexperts.com.
December 14, 2011 at 2:25 am
EdVassie (11/23/2011)
I heard a story recently about an EMC SAN that had a controller card fail. The spare card took over the workload without a problem and with no data loss.
Then the engineer came in with a replacement card and pulled out the live card by mistake. All data got toasted. See http://www.informationweek.com/news/government/state-local/227100694 for details. Never assume you will never get a failure.
Fortunately each disk array within the SAN (which has multiple arrays) has 4 4-channel controllers so it can suffer multiple failures without issue. In fact if one goes bad and the Engineer who comes out does something stupid like the above then the other 2 remaining good controllers will continue to serve the disks without issue. If that had been me they would have found that person hanging from a cross out front of the building as a warning to other idiots not to make the same mistake. :w00t:
December 14, 2011 at 2:27 am
corbeckcup (12/13/2011)
Thanks but I think we're going to go with the automated system and use our offshore resources - it's far more cost effective than paying an outside company to handle it (or even paying the helpdesk for the DC to monitor it which is also an option). I have 15 people on my team so if we can't keep up with it after I finish this build and automating as much as possible then we have bigger issues.