August 2, 2012 at 2:26 am
I'll cheerfully admit I would not even have a clue how to find out what problems might require $2m/day worth of computing, let alone how to submit such problems to the machine's mighty maw. I too would love to hear from anyone who does.
In the meantime, this article linked at the bottom of one of Steve's links gives some food for thought (SFW, I would think, unless you are at a religious institution or something) about the problems faced in a 'sister' industry of our own. Redis sounds interesting.
August 2, 2012 at 3:57 am
As a DBA, the first thing that pops into my mind is large-scale BI processing and data mining. That will put quite some pressure on the challenge of moving large amounts of data around both quickly and securely. It also requires the provider to sell this kind of processing power for an hour a day or so. In many cases, in-house BI processing requires additional powerful servers that sit nearly idle most of the day.
If a giant like Google can provide data processing to clients on a global scale, it can make much better use of the required resources, because at any given time someone, somewhere, will need to process some data. After the huge initial load, only the deltas need to be sent, so the amount of data transferred could be quite manageable. Maybe in the future we will receive our daily cubes from Google ...
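To illustrate that delta idea, here is a minimal sketch of an incremental load that only ships rows changed since the last run. The watermark table and the stg.Sales / dw.Sales names are hypothetical, and it assumes the source carries a reliable LastModified column.

DECLARE @LastLoad datetime2 = (SELECT MAX(LoadedThrough) FROM dw.LoadWatermark);

-- Only rows modified since the last load cross the wire
MERGE dw.Sales AS tgt
USING (SELECT SaleID, CustomerID, Amount, LastModified
       FROM stg.Sales
       WHERE LastModified > @LastLoad) AS src
    ON tgt.SaleID = src.SaleID
WHEN MATCHED THEN
    UPDATE SET CustomerID = src.CustomerID,
               Amount = src.Amount,
               LastModified = src.LastModified
WHEN NOT MATCHED THEN
    INSERT (SaleID, CustomerID, Amount, LastModified)
    VALUES (src.SaleID, src.CustomerID, src.Amount, src.LastModified);

-- Advance the watermark for the next run
UPDATE dw.LoadWatermark SET LoadedThrough = SYSUTCDATETIME();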
August 2, 2012 at 6:39 am
I think Microsoft & Google have partnered up to use the Google engine to compute how many times Steve has mentioned cloud computing, to dispute his cloud sponsorship payments. :w00t: I just wonder who the next Ross Perot will be, with cloud systems like this seemingly having vast amounts of non-use/down time. Maybe Google can use it to search for Bigfoot, or find out what makes crop circles, or the gene for male pattern baldness (I threw that in for Steve) 😛
August 2, 2012 at 7:06 am
We've seen RFPs asking for 2 terabytes of data in the cloud with a 24-hour recovery time and the data encrypted at rest. I can't find any cloud provider who can offer SQL Server Enterprise Edition (to use TDE and partitioning) at that scale (which to me means that replication to a failover node, and the data itself, is handled by them, or so I hope). Just investigating this -- any suggestions?
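For what it's worth, the TDE piece itself is only a few statements on Enterprise Edition; it's the managed failover and scale that are hard to find. A minimal sketch, with hypothetical key, certificate, and database names:

USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password here>';
CREATE CERTIFICATE TdeServerCert WITH SUBJECT = 'TDE certificate';  -- hypothetical name
GO
USE MyBigDatabase;  -- hypothetical 2 TB database
CREATE DATABASE ENCRYPTION KEY
    WITH ALGORITHM = AES_256
    ENCRYPTION BY SERVER CERTIFICATE TdeServerCert;
GO
ALTER DATABASE MyBigDatabase SET ENCRYPTION ON;
-- Remember to back up the certificate and private key, or the data is unrecoverable elsewhere.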
Your point about data quality is very important. Having quality data, or at least understanding the limits of your data, is critical in any data mining exercise. You can reach hopelessly wrong conclusions, or no conclusions at all, when the inputs are incomplete or put together incorrectly. I hope companies recognize that data governance is worth putting more resources into.
August 2, 2012 at 7:19 am
Steve, if anyone does share with you their big cloud operations, please beg for permission to share the gist of it with us.
One application that comes to mind for $2m/day computing costs is pharma research. If 770,000 cores can do in one day what on-site resources would take a month to do, then the time saved may well be worth the cost. I would definitely like to hear about the type (and volume) of data, as well as how and why it makes sense to outsource computation.
August 2, 2012 at 8:53 am
thadeushuck (8/2/2012)
I think Microsoft & Google have partnered up to use the Google engine to compute how many times Steve has mentioned cloud computing, to dispute his cloud sponsorship payments. :w00t: I just wonder who the next Ross Perot will be, with cloud systems like this seemingly having vast amounts of non-use/down time. Maybe Google can use it to search for Bigfoot, or find out what makes crop circles, or the gene for male pattern baldness (I threw that in for Steve) 😛
LOL, I wish I was getting paid for sponsorships.
Cloud computing is an interesting topic, and it's different from most things we've seen before. I sometimes can't decide whether a given implementation is good or bad, and for the most part I think we're stuck doing a case-by-case analysis of where and when it works. So that means learning more about it.
I'd be happy to shave everything off the top of my head. Just waiting for my wife to give me the OK.
August 2, 2012 at 8:54 am
zintp (8/2/2012)
We've seen RFPs asking for 2 terabytes of data in the cloud with 24 hour recovery time, with the data encrypted at rest. I can't find anyone who can provide SQL Server Enterprise Edition (to use TDE and partitioning) at that scale as a cloud provider (which to me means the replication over to a failover node and data is handled by them, or so I hope). Just investigating this -- any suggestions?
No idea so far, but I'll look. I suspect only AWS/Azure/Google could do this right now. Most of the other offerings are less fully built out.
One idea: most of these companies do offer a plain VM, so you could install SQL Server EE yourself and use TDE. However, if you do that, you're essentially doing co-location while letting someone else buy the hardware. I'm not sure what the point is there.
August 2, 2012 at 8:56 am
I recently completed a project that required cleaning up over 100 million rows of name-and-address type data. It was a mess, and the clean-up was very, very resource-hungry. But I won't need those hardware resources again for any foreseeable project here. It would have been great to rent some clock cycles, and so on, from Amazon or whomever for the duration of the project. There was no budget for it this time, but we could plan that kind of possibility into future budgets.
So I think this is a great trend with all kinds of potential uses.
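For a sense of the kind of work that chews through those cycles, here is a minimal sketch of a set-based normalize-and-dedupe pass. The table and column names are hypothetical, and real name-and-address cleansing is of course far messier than this.

-- Trim, upper-case, and keep one row per apparent duplicate group.
-- dbo.Contacts and its columns are hypothetical stand-ins.
;WITH Normalized AS (
    SELECT
        ContactID,
        UPPER(LTRIM(RTRIM(FirstName))) AS FirstName,
        UPPER(LTRIM(RTRIM(LastName)))  AS LastName,
        UPPER(LTRIM(RTRIM(Addr1)))     AS Addr1,
        UPPER(LTRIM(RTRIM(City)))      AS City,
        LEFT(LTRIM(RTRIM(Zip)), 5)     AS Zip5,
        ROW_NUMBER() OVER (
            PARTITION BY UPPER(LTRIM(RTRIM(LastName))),
                         UPPER(LTRIM(RTRIM(Addr1))),
                         LEFT(LTRIM(RTRIM(Zip)), 5)
            ORDER BY ContactID) AS rn
    FROM dbo.Contacts
)
SELECT ContactID, FirstName, LastName, Addr1, City, Zip5
INTO dbo.Contacts_Clean
FROM Normalized
WHERE rn = 1;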
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 2, 2012 at 9:15 am
Mike Dougherty-384281 (8/2/2012)
Steve, if anyone does share with you their big cloud operations, please beg for permission to share the gist of it with us.
One application that comes to mind for $2m/day computing costs is pharma research. If 770,000 cores can do in one day what on-site resources would take a month to do, then the time saved may well be worth the cost. I would definitely like to hear about the type (and volume) of data, as well as how and why it makes sense to outsource computation.
I'll definitely try to share whatever I can learn. Some of the places I linked to are examples of what I've seen. The big win seems to be the lack of up-front investment needed for large-scale computing. There's definitely a tipping point here, and I've seen this in the *nix world before with large IBM machines, where we had extra hardware in the machine that wasn't licensed to us. We could activate it for short periods as needed, paying a "rental" fee.
As an example, around 2001/2002 we had a large 64-CPU AIX server, but we were licensed for 36 CPUs. That's what we "bought". At end of quarter, we could "rent" an additional 10-12 CPUs for 2-3 days with a license key. AIX allowed hot-add of the CPUs, so this worked well for us. Our calculations showed that renting was worthwhile until we needed the extra capacity for about 90+ days a year. Since we were looking at 8-10 days a year, it was better to rent the CPUs than buy them.
I think that's what cloud computing gets you when it's done well. You can burst in those places you need to. If the load is steady, you probably do better with purchasing equipment at some point.
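Putting rough numbers on that breakeven (the figures below are made up for illustration, not the actual pricing from that AIX deal):

-- Rent-vs-buy breakeven sketch; all dollar figures and day counts are hypothetical.
DECLARE @PurchasePerCpu   money = 25000.00,  -- assumed cost to buy and license one CPU
        @RentPerCpuPerDay money = 275.00,    -- assumed daily "rental" fee per CPU
        @DaysNeededPerYear int  = 10;        -- burst days actually needed per year

SELECT
    @PurchasePerCpu / @RentPerCpuPerDay    AS BreakevenDays,   -- ~90 days with these numbers
    @RentPerCpuPerDay * @DaysNeededPerYear AS AnnualRentCost,
    CASE WHEN @RentPerCpuPerDay * @DaysNeededPerYear < @PurchasePerCpu
         THEN 'Rent the burst capacity'
         ELSE 'Buy the hardware'
    END                                    AS RoughCall;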
August 2, 2012 at 11:30 am
Steve Jones - SSC Editor (8/2/2012)
zintp (8/2/2012)
We've seen RFPs asking for 2 terabytes of data in the cloud with a 24-hour recovery time and the data encrypted at rest. I can't find any cloud provider who can offer SQL Server Enterprise Edition (to use TDE and partitioning) at that scale (which to me means that replication to a failover node, and the data itself, is handled by them, or so I hope). Just investigating this -- any suggestions?
No idea so far, but I'll look. I suspect only AWS/Azure/Google could do this right now. Most of the other offerings are less fully built out.
One idea: most of these companies do offer a plain VM, so you could install SQL Server EE yourself and use TDE. However, if you do that, you're essentially doing co-location while letting someone else buy the hardware. I'm not sure what the point is there.
Microsoft has a platform called Cosmos. That beast sucks in the entire Web in a day or two so it can be indexed for Bing. The basic allocation of space is 50 TB. It is meant for non-volatile data -- an archived Web page may be (and often will be) superseded by today's version, but that does not change yesterday's state. It is a high-performance system, but not in terms of transactions.