I hadn't even heard of Hadoop before, but there was a Hadoop World conference recently and it came to my attention on Twitter. I saw a quote that said "JP Morgan Chase is counting on an order of magnitude savings on data warehousing." Since Hadoop is primarily a Linux-based system, and is set up only for development, not production, on Win32 systems, perhaps it's not surprising that I hadn't run across it.
I tried to read through the quickstart on Apache's site for the common core installation and walk through a few examples, but it's a little hard to tell what exactly the buzz is about. Wikipedia was more help, pointing me to the MapReduce papers that Google published. I'll see if I can work through them at some point. Hadoop is available under a free license and the list of companies using it for large data set processing is impressive: Yahoo!, Amazon, Facebook, and more.
So what's the purpose? Hadoop appears to allow clusters of servers to perform data processing very efficiently. It's built on its own distributed file system that scales to handle petabytes of data. That might seem like more data than you and I will ever need to work with, but I remember when it was a challenge to get enough disk drives together to assemble a terabyte in a server. Now I have 1.5TB in my desktop, with room for more.
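To make the idea a little more concrete for myself, here is a minimal, single-machine sketch of the MapReduce pattern that Google's papers describe: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step combines each group. This is only an illustration of the pattern, not Hadoop's actual API, and the function names (map_phase, shuffle, reduce_phase, word_count) are my own assumptions. In a real cluster the map and reduce steps would presumably run in parallel across many servers; here everything runs in one process.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key (hypothetical stand-in for the framework's grouping step)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data everywhere"]))
```

The appeal, as far as I can tell, is that each call to map_phase depends only on its own document and each call to reduce_phase depends only on its own key, so the work can be split across as many machines as you have.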
It's an interesting project, and with data volumes constantly growing, I wonder when we'll see a similar technology in Microsoft's data processing platform. They already purchased a search technology company based on Hadoop, and we might see this used in Bing.
I expect this type of processing, and others like the StreamInsight features in SQL Server 2008 R2, to complement, rather than supplant, the traditional SQL database engine.
Steve Jones
The Voice of the DBA Podcasts
The podcast feeds are available at sqlservercentral.mevio.com. Comments are definitely appreciated and wanted.
- Windows Media Podcast - 23.7MB WMV
- iPod Video Podcast - 19.4MB MP4
- MP3 Audio Podcast - 3.9MB
Today's podcast features music by Everyday Jones. No relation, but I stumbled onto them and really like the music. Support this great duo at www.everydayjones.com.
I really appreciate and value feedback on the podcasts. Let us know what you like, don't like, or even send in ideas for the show. If you'd like to comment, post something here. The boss will be sure to read it.