After all the hype about Big Data, Hadoop, and now HDInsight, I decided to build out my own big data cluster on HDInsight. My overall goal is to have a cluster I can use with Excel and Data Explorer. After all, I needed more data in my mashups. I am not going to get into the details or definitions of Big Data; there are entire books on the subject. I will discuss any issues or tidbits I run into along the way.
Setting Up the Environment
I am actually doing this on a VM on my Windows 8 laptop. I created a Windows Server 2012 VM with 1 GB of RAM and 50 GB of storage. (Need some help creating a VM in Windows 8? Check out my post on the subject.)
Installing the HDInsight Server
First, this product is still in Preview at the time of this writing, so your mileage may vary and the details will likely change over the next few months. You will find the installer at http://www.microsoft.com/web/gallery/install.aspx?appid=HDINSIGHT-PREVIEW. It uses the Microsoft Web Platform Installer. When prompted, I just ran the installer. It took about one hour to complete on my VM setup. Once it completed, it opened the dashboard view in IE.
At this point we have installed a cluster called “local (hdfs)”.
Exploring My Local Cluster
Well, things did not go well at first. Whenever I clicked the big gray box to view my dashboard, I received the following error: “Your cluster ‘local (hdfs)’ is not responding. Please click here to navigate to cluster.” I clicked “here” and ended up on an IIS start page. Not really effective. Let the troubleshooting begin.
Based on this forum issue response, I opened the services window to find that none of my Apache Hadoop services were running after a restart AND they were set to manual. To resolve this I took two steps. First, I changed all of my services to run automatically. This makes sense for my situation because the VM would be running when I wanted to use HDInsight. Second, I used the command line option to restart all of the services as also noted in the forum post above.
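If you would rather script the first step than click through the Services window, here is a minimal sketch in Python. It is not what I actually ran; I made the change by hand. It builds one `sc config <service> start= auto` command per service, using the standard Windows `sc.exe` tool. The service names in the list are hypothetical placeholders, so substitute the names shown in your own Services window. By default it only prints the commands; actually applying them would need an elevated prompt.

```python
import subprocess
import sys

# Hypothetical placeholder names: replace these with the Apache Hadoop
# service names listed in the Services window on your machine.
HADOOP_SERVICES = ["namenode", "datanode"]


def build_autostart_command(service_name):
    """Return the sc.exe argument list that sets a service to start automatically."""
    # sc.exe expects "start=" and its value as two separate tokens.
    return ["sc", "config", service_name, "start=", "auto"]


def set_services_to_autostart(service_names, dry_run=True):
    """Build one sc.exe command per service, then print or run each of them."""
    commands = [build_autostart_command(name) for name in service_names]
    for command in commands:
        if dry_run:
            # Show the command without touching any service.
            print(" ".join(command))
        else:
            # Changing a service requires an elevated (administrator) prompt.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Pass --apply to change the services; without it the script only prints.
    set_services_to_autostart(HADOOP_SERVICES, dry_run="--apply" not in sys.argv)
```

Run it once without arguments to review the commands, then again with `--apply` from an administrator prompt if they look right.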
From a command prompt, run the following command to restart all Hadoop services:
c:\hadoop\start-onebox
And voilà, my cluster is now running.
Maybe we can get a better error message next time.
At this point I walked through the Getting Started option on the home screen and proceeded to do “Hello World”. I used these samples as intended to get data in my cluster and start working with the various tools. Stay tuned for more posts in the future on my Big Data adventures.
Why Not HDInsight Service on Azure?
The primary reason I did not use the HDInsight Service on Azure was that I did not want to risk the related charges. Once I have a good understanding of how HDInsight Server works, I will be more comfortable working with HDInsight Service.
Other Resources
Here are some of the resources I used throughout the build.
HDInsight Service Quick Start and Tutorials