As I noted in my previous post, I am not a fan of scripting. In that post, we set up a cluster without any scripts. Before we get to the steps for loading our own data, I want to take a moment to explore the HDInsight Query Console, also called the Dashboard (according to the link on the Azure Portal). This console is part of an HDInsight cluster; if you need to create a cluster, check out my previous post. In keeping with my desire not to script, this console provides some insight into my cluster as well as some tools to explore the cluster's contents.
I am going to highlight the areas that interest me, so this is by no means exhaustive. I encourage you to explore further and comment on anything compelling that I may have missed. Also, given the rate of change in Azure, some of this will likely change sooner rather than later. Hopefully it will only get better.
Getting Started Gallery
I would guess that this section is the most likely to change, since it has “Coming Soon” buttons. Each scenario uses a step-by-step approach to help you set up a job common to HDInsight or Hadoop. When you click a scenario button, you get a list of steps to walk through to complete the solution. Here is the start page for the Azure Website Log Analysis scenario.
In this scenario, you load log files and eventually use Hive to organize the data and work with it in Excel. This is a great way to learn how to use HDInsight to solve specific problems. Currently, the scenarios are grouped into solutions that use Azure data and solutions that use sample data. When this article was written, the following scenarios were available:
- Microsoft Azure Website Log Analysis Solution (uses Azure Data)
- Microsoft Azure Storage Analytics Solution (uses Azure Data)
- Sensor Data Analysis (uses sample data)
- Twitter Trend Analysis (uses sample data)
- Website Log Analysis (uses sample data)
- Mahout Movie Recommendation (uses sample data)
Hive Editor
The next section is an online Hive editor that lets you create and run Hive queries. It comes with a sample query against hivesampletable, which is included in your cluster when it is created. Go ahead and submit this query. You will see the editor create the job and track its status.
Click View Details to see the job information and the query results. You can download the results and logs from the details page. You can also create additional queries. For instance, if you want to see the structure of the table, you can use the following Hive query:
SHOW CREATE TABLE hivesampletable;
You can find more info on the Hive Query Language at https://cwiki.apache.org/confluence/display/Hive/LanguageManual.
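If you want a couple more queries to try in the editor, here is a small sketch. DESCRIBE is a standard Hive statement that lists a table's columns and types. The second query assumes hivesampletable has a deviceplatform column, which is what I would expect from the sample data, but it is an assumption; run DESCRIBE or SHOW CREATE TABLE first and adjust the column name if yours differs. The alias row_count is just an illustrative name.

```sql
-- List the columns and their types for the sample table.
DESCRIBE hivesampletable;

-- Count rows per device platform.
-- Assumption: the table has a deviceplatform column; confirm with DESCRIBE first.
SELECT deviceplatform, COUNT(*) AS row_count
FROM hivesampletable
GROUP BY deviceplatform
ORDER BY row_count DESC;
```

Each statement you submit shows up as its own job, so these are also a handy way to populate the Job History page.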
Job History
The Job History page shows all of the jobs you have run. You can also open the Job Details for previously run jobs here, just as you did from the Hive Editor.
File Browser
The File Browser page lets you explore the files and structure in your HDInsight cluster. Click a name to drill down into the contents of your cluster. When you first open the page, your cluster name is shown in the window. Here is the basic structure of a new cluster:
- Cluster
  - Containers (e.g. $logs)
    - Directories (e.g. example, app-logs)
      - Files (these may be multiple directories deep)
Once you get to the file location, you can download the file.
Hadoop UI and Yarn UI Pages
The Hadoop UI page shows information about the Hadoop cluster you have created. It includes an overview and some specifics on datanodes and snapshots.
The Yarn UI provides more information about your MapReduce jobs and the related cluster metrics. If you are interested in learning more about YARN, check out this site: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html.
This console is a great way to become more familiar with Hadoop, HDInsight, and Hive. While it does not offer every capability, it is a good starting point that requires no scripting. The next post will discuss how to load a simple set of data into Hadoop for analysis beyond the samples.