Recently I worked on a neat little POC with Patrick Leblanc for a customer in Education who wished to perform sentiment analysis and key phrase extraction on surveys completed by students regarding classes and instructors, which brings me to this blog post.
Using Azure ML and a free subscription to the Text Analytics API, I’m going to show you how to perform sentiment analysis and key phrase extraction on tweets with the hashtag #Colts (after this past Sunday’s 51-16 beat down of the Colts at the hands of the Jacksonville Jaguars, I’m bathing in the tears of Colts fans. Watch the highlights! ). Although my example here is somewhat humorous, the steps can be used to perform sentiment analysis and key phrase extraction on any text data as long as you can get the data into Power Query.
Getting Started with Power BI? Begin here!
To get started there’s a couple things you need to do first:
- Subscribe to the Text Analytics API, which you can find here: https://datamarket.azure.com/dataset/amla/text-analytics.
I’ve subscribed to the free subscription since I don’t need more than 10,000 transactions/month. - If you want to analyze tweets like I will be doing, you’ll need to download Plus One Social, a nifty little tool that will extract tweets and load them into an Access database. Download Plus One Social here: http://plusonesocial.com/. Using Plus One is pretty easy, but if you need instruction on how to use the app, Plus One has a great getting started video on YouTube: https://www.youtube.com/watch?v=viXHXrDGQaA .
After running the app, an Access database is created with all the info I need to analyze the sentiment of the tweets and extract the key phrases. The Plus One Access database is created by default in my Documents folders in a Plus One Social folder.
Next, I’ll open Excel and create a Power Query to pull in the data with the tweets from the Access database created by the Plus One Social app. Open Excel, go the Data ribbon, click New Query, select From Database, and select the option From Microsoft Access Database:
I’m just selecting the Messages table since I’m really only measuring the sentiment and pulling out the key phrases. Once you select the Messages table, click Edit to open the query in Power Query.
Then I’ll click Choose Columns button to limit the result set to just the MessageID, Message, and CreatedDate columns.
Click OK. I want to do a little data cleansing to remove some unwanted characters that I know will cause some problems for the API, like # and @. Also, I want to remove any tweets that begin with RT indicating it was a retweet. I want to try to only capture the distinct tweets.
To remove any rows beginning with RT, right-click a row and select Text Filters and select Does Not Begin With. Then in the Applied Steps window on the right, click the settings cog next to the Filtered Rows step and change the value to RT. Click OK.
I also want to replace the # and @ characters. I can do this by selectin ghte Message column and selecting Replace Values from the Transform ribbon. I’ll replace # with (hashtag) and @ with (at).
Then click Close & Load in the top left of Power Query.
To conduct the sentiment analysis (or the key phrase extraction), we can use the single response API, which you can read about here: https://azure.microsoft.com/en-us/documentation/articles/machine-learning-apps-text-analytics/.
To use the single response API in Power Query, we need to create a parameterized function. To do this, we’ll first go to the Data ribbon in Excel, click New Query, select From Other Sources, and select From OData Feed. In the URL, enter the following URL:
https://api.datamarket.azure.com/data.ashx/amla/text-analytics/v1/GetSentiment?Text=hello+world
Then click OK.
You may be prompted to enter your Azure Marketplace Account key to access the OData feed. Click the Get your Marketplace Account Key hyperlink and copy & paste your account key into the text box. Be aware that this is sensitive data.
After entering in the account key, click Save. You should see a sentiment score, which is the sentiment score for the phrase “hello world”. We need to turn this into a function and parameterize the function.
In Power Query, go to the Home ribbon, click Advanced Editor and use the following text to create the function:
(inputText) => let Source = OData.Feed("https://api.datamarket.azure.com/data.ashx/amla/text-analytics/v1/GetSentiment?Text=" & (inputText)) in Source
=fxGetSentimentScore([Message])
(inputText) => let Source = OData.Feed("https://api.datamarket.azure.com/data.ashx/amla/text-analytics/v1/GetKeyPhrases?Text=" & (inputText)) in Source
With that complete, the only logical step now is to dump it into Power BI, of course!
I thought that was pretty cool and worth sharing. Patrick Leblanc recently posted a blog on how to do the same thing I just did within Power BI. You can read Patrick Leblanc’s blog on Sentiment Analysis with Power BI here: http://patrickdleblanc.com/wordpress/?p=41.
Resources
Subscribe to the Text Analytics API I used here.
Here’s the documentation on the Sentiment Analysis and Key Phrase extraction here. This is helpful!
Download Plus One Social here.
Download the Power BI Word Cloud and other custom visuals here.
Feedback?
I hope you found this helpful! If you have any questions or feedback, feel free to leave it in the comments!