PASS 2015 Session Report – Understanding Real World Big Data Scenarios

PASS 2015 continues in Seattle, and today was my session at 10:45am on Using Azure Machine Learning (ML) to Predict Seattle House Prices.  The background and info on my session are here: http://www.sqlpass.org/summit/2015/Sessions/Details.aspx?sid=7794

Overall I was pretty happy with how it went - and I think everyone who attended had a lot of fun with some of the games and tests I injected into the presentation.  Everyone had a chance to be a Real Estate Agent :) - and at the same time learn some great methods around performing Azure ML Regression Predictive Analytics.

 

BUT – moving right along – I also attended 3 other sessions today.  Again, I cannot blog about all of them in the time I have, but the one which made me think the most about technology implementations and how they can improve lives was Understanding Real World Big Data Scenarios by Nishant Thacker of Microsoft.

It wasn’t about use cases for big data (that horse has already bolted), but more about really innovative and interesting ways the ecosystem of Azure technologies could be deployed to solve some complex business problems, or more simply, ways to make our lives better!

Key Takeaways

  • This session was as much about ideas as it was about technology – and the key takeaway for me was that the Microsoft strategy around cloud and big data is well targeted, well defined, and well considered.  It's an exciting place to be!
  • On a straw poll, 10% of the room had actively implemented a Big Data solution (of some sort/flavour) while 50% were in the process of actively learning about what Big Data could do for their business.
  • He introduced the Azure Data Lake suite, which he made clear was NOT hosted on the Azure Blob Store (which is where HDInsight is currently hosted) but is instead hosted on a separate high performance storage subsystem specifically tailored to analytics (or more specifically Azure Data Lake Analytics).
  • Here is some info on the Azure Data Lake Store https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-overview/ and on the Azure Data Lake Analytics https://azure.microsoft.com/en-us/documentation/articles/data-lake-analytics-overview/
  • Longer term, it seems that Azure HDInsight (Hadoop) will have the option to run either on the Azure Blob Store (WASB://), as it does now, or natively on Azure Data Lake (ADL://).  I suspect that ADL will be a more expensive storage option given the performance levels, and so would only be selected for HDInsight clusters in special circumstances.  Or will ALL HDInsight storage eventually end up in the Azure Data Lake, given that HDInsight clusters are a fundamental part of Azure Data Lake? That is still to be verified!
  • All of the “real world” scenarios presented were genuine, existing solutions – and all of them leveraged the new Azure Data Lake in some way, in addition to several other key elements of the Azure ecosystem, such as Azure Machine Learning (ML), Azure SQL Database, Azure SQL Data Warehouse and Event Hubs – in fact all of these components appeared in all of the solutions in some way.  It's impressive to see such easy integration in such a massively scalable platform.
  • The scenario I drew the most parallels with (for various reasons!) was the Fraud Detection scenario.  It made specific reference to collecting whatever data you have now (such as live transactional data, crossed with historical pattern data, crossed with customer data, crossed with social data etc), from wherever you can get it, to predict whether a certain set of transactions is fraudulent given the circumstances and historical patterns – and then taking immediate action accordingly (as opposed to batch action later).
  • It also made clear that despite so much data being collected, the end game was never to delete it, but instead to draw on it to make future predictions better (which to me makes perfect sense, though you would sometimes consider this possible but not practical – perhaps there would instead be a dimensionality reduction in the data before longer term storage?).
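
To make the Fraud Detection idea above a little more concrete, here is a minimal Python sketch of scoring each transaction as it arrives against that customer's historical pattern, rather than in a later batch.  It is purely an illustration and not code from the session: the function names, the dictionary layout of a transaction, and the simple "more than 3 standard deviations from the customer's usual spend" rule are all assumptions standing in for a real trained model (for example one built in Azure ML) and a real event stream.

```python
from statistics import mean, stdev


def is_suspicious(amount, history, threshold=3.0):
    """Flag an amount that sits far outside the customer's historical pattern.

    Hypothetical rule for illustration only: a z-score test on past amounts.
    """
    if len(history) < 2:
        return False  # not enough history to judge, so let it through
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu  # every past amount identical; any change stands out
    return abs(amount - mu) / sigma > threshold


def score_stream(transactions, history_by_customer):
    """Yield (transaction, flagged) one at a time, as each transaction arrives.

    `transactions` is any iterable of dicts with hypothetical keys
    "customer" and "amount"; `history_by_customer` maps a customer to a
    list of past amounts and is updated as clean transactions are seen.
    """
    for tx in transactions:
        history = history_by_customer.setdefault(tx["customer"], [])
        flagged = is_suspicious(tx["amount"], history)
        yield tx, flagged  # act immediately here, not in a later batch
        if not flagged:
            history.append(tx["amount"])  # clean data improves future scoring


if __name__ == "__main__":
    history = {"alice": [20.0, 25.0, 22.0, 30.0, 18.0]}
    incoming = [
        {"customer": "alice", "amount": 24.0},
        {"customer": "alice", "amount": 950.0},
        {"customer": "bob", "amount": 10.0},
    ]
    for tx, flagged in score_stream(incoming, history):
        print(tx["customer"], tx["amount"], "FLAGGED" if flagged else "ok")
```

Note that nothing is thrown away in this sketch: each clean transaction is appended to the history so that later predictions have more to draw on, which mirrors the "never delete it" point above.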

Overall quite a good session that positioned 3-4 relevant use cases along with the technology implementations for those scenarios.


Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here
