[read this post on Mr. Fox SQL blog]
I presented at our local Melbourne SQL Server User Group (SSUG) this week on Azure Cortana Intelligence Suite – and the integration with the Azure Data Catalog. If you are interested to see the presentation then have a look here – https://mrfoxsql.wordpress.com/presentations/
One of the things which stood out to me from attendees is that the number of new services and new cloud terms which are being introduced across the industry is sometimes difficult to keep track of – especially when you have a day job to do! Right now I cannot think of another time in IT where the rate of change is so dramatic.
More-so it can also sometimes be hard to know which services to use together to get the best architecture outcome.
So this blog post is aimed to explain at a high level the Azure Cortana Intelligence Suite of services and one example of how they could be deployed together in a cohesive architecture.
I will talk to each of these services in the context of a business solution to manage a farm of wind turbines!
And so, lets get into some discussion on Azure IoT Solutions!
Azure Suite of Cloud Services
Firstly lets take a brief look at Azure itself. Azure is not a product per se, but an entire ecosystem / collection of cloud services hosted in Microsoft Data Centers around the globe.
This is an excellent 1 page interactive diagrammatic overview of every service in Azure – I strongly recommend to check out the link – http://azureplatform.azurewebsites.net/en-us/
Microsoft have Azure DC’s in 24 global regions, and 8 more coming. However not every Azure service is available in every Azure DC (yet) – and some are in preview – so this link shows which services are where – https://azure.microsoft.com/en-us/regions/#services
Some of Azure the services are IaaS, some are PaaS and some are SaaS. These are not Microsoft terms, but industry terms. If you haven’t heard of them before – here’s 3 great short links to explain them to you.
- IaaS – https://azure.microsoft.com/en-us/overview/what-is-iaas/
- PaaS – https://azure.microsoft.com/en-us/overview/what-is-paas/
- SaaS – https://azure.microsoft.com/en-us/overview/what-is-saas/
What is the Cortana Intelligence Suite?
The easiest way to explain the Cortana Intellogence Suite is that it is simply a logical grouping of several of the existing Azure Cloud Services that are often deployed together in an intelligent end-to-end cloud solution. Every single service under the Cortana Intelligence Suite is PaaS.
Independently each one of these services can be deployed by itself for an individual task – however having all the services bundled together under a single brand makes it easier to understand which services make sense to deploy together.
This is the collection of Azure Cloud Services under the Cortana Intelligence Suite brand – we’ll talk to each of these in the context of my sample wind farm business.
This is the link to the Cortana Services – https://azure.microsoft.com/en-us/solutions/cortana-intelligence/
Cortana Intelligence Suite – An Example Architecture
Often the best way to understand how something works is to see them deployed in an architecture. This is a high level diagram I put together which shows each of the Cortana Intelligence Suite services deployed together in our fictitious Wind Farm business.
On the left we have our farm of say 1,000 wind turbines – these are generating 2 distinct types of data; Streaming and Batch.
- Streaming – Lets say each turbine generates 100 data points per second on things like rotational speed, direction, energy generated, engine temperature, ambient temperature, current weather, etc on various touchpoints around the turbine and environment. Each data point is very small (Key/Value pair) but still across the farm that’s 100,000 data points per second that we somehow need to capture. This is not a volume issue, its a velocity issue.
- Batch – Think about say at midnight every night each turbine generates a daily health check report on all its components and their current state. Lets say each report is 1GB, so that’s about 1TB of data across the farm every day. This is not a velocity issue, its a volume issue.
Now lets look at each Cortana component and what its role is for this solution.
Event Hub
What is it
https://azure.microsoft.com/en-us/services/event-hubs/
Fully Managed Service (PaaS) for ingesting events/messages at a massive scale (think telemetry processing from websites, IoT etc).
What does it do in our wind farm
Provides a “front door” to our wind farm application to accept all of the streaming telemetry being generated from the turbines. Event Hubs wont process any of this data per se – its just ensuring that its being accepted and queued (short term) while other components cane come in to consume it.
Stream Analytics
What is it
https://azure.microsoft.com/en-us/services/stream-analytics/
Fully Managed Service (PaaS) for deploying CEP solution/services. CEP is an industry term for Complex Event Processing, which is high scale event ingestion and in-stream data querying and analytics.
What does it do in our wind farm
In our solution maybe we want to understand not just rotational speed at a specific 1 sec instant, but perhaps the average speed over 1 min windows. Stream Analytics can pull a 1 min interval of data from Event Hubs and create a tumbling window of Average Rotational Speed grouped by Turbine ID. We’d write that query in a SQL-like language.
The data can then be surfaced immediately into say Power BI, or it could be saved into Data Lake Store or SQL Data Warehouse for long term storage.
It could also create an alert for any turbine that is say 10% faster or slower in Average Rotational Speed than the average for all turbines over than 1 min interval – as this may indicate a bearing issue on the physical turbine itself.
Data Lake Store and Data Lake Analytics
What is it
https://azure.microsoft.com/en-us/solutions/data-lake/
https://azure.microsoft.com/en-us/services/data-lake-analytics/
Store – PaaS service. Highly scalable distributed file store for capturing data of any size or shape. Specifically tuned for analytic/streaming workloads.
Analytics – PaaS service. Highly scalable batch based analytics service that is specifically built to efficiently query content in Data Lake store. Dynamic scaling of batch job performance. Jobs written in a new language called U-SQL (combination of c# and SQL)
What does it do in our wind farm
Our farm is generating a lot of data of which some is immediately and obviously useful to our business, like Average Rotational Speed, and other data we have not yet worked out its business value. However we don’t want to throw that data away just yet! Data Lake Store is a cheap and unlimited highly scalable file store, so it gives us the ability to store every data point or unstructured content we receive from our wind farm and work out later how we may use it.
In many ways for our solution Data Lake Store is a long term storage and staging area for every bit of data we receive in our solution.
At the end of the day we want to do some analytics or querying of this data, so Data Lake Analytics allows us to write adhoc SQL-like batch queries in a language called U-SQL (combination of c# and SQL) across massive amounts of data stored in Data Lake Store in order to look for trends or insights that we otherwise would have difficulty in finding.
For our business perhaps we want to look for every historical situation where the instantaneous Rotational Speed exceeded the Average Rotational Speed for a turbine and under what conditions that trigger occurred.
HDInsight
What is it
https://azure.microsoft.com/en-us/services/hdinsight/
Fully Managed Service (PaaS) for deploying Hadoop, Spark, HBase and Storm on Windows or Linux servers. 100% OPEN SOURCE Apache Hadoop (HDP 2.3) compatible
What does it do in our wind farm
Similar to the above, queries against data in the Data Lake Store can come from many different tools. For people more comfortable with Hadoop then HDInsight can be used to perform a very similar analysis to Data Lake Analytics.
Its interesting, but people often ask me “What tool should I used to solve X?” – and the answer is almost always “The one that you’re good at!” – Of course you want to choose a tool that’s fit for purpose, but at the same time you want to choose one that brings value the quickest!
SQL Data Warehouse
What is it
https://azure.microsoft.com/en-us/services/sql-data-warehouse/
Fully Managed Service (PaaS) for deploying an MPP SQL Data Warehouse. MPP is an industry term for Massive Parallel Processing, a massive scale out architecture to improve throughput.
SQL Data Warehouse essentially deploys distributed Azure SQL Databases under the hood – and is the Azure cloud version of the on-prem SQL Server APS appliance.
What does it do in our wind farm
Our wind farm generates lots of data, however only some of it is of immediate operational interest to the business. As such the majority of the data may sit in Data Lake Store, and only a small relational portion of it will be aggregated into SQL Data Warehouse.
In our case we perhaps don’t want to store every Rotational Speed data point for every turbine in our Data Warehouse, but we do want to store the Average Rotational Speed by Turbine ID by hour. And so we create a 1 hour window extract from a Data Lake Analytics U-SQL job and store this in a classic DW architecture (Facts and Dimensions).
So we would have a Turbine Dimension table, a Date Dimension table, a Time Dimension table and a Rotation Fact table with the data being loaded perhaps every hour. These can then be eaisly queried from Power BI, Tableau, Excel or whatever for operational reporting purposes.
Data Factory
What is it
https://azure.microsoft.com/en-us/services/data-factory/
Fully Managed Service (PaaS) for Composing Data Processing, Transformation and Movement Services into Scalable and Reliable Data Pipelines. Although not perfectly 1:1, for SQL Server people (like me!) think SSIS as a cloud data service.
What does it do in our wind farm
As mentioned above the turbines generate streaming data and daily batch data. The batch data doesnt fit Event Hubs so it needs a different path. We can use Data Factory to reach out to the data source for the daily data and pull this into our operational solution. Data Factory is like an ETL tool and is great at orchestrating a source data extract, an in-pipe data transformation, and a tgarget data load.
So for us it will pull the 1GB files from the turbine ops center file server, transform this into a table structure and load this into our SQL Data Warehouse for aggregated reporting. If we weren’t sure on how we wanted to report on it, but still wanted to keep it, then it could also just be loaded into the Data Lake Store for later staging into the SQL Data Warehouse.
Machine Learning
What is it
https://azure.microsoft.com/en-us/services/machine-learning/
Fully Managed Service (PaaS) for composing analytical models for performing predictive analytics. Drag/drop GUI interface to create and deploy predictive analytics solutions.
What does it do in our wind farm
For our business an offline turbine means lost revenue and a costly repair job. As such we want to do something called predictive maintenance. So we can use Machine Learning to create a predictive experiment using the static data in the Data Lake or SQL Data Warehouse to identify the combination of events which often lead to the failure of a turbine.
Perhaps our experiemnt would identifyt that something like when ambiant temperature is high, and Average Rotational speed is high and the last time turbine maintenance occured was more than 180 days then there is a 90% chance of bearing failure in the next 30 days.
As such Machine Learning can make predictions on which turbines may fail, under what conditions and in what timeframe. Therfore our business can send a maintenace team to replace the bearings before that failure occurs – this is a 30 min maintnenace task vs a 24 hour turbine failure downtime.
Data Catalog
What is it
https://azure.microsoft.com/en-us/services/data-catalog/
Fully Managed Service (PaaS) for storing metadata about data assets which are avaialble within an enterprise. This is essentailly a managed data dictionary at an enterprise level.
What does it do in our wind farm
In the wind farm solution there are lots of data touchpoints where data can be surfaced for reporting purposes. However how do you know what data is where, what it is for, how to get to it or how to use it? Data Catalog provides a means by which the data assets, such as tables in the SQL Data Warehouse, can be (semi) automatically documented to provide a central metadata repositroy.
People within the busienss can go to the Data Catalog and do a search for terms like Rotational Speed and the catalog will return a list of all of the data sources and data assets within the busienss where that term has been documented. Once identified the user can then contact the registered expert to learn more, or get a connection string from the catalog that allows them to connect to that data source from their application, such as Power BI, Tableau, Excel etc to extract the data they need to report on.
Summary
Well, that wraps it all up. I hope this has been useful to help you understand the Cortana Intelligence Suite of services and to put them into an end-to-end logical architecture.
If you are interested to see something like this actually deployed and peek under the hood – then check out the preconfigured demo solutions that you can deploy from the Cortana Intelligence Gallery. All you need is a Azure subscription and these demos can deploy everything and connect it together at the click of a button!
https://gallery.cortanaintelligence.com/SolutionTemplate/Predictive-Maintenance-for-Aerospace-1
AND of course, as I always say, please test this yourself as your results may vary!
Enjoy!
Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here