Microsoft Ignite always brings a long list of new products and product features, and this year was no exception. Below is a quick summary of the major data platform announcements; I will follow up with more details in future blog posts:
Azure Synapse Analytics (preview) – This is an evolution of Azure SQL Data Warehouse. It blends together big data, data warehousing, and data integration into a single service for end-to-end analytics at cloud scale. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources. Azure Synapse brings the data warehousing and Big Data analytics together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs. Read the blog and check out the docs. Join the private preview here. The changes boil down to these areas:
- The core data warehouse engine now supports not only explicitly provisioned workloads (pay per DWU provisioned) but also serverless on-demand workloads (pay per TB processed) using T-SQL
- A Synapse workspace that allows you to provision resources including SQL pools (SQL DW) and Apache Spark pools
- Integration of Apache Spark, ADLS Gen2, Power BI, and Azure Data Factory (all are provisioned or linked within the Synapse workspace). Files in ADLS are query-able using a SQL Analytics runtime (T-SQL that does not require an EXTERNAL TABLE command) or a Spark runtime (Spark SQL, Python, Scala, R, .NET)
- A unified web user interface called Azure Synapse Studio which controls all your resources. It integrates a notebook experience that can use Python, Scala, .NET, and Spark SQL. It is here you can ingest (Azure Data Factory), explore (T-SQL script or Spark SQL in a notebook), analyze (Machine Learning Services in a notebook), and visualize data (Power BI), bringing everything together in a single pane of glass
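To make the two billing models in the first bullet concrete, here is a rough sketch of how you might compare them. Both prices are made-up placeholders (not actual Azure rates), and the function names are mine; the point is only the shape of the comparison: provisioned cost grows with hours running, on-demand cost grows with TB processed.

```python
# Hypothetical prices for illustration only; check the Azure pricing page for real rates.
PROVISIONED_PRICE_PER_HOUR = 1.50  # assumed hourly cost of a provisioned SQL pool
ON_DEMAND_PRICE_PER_TB = 5.00      # assumed cost per TB processed (serverless on-demand)


def provisioned_cost(hours_running: float) -> float:
    """Cost of a provisioned pool: you pay for the time it is running."""
    return hours_running * PROVISIONED_PRICE_PER_HOUR


def on_demand_cost(tb_processed: float) -> float:
    """Cost of on-demand queries: you pay for the data processed."""
    return tb_processed * ON_DEMAND_PRICE_PER_TB


def break_even_tb(hours_running: float) -> float:
    """TB processed at which on-demand costs the same as the provisioned pool."""
    return provisioned_cost(hours_running) / ON_DEMAND_PRICE_PER_TB


if __name__ == "__main__":
    hours = 730  # roughly one month of an always-on pool
    print(f"Provisioned for {hours} h: ${provisioned_cost(hours):,.2f}")
    print(f"Break-even volume: {break_even_tb(hours):,.1f} TB processed")
```

Under these assumed prices, workloads that process less than the break-even volume in a given period would be cheaper on-demand, and heavier workloads would favor provisioned resources.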
Azure Arc (preview) – A set of technologies that unlocks new hybrid scenarios for customers by bringing Azure services and management to any infrastructure:
- Enables Azure SQL Database (managed instance) and Azure Database for PostgreSQL Hyperscale to run on-premises (using Kubernetes on Linux on the hardware of choice), at the edge and in multi-cloud environments. Eventually Arc will support all the data services
- Expanded the Azure Stack portfolio to offer customers even more flexibility with the addition of Azure Stack Edge. Azure Stack Edge, previously known as Azure Data Box Edge, is a managed AI-enabled edge appliance that brings compute, storage and intelligence to any edge. Also introduced is a new Rugged series of Azure Stack form-factors designed to provide cloud capabilities in the harshest environment conditions supporting scenarios such as tactical edge, humanitarian and emergency response efforts
- Extend Azure management and security to any infrastructure. Hundreds of millions of Azure resources are organized, governed and secured daily by customers using Azure management. Azure Arc extends these proven Azure management capabilities to Linux and Windows servers, as well as Kubernetes clusters on any infrastructure across datacenters, clouds and edge devices
- Read the blog
SQL Server 2019 is GA – Major new features are Big Data Clusters (see this great workshop to learn it), data virtualization, intelligent database, secure enclaves, extensibility, and accelerated database recovery. Read the blog.
Azure Cosmos DB enhancements:
- Azure Cosmos DB new analytics storage engine (preview) – Azure Cosmos DB containers are now internally backed by two storage engines: a transactional storage engine and a new analytical storage engine. Both storage engines are log-structured and write-optimized for blazing fast updates. The transactional storage engine is encoded in row-oriented format for fast transactional reads and queries. The analytical storage engine is encoded in columnar format for fast analytical queries and scans and is fully updatable. The transactional storage engine is backed by local SSDs, and the analytical storage engine is backed by cost-efficient off-cluster SSD storage. They also integrated the Azure Cosmos DB Spark connector directly into Azure Synapse Analytics’ Spark capabilities, making it easy to have Azure Synapse query and operate over data in the Cosmos DB analytical storage (ideal when Cosmos DB is used as a hot, updatable data lake that ingests telemetry data). Check out the docs
- Azure Cosmos DB autopilot mode (preview) – With Autopilot mode customers select a throughput tier and Azure Cosmos DB will automatically scale RU/s up and down as needed to match capacity with demand. Customers can opt-in for tier upgrades that will automatically shift their account to the next highest tier if demand spikes beyond the upper limit of their current tier. This means customers won’t have to manage or adjust provisioned throughput up or down to match traffic patterns. This saves time, eliminates the need for custom scripting, and removes the risk of over- or under-provisioning throughput. When opted-in to tier upgrades, customers can also rest assured that their apps will never be rate-limited due to excess demand or insufficient throughput. Check out the docs
- GROUP BY support – Check out the docs
- More details and other new features listed at What’s new in Azure Cosmos DB: November 2019
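The autopilot behavior described above can be pictured with a toy model. This is only a sketch of the idea as stated in the announcement, not how Cosmos DB implements it: the tier limits in `TIER_MAX_RUS` are hypothetical values, and `autopilot_step` is a name I made up for illustration.

```python
from typing import Tuple

# Hypothetical tier upper limits in RU/s; NOT actual Azure Cosmos DB tiers.
TIER_MAX_RUS = [4_000, 20_000, 100_000]


def autopilot_step(demand_rus: int, tier: int, tier_upgrades: bool) -> Tuple[int, int, bool]:
    """Return (new_tier, provisioned_rus, rate_limited) for one demand sample.

    Toy model: throughput follows demand up to the current tier's upper limit.
    If tier upgrades are enabled, the account moves to the next tier when
    demand exceeds that limit (until the hypothetical tier list runs out).
    """
    while (
        tier_upgrades
        and demand_rus > TIER_MAX_RUS[tier]
        and tier < len(TIER_MAX_RUS) - 1
    ):
        tier += 1
    provisioned = min(demand_rus, TIER_MAX_RUS[tier])
    rate_limited = demand_rus > provisioned
    return tier, provisioned, rate_limited


if __name__ == "__main__":
    for upgrades in (False, True):
        print(upgrades, autopilot_step(6_000, tier=0, tier_upgrades=upgrades))
```

In this model a spike to 6,000 RU/s is capped at the first tier's limit when upgrades are off, and moves the account to the second tier when upgrades are on.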
Azure Data Share is now GA – Azure Data Share enables organizations to easily and securely share data with other organizations to expand analytics datasets for enhanced insights.
Azure Database for PostgreSQL – Hyperscale (Citus) is now GA – Scales out Postgres horizontally and is ideal for multi-tenant and SaaS applications that are growing fast—and for real-time analytics apps that need sub-second response times across billions of rows. Check out the docs.
Azure Database Migration Service new Hybrid Mode (preview) – Uses a migration worker hosted on-premises together with an instance of Azure Database Migration Service running in the cloud (the “cloud service”) to manage database migrations. The Azure Database Migration Service hybrid mode is especially useful for scenarios in which there is a lack of site-to-site connectivity between the on-premises network and Azure or limited site-to-site connectivity bandwidth. Check out the docs.
Azure Search – Has been renamed to Azure Cognitive Search. There are new built-in skills: translation, custom entity lookup (preview), and custom document cracking, making it simple to transform content in meaningful ways, such as translating text and identifying custom entities important to your business. Also, there are new data connectors for ADLS Gen2, Cosmos DB Gremlin API, and Cosmos DB Cassandra API.
Azure SQL Database enhancements:
- Azure Stream Analytics integration (preview) – Customers may now stream real time data into a SQL Database table directly from the Azure Portal using Azure Stream Analytics
- New hardware choices (preview) – M-series, now in preview, is a new memory optimized hardware option in SQL Database for workloads demanding more memory and higher compute limits than provided by Gen5. M-series provides 29 GB per vCore and 128 vCores which increases the previous memory limit in SQL Database by 8x to nearly 4 TB. Fsv2-series, now in preview, is a new compute optimized hardware option in SQL Database delivering low CPU latency and high clock speed for the most CPU demanding workloads. Depending on the workload, Fsv2-series can deliver more CPU performance per vCore than Gen5, and the 72 vCore size can provide more CPU performance for less cost than 80 vCores on Gen5. Fsv2 provides less memory and tempdb per vCore than other hardware so workloads sensitive to those limits may want to consider Gen5 or M-series instead. Check out the docs
- PowerApps integration (preview) – The Azure portal offers an entry point to create an app with PowerApps that is automatically connected to a table in a given database. Quickly create low-code robust customer experiences for any device using data from your SQL Database tables. Customers may also seamlessly work with their Azure based services, and benefit from rich extensibility enabling “no limits” development. This new entry point comes in addition to the creation, connection, and configuration experiences supporting SQL Database that already exist in the PowerApps Studio. This new experience starts from the context of a customer’s database, enabling them to create an app backed by SQL in minutes and navigate seamlessly between the SQL Database and PowerApps
- Azure SQL Database serverless now generally available – Serverless auto-scales compute, removes the need to size compute manually, and bills for the compute used on a per-second basis
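As a quick sanity check on the M-series numbers in the hardware bullet above, the arithmetic works out as follows. The only assumptions are the unit conversion (1 TB = 1024 GB) and reading the "8x" claim literally to back out the implied previous limit; that last figure is derived from this post's numbers, not taken from official documentation.

```python
GB_PER_VCORE = 29  # from the M-series description above
MAX_VCORES = 128   # from the M-series description above

total_gb = GB_PER_VCORE * MAX_VCORES  # total memory at the largest size
total_tb = total_gb / 1024            # assuming 1 TB = 1024 GB
implied_previous_gb = total_gb / 8    # what "8x the previous limit" would imply

print(f"{total_gb} GB = {total_tb:.3f} TB")                   # 3712 GB = 3.625 TB
print(f"Implied previous limit: {implied_previous_gb:.0f} GB")  # 464 GB
```

So "nearly 4 TB" is a generous rounding of about 3.6 TB.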
Azure Data Factory Wrangling Data Flows (preview) – Allows Data Engineers to do code-free agile data preparation at cloud scale via spark execution. It uses the Power Query data preparation technology (also used in Power Platform dataflows) to seamlessly prepare/shape the data. Check out the docs and video Data Exploration with ADF Data Flows.
Data Migration Assistant support for SQL Server 2019 is GA – Customers can now upgrade their on-premises SQL Server to SQL Server 2019, with Data Migration Assistant detecting compatibility issues that can impact database functionality on the new version. Data Migration Assistant recommends performance and reliability improvements for the customer’s target environment, and it can move not only schema and data, but also uncontained objects from the source server to the target SQL Server 2019. Check out the docs.
Azure SQL Database Edge (preview) – This is a small-footprint containerized database (< 500 MB) running on ARM- and x64-based devices in a connected or disconnected environment. It is available as an Ubuntu container, which can run on a Linux host. It has built-in data streaming and time-series capabilities and is deployed as an Azure IoT Edge module. See the blog post.
Large models in Power BI Premium (preview) – Until now, dataset caches in Power BI Premium have been limited to 10 GB after compression. Large models remove this limitation, so dataset cache sizes are limited only by the Power BI Premium capacity size (up to 400 GB). View the blog.
Microsoft Flow – Has been renamed to Power Automate and has added Robotic Process Automation (RPA), called UI Flows, which records step-by-step actions such as mouse clicks, keyboard use, and data entry, then replays those actions and turns them into intelligent workflows using a simple, guided process. View the blog.
Power Virtual Agents – A no-code/low-code app that allows anyone to create and deploy intelligent AI-powered virtual agents (bots). View the blog.
More info: