Microsoft has a ton of data platform-related products, but there are certain areas where they either don’t have a product or what they have is limited, and you need to look at a third-party product to fill the gap. At the company I work at, EY, we are building a data fabric on Azure, and I have listed below the areas where we have had to look at products outside the Microsoft realm:
- Master Data Management (MDM): Microsoft has Master Data Services (MDS), but it is meant for lightweight MDM needs, has not had any new features in quite a while, and requires SQL Server. Microsoft usually recommends Profisee instead. Other options: Informatica, Tamr, Boomi, Riversand, Semarchy
- Data Quality: Microsoft has a data quality product called Data Quality Services (DQS), but it does not seem to be supported anymore, is limited in features, and also requires SQL Server. Instead, if you are using an MDM tool like Profisee, it has built-in data quality features. Otherwise, look at other options: Informatica Cloud Data Quality, Talend Data Quality
- Data virtualization: Microsoft has a “light” version of virtualization in the serverless SQL pool in Azure Synapse Analytics, which can query remote data stores. It currently supports querying only data in Azure Data Lake (Parquet, Delta Lake, and delimited text formats), Cosmos DB, or Dataverse, but hopefully more sources will come in the future. Power BI with DirectQuery is also a light version of virtualization (see Data Virtualization in Microsoft Power BI and supported DirectQuery sources). For full virtualization software, check out: Denodo, Dremio, Starburst, Fraxses, Stratio
- Data Catalog: Microsoft has a nice product in this area called Purview, but it is not yet GA. If you need a GA product or one that has been around a while, check out: Informatica, Waterline Data, Alation, Collibra, Amundsen, Databricks Unity Catalog (not GA), erwin Data Intelligence, Apache Atlas, data.world
- Attribute-based access control (ABAC): ABAC for security is becoming more popular, but Microsoft has limited support for it (see What is Azure attribute-based access control (Azure ABAC)? (preview)). Hopefully ABAC will be added to Purview, but for now look at: Immuta, Okera. For an excellent paper on the benefits of ABAC over RBAC, check out GigaOm Report: Immuta vs. Apache Ranger
- Multi-master cluster warehouse: Basically this means you can have multiple compute clusters all accessing the same database, as opposed to each cluster only being able to access the one database it is assigned to (e.g. five clusters all accessing databaseA, instead of cluster1 only accessing databaseA, cluster2 only accessing databaseB, etc.). This functionality was demo’d in Azure Synapse Analytics quite a long time ago (see Azure Synapse Analytics & Power BI concurrency), but is still not available. Snowflake does have this feature, and it is quite popular
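To make the ABAC item above a bit more concrete, here is a minimal Python sketch of how an attribute-based check differs from a role-based one. Everything in it (the `User` and `Dataset` classes, the `RBAC_GRANTS` table, and the `region`, `clearance`, and `sensitivity` attributes) is hypothetical and made up for illustration; it is not how Immuta, Okera, or Azure ABAC actually implement or expose policies.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)


@dataclass
class Dataset:
    name: str
    attributes: dict = field(default_factory=dict)


# RBAC (hypothetical grant table): every new dataset needs a new role or grant.
RBAC_GRANTS = {
    "emea_finance_reader": {"finance_emea"},
    "amer_finance_reader": {"finance_amer"},
}


def rbac_allows(user: User, dataset: Dataset) -> bool:
    """Allow access if any of the user's roles is granted this dataset by name."""
    return any(dataset.name in RBAC_GRANTS.get(role, set()) for role in user.roles)


def abac_allows(user: User, dataset: Dataset) -> bool:
    """Allow access by comparing user attributes with dataset attributes.

    Illustrative rule: the regions must match and the user's clearance
    must be at least the dataset's sensitivity level.
    """
    region = user.attributes.get("region")
    same_region = region is not None and region == dataset.attributes.get("region")
    cleared = user.attributes.get("clearance", 0) >= dataset.attributes.get("sensitivity", 0)
    return same_region and cleared


if __name__ == "__main__":
    bob = User("bob", attributes={"region": "APAC", "clearance": 1})
    finance_apac = Dataset("finance_apac", {"region": "APAC", "sensitivity": 1})
    # A newly added dataset has no RBAC grant yet, but the ABAC rule already covers it.
    print("RBAC:", rbac_allows(bob, finance_apac))
    print("ABAC:", abac_allows(bob, finance_apac))
```

The point of the sketch is only that the RBAC table grows with every new dataset and role combination, while the single ABAC rule keeps working as long as new users and datasets are tagged with attributes.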
Note that these are just some of the products in each category, based on my knowledge. Please leave a comment about any products you like that I have missed!
The post Data Platform products for Microsoft gaps first appeared on James Serra's Blog.