Azure Data Factory v2 (ADF) has a new feature in public preview called Data Flow. I have usually described ADF as an orchestration tool instead of an Extract-Transform-Load (ETL) tool since it has the “E” and “L” in ETL but not the “T”. But now it has the data transformation capability, making ADF the equivalent of “SSIS in the cloud” since it has the ability to mimic SSIS Data Flow business logic.
Data Flow does not require you to write any code: you build the solution with the drag-and-drop features of the ADF interface to perform data transformations such as data cleaning, aggregation, and preparation. Behind the scenes, the ADF JSON code is converted to the appropriate code in the Scala programming language, which is then prepared, compiled, and executed in Azure Databricks, scaling out automatically as needed.
There is an excellent chart that shows the SSIS tasks and the equivalent ADF operations. There are currently 13 different dataset manipulation operations, with more to come:
- New Branch
- Join
- Conditional Split
- Union
- Lookup
- Derived Column
- Aggregate
- Surrogate Key
- Exists
- Select
- Filter
- Sort
- Extend
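To give a feel for what a few of these operations do to a dataset, here is a minimal sketch in plain Python. It is not the code ADF generates (that is Scala running on Databricks), and the function names, the `orders` sample rows, and the column names are all hypothetical, chosen only for illustration. It chains rough equivalents of Filter, Derived Column, Aggregate, Sort, and Surrogate Key:

```python
from collections import defaultdict
from itertools import count

# Hypothetical sample rows; in ADF these would come from a source dataset.
orders = [
    {"customer": "alice", "amount": 120.0, "status": "shipped"},
    {"customer": "bob", "amount": 35.5, "status": "cancelled"},
    {"customer": "alice", "amount": 60.0, "status": "shipped"},
    {"customer": "carol", "amount": 15.0, "status": "shipped"},
]

def filter_rows(rows, predicate):
    """Filter: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def derived_column(rows, name, expression):
    """Derived Column: add a column computed from each row."""
    return [{**row, name: expression(row)} for row in rows]

def aggregate(rows, group_by, column, func):
    """Aggregate: group rows by one column and reduce another column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[column])
    return [{group_by: key, column: func(values)} for key, values in groups.items()]

def sort_rows(rows, column, descending=False):
    """Sort: order rows by a single column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)

def surrogate_key(rows, name, start=1):
    """Surrogate Key: prepend an incrementing key column to each row."""
    keys = count(start)
    return [{name: next(keys), **row} for row in rows]

# Chain the steps the way transformations are chained in a data flow.
shipped = filter_rows(orders, lambda r: r["status"] == "shipped")
with_cents = derived_column(shipped, "amount_cents", lambda r: round(r["amount"] * 100))
totals = aggregate(with_cents, "customer", "amount_cents", sum)
ranked = sort_rows(totals, "amount_cents", descending=True)
result = surrogate_key(ranked, "id")

for row in result:
    print(row)
```

Each function takes a list of rows and returns a new list, which loosely mirrors how each transformation in a data flow consumes the output stream of the previous one.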
The Data Flow feature in ADF is currently in limited public preview. If you would like to try this out on your Data Factories, please fill out this form to request whitelisting your Azure Subscription for ADF Data Flows: http://aka.ms/dataflowpreview. Once your subscription has been enabled, you will see “Data Factory V2 (with data flows)” as an option from the Azure Portal when creating Data Factories.
Follow Mark Kromer and the ADF team on Twitter to stay up to date on the rollout of the preview. Also check out the ADF Data Flow documentation and these ADF Data Flow videos.
Don’t confuse this with Dataflows in Power BI; the two have nothing to do with each other.
More info:
Azure Data Factory v2 and its available components in Data Flows
WHAT IS AZURE DATA FACTORY DATA FLOW?