This article will describe how the Power Query activity in Azure Data Factory (ADF) and Integration Services (SSIS) can be useful. In addition to that, I will share the differences of the Power Query activity between SSIS and ADF.
Why Power Query and When to Use It?
By using Microsoft’s cloud-based ETL tool, Azure Data Factory (ADF) you can build a pipeline to extract, transform and load data from a source to a target. ADF has many activities available, like the lookup activity, merge activity, data conversion activity, in addition to the Mapping data flow activity and the Databricks Notebook activity to help implement complex transformations. Moreover, ADF recently introduced 'Power Query' as an activity to perform interactive data exploration. Please note that Power query is still in preview for both Azure Data Factory (ADF) and SSIS. Fig 1 below shows the Power Query activity in ADF.
Fig 1: Power Query in ADF
Despite having many activities in Azure Data Factory, why do we need a Power Query activity? Let's share my experience where Power Query was chosen as an activity in a pipeline.
In this situation, the task was to get data from complex Excel files with many calculations and more than 1000 columns. These files were used by business as an application. A few calculated columns need to be loaded into a modern data warehouse from the Excel files. In this scenario, choosing the right activity was challenging. I could choose from Mapping data flows, Databricks Notebook, or Power Query activities. All of them might have worked, but Power Query was the right choice. Let me explain why.
The source file was Excel and had more than 1000 columns with many calculations inside, so it was difficult for a Data Engineer to find out how to derive the expected outcome where no mapping or transformation logic was provided. You can work directly inside the Power Query mash-up editor to perform interactive data exploration, visualize the outcome, and save. This meant that finding solution by using the Power Query activity was faster comparative than choosing other activities in ADF. A business expert and I were able to work closely and produce the output in a very short period of time by using the Power Query mash-up editor (see Fig 2).
Fig 2: Power Query mashup editor in ADF
When we work with a transformation activity in ADF or any ETL tool, we follow a mapping document provided by the business to build the pipeline. This means transformation rules and mappings are predefined. However, when transformation rules have not yet been determined, it is faster to work with the Power Query activity since performing interactive data exploration and visualization is provided through the activity.
You can start with Power BI Desktop to work together with the business to produce the expected outcome, as Power BI Desktop generates M syntax, which is called M Query. When the output is verified and accepted, you can then copy the M Query to an ADF Power Query activity or SSIS Power Query source. In fact, the M query you produced holds the transformation rules, so if you like to implement the same transformation by using a dataflow activity or Databricks, you can do that, too.
What works in SSIS but not in ADF?
The Power Query activity is in preview for both SSIS and ADF. However, if you choose ADF then you need to convert the source file from .xlsx to .csv. The Power Query activity for ADF doesn't support .xlsx as a source dataset. Fig 3 shows the Power Query activity only supports the .csv and .parquet data formats. However, Power Query in SSIS supports Excel as a source.
Fig 3: Source dataset for Power Query in ADF
It is also worth knowing that when you work with the Power Query Source in SSIS, it doesn't have graphical user interface or mashup editor that exist in the Power Query activity in ADF. The reason is you can use Power BI Desktop to build the transformation and then copy the M query and paste it into the Power Query Source in SSIS, as shown in Fig 4.
Fig 4: Power Query in SSIS
In summary, Power Query in both SSIS and ADF is useful Activity and new feature (in preview) and there might be many different scenarios where you want to use Power Query activity. However, this article is based on my experiences with Power Query activity in ADF and SSIS. It's also interesting that the user interface offer in ADF Power Query is identical to Power BI; however, not all M query is supported by ADF Power Query yet.