This post will describe how you use a CASE statement in Azure Data Factory (ADF). If you are coming from SSIS background, you know a piece of SQL statement will do the task. However let's see how do it in SSIS and the very same thing can be achieved in ADF.
Problem statement
For my simple scenario, when the PortfolioTypeCode has a value of either 'Mutual Fund' or 'Pooled Fund', my package should return a 1, otherwise it should return 0.
In SSIS, under data flow you will have an OLEDB source, like shown below in Fig 1.
We then open the OLEDB source and then write a T-SQL command, like the one below and we are done:
SELECT Col1, CASE WHEN PortfolioCode IN('Mutual fund','Pooled fund') THEN 1 ELSE 0 END IsFund, Col2 From Table1
How do you implement this in ADF?
In ADF, to achieve the same goal, you need to use Expressions. ADF has the very same concept of a data flow, like SSIS. In the data flow, after the source dataset is established you can add a 'Derived Column' activity, shown below in Fig 3:
Now you can give a the new column a name and then add the expression shown in Fig 4.
As you have seen at the beginning of the article When you write CASE statement in TSQL , your syntax look like below:
CASE WHEN PortfolioCode IN('Mutual fund','Pooled fund') THEN 1 ELSE 0 END IsFund
However, when it comes to ADF expression, then it needs to translate like below:
case( PortfolioTypeCode=='Mutual Fund',1, PortfolioTypeCode=='Pooled Fund',1,0)
Let's talk about in details about case expression in ADF, the case expression takes 3 arguments: case(condition,true_expression,false_expression). From the above code, when PortfolioTypeCode=='Mutual Fund', if the condition is True, then it will return 1. If this is false, we have added another condition: PortfolioTypeCode=='Pooled Fund'. If this condition is True, then the expression will return 1. If both conditions are false, then the expression will return 0.
We have explained the case expression above and the below Fig 5 shows, how the code look like in ADF.
In SSIS whenever you work with data flow and choose OLEDB source, you can write T-SQL code since underneath it's SQL engine, however; for ADF data flow, spark engine is running behind the scene. ADF data flow expression is pretty much powerful, please find more details about the case expression from Microsoft document.