Introduction
We can use the Delete activity in Azure Data Factory (ADF) to delete files and folders from both on-premises and cloud storage. In this article, we will look at the Delete activity and the options it offers for deleting files. I will show the following operations with the Delete activity:
- Delete files from a folder.
- Delete the contents of a folder and the folder itself.
- Delete specific file types in a folder.
- Delete a single file using a wildcard file name.
- Filter files using the last modified date.
- Delete a set of files in a folder.
- Delete files from a subfolder.
- Log the files and folders that the Delete activity deletes.
Delete files from a folder
In this example, we will delete all the files from Folder1. There are six files in the data lake folder, three CSV files and three text files:
Let's take a look at the dataset properties below:
The Delete activity has these options on the Source tab:
- Dataset - We need to provide a dataset that points to a file or a folder.
- File path type - It has three options:
  - File path in dataset - With this option, the files to delete are taken from the path configured in the dataset.
  - Wildcard file path - Select this option to delete source files that match a wildcard file name.
  - List of files - Delete only the files named in a file list, a text file with one file name per line.
- Filter by last modified - Optionally, files can be filtered by their last modified time, using a start time and an end time.
- Recursively - This option determines whether files are deleted only from the current folder or also from its subfolders.
- Max concurrent connections - The maximum number of connections used at the same time to delete files or folders.
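The same Source tab settings also end up in the pipeline's JSON definition. The sketch below builds that JSON from Python as a plain dict, so each UI option can be matched to a property. It is a sketch under assumptions: the property names (`storeSettings`, `recursive`, `maxConcurrentConnections`, `wildcardFileName`, `fileListPath`, `modifiedDatetimeStart`, `modifiedDatetimeEnd`) reflect our understanding of the ADF Delete activity schema and should be checked against the official documentation, the store type `AzureBlobFSReadSettings` assumes an ADLS Gen2 dataset, and `Folder1Dataset` and `build_delete_activity` are hypothetical names.

```python
import json
from typing import Optional


def build_delete_activity(
    dataset_name: str,
    recursive: bool = False,
    max_concurrent_connections: Optional[int] = None,
    wildcard_file_name: Optional[str] = None,
    file_list_path: Optional[str] = None,
    modified_start: Optional[str] = None,
    modified_end: Optional[str] = None,
) -> dict:
    """Return a dict shaped like an ADF Delete activity definition (sketch only).

    Property names are assumed from the ADF JSON schema; verify before use.
    """
    if wildcard_file_name is not None and file_list_path is not None:
        # The Source tab offers one File path type at a time.
        raise ValueError("use either a wildcard file name or a file list, not both")

    # Assumed store-settings type for an ADLS Gen2 dataset; other stores use other types.
    store_settings: dict = {"type": "AzureBlobFSReadSettings", "recursive": recursive}
    optional = {
        "maxConcurrentConnections": max_concurrent_connections,
        "wildcardFileName": wildcard_file_name,
        "fileListPath": file_list_path,
        "modifiedDatetimeStart": modified_start,  # e.g. "2021-08-21T00:00:00Z"
        "modifiedDatetimeEnd": modified_end,
    }
    # Only emit the options that were actually set.
    store_settings.update({k: v for k, v in optional.items() if v is not None})

    return {
        "name": "DeleteActivity",
        "type": "Delete",
        "typeProperties": {
            "dataset": {"referenceName": dataset_name, "type": "DatasetReference"},
            "storeSettings": store_settings,
        },
    }


# "File path in dataset" with Recursively unchecked; "Folder1Dataset" is a hypothetical name.
print(json.dumps(build_delete_activity("Folder1Dataset"), indent=2))
```

Leaving "File path in dataset" selected corresponds to setting neither `wildcardFileName` nor `fileListPath`, which is why both are omitted unless passed in.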
As shown below, for File path type we use the File path in dataset option, so the activity deletes all the files in Folder1.
Let's run the pipeline:
As expected, all the files are deleted from Folder1, but the folder itself is not.
Delete the contents of a folder and the folder itself
In this example, we will delete all the contents of the folder and the folder itself. To delete the folder and its contents, we need to select the Recursively option, as shown below:
Let's run the pipeline:
This time, Folder1 is deleted along with all of its contents:
Delete specific file types in a folder
In this example, we will delete only the text files from the source folder. I reset the source files in the dataset folder, so it again has six files: three CSV files and three text files.
Let's change the Delete activity source settings to use a wildcard file path. In the wildcard file name, we use *.txt to delete all text files:
Let's run the pipeline:
Now all three text files are deleted from Folder1, and the three CSV files are still there:
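To make the matching rule concrete, here is a small local sketch of how a wildcard file name selects files. It uses Python's `fnmatch.fnmatchcase` as a stand-in for ADF's own wildcard matching: the two behave alike for `*` and `?`, but they are not guaranteed to be identical in every case (for example, `fnmatch` also understands `[...]` character sets, and case sensitivity in ADF may depend on the store). The file names other than File1.csv and File1.txt are hypothetical, and `select_by_wildcard` is an illustrative helper, not an ADF API.

```python
from fnmatch import fnmatchcase


def select_by_wildcard(file_names: list[str], pattern: str) -> list[str]:
    """Return the file names that match a wildcard such as '*.txt' (case-sensitive)."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical contents of Folder1: three CSV files and three text files.
folder1 = ["File1.csv", "File2.csv", "File3.csv", "File1.txt", "File2.txt", "File3.txt"]
print(select_by_wildcard(folder1, "*.txt"))      # ['File1.txt', 'File2.txt', 'File3.txt']
print(select_by_wildcard(folder1, "File1.csv"))  # ['File1.csv']
```

The second call corresponds to the next example: a "wildcard" with no wildcard characters simply matches that one file name.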
Delete a single file using a wildcard file name
In this example, we will delete a single file using a wildcard file name. I reset the source files in the dataset folder, so it again has six files: three CSV files and three text files.
I enter the file name File1.csv in the wildcard file path:
Now, I run the pipeline:
As expected, File1.csv has been deleted and the other five files are still in the data lake folder:
Filter files using the last modified date
There is an option to filter files by their last modified time, using a start time and an end time:
In Folder1, three of the files were last modified on 22nd August and the other three on 24th August.
Let's use the Filter by last modified option on the Source tab. When we click the box, a calendar opens where we can select the date and time.
I selected 08/21/2021 as the start date and 08/23/2021 as the end date. Only the three files from 22nd August fall within this range, so they are the ones that will be deleted.
Now, let's run the pipeline:
As expected, three files are deleted from the folder:
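The date filter can be sketched locally as a simple range check. As we understand the ADF documentation, a file is selected when its last modified time is greater than or equal to the start time and less than the end time, with both times in UTC; the sketch assumes exactly those semantics. The file names and timestamps below are hypothetical (the article does not say which three files are from which day), and `filter_by_last_modified` is an illustrative helper, not an ADF API.

```python
from datetime import datetime, timezone


def filter_by_last_modified(
    files: dict[str, datetime], start: datetime, end: datetime
) -> list[str]:
    """Return names whose last-modified time t satisfies start <= t < end (UTC assumed)."""
    return sorted(name for name, modified in files.items() if start <= modified < end)


utc = timezone.utc
# Hypothetical Folder1: three files modified on 22 Aug 2021 and three on 24 Aug 2021.
folder1 = {
    "File1.csv": datetime(2021, 8, 22, 10, 0, tzinfo=utc),
    "File2.csv": datetime(2021, 8, 22, 10, 5, tzinfo=utc),
    "File3.csv": datetime(2021, 8, 22, 10, 10, tzinfo=utc),
    "File1.txt": datetime(2021, 8, 24, 9, 0, tzinfo=utc),
    "File2.txt": datetime(2021, 8, 24, 9, 5, tzinfo=utc),
    "File3.txt": datetime(2021, 8, 24, 9, 10, tzinfo=utc),
}
start = datetime(2021, 8, 21, tzinfo=utc)  # 08/21/2021
end = datetime(2021, 8, 23, tzinfo=utc)    # 08/23/2021
print(filter_by_last_modified(folder1, start, end))  # ['File1.csv', 'File2.csv', 'File3.csv']
```

Because the times are compared in UTC, a file modified late in the evening local time can fall on a different UTC date, which is worth keeping in mind when choosing the start and end values.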
Delete a set of files in a folder
In this example, I will delete two files, File1.txt and File1.csv. We can create a file that lists the names of the files we want to delete. Here, I created a text file, Delete_File_List.txt, that contains two file names, File1.txt and File1.csv:
Let's change the Delete activity. For File path type, I selected List of files, and below that I provided the path to the file list:
I run the pipeline:
As expected, File1.txt and File1.csv are now deleted:
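The file-list option can be pictured as "read the list, then delete each named file". The local sketch below assumes, as we understand the ADF documentation, that each line of the list is a file path relative to the folder configured in the dataset; it skips blank lines and silently ignores names that do not exist. `read_file_list` and `delete_listed_files` are hypothetical helpers operating on a temporary local folder, not ADF APIs, and File2.csv and File2.txt are made-up names.

```python
import tempfile
from pathlib import Path


def read_file_list(list_path: Path) -> list[str]:
    """Read one relative file path per line, skipping blank lines."""
    lines = list_path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def delete_listed_files(folder: Path, list_path: Path) -> list[str]:
    """Delete each listed file under `folder`; return the names actually deleted."""
    deleted = []
    for relative in read_file_list(list_path):
        target = folder / relative
        if target.is_file():  # names that are missing are skipped without an error
            target.unlink()
            deleted.append(relative)
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    folder1 = Path(tmp, "Folder1")
    folder1.mkdir()
    for name in ["File1.csv", "File2.csv", "File1.txt", "File2.txt"]:
        (folder1 / name).write_text("sample")
    file_list = Path(tmp, "Delete_File_List.txt")
    file_list.write_text("File1.txt\nFile1.csv\n", encoding="utf-8")
    print(delete_listed_files(folder1, file_list))   # ['File1.txt', 'File1.csv']
    print(sorted(p.name for p in folder1.iterdir()))  # ['File2.csv', 'File2.txt']
```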
Delete files from a subfolder
In this example, we will delete files from a subfolder.
I created a subfolder, SubFolder1, within Folder1 and copied the existing files into it, giving the structure below:
Folder1 has six files and one subfolder as shown below:
SubFolder1 has six files as shown below:
On the Source tab, I set File path type to "File path in dataset" and left the Recursively option unchecked:
I run the pipeline:
As a result, all the files directly in Folder1 are deleted and nothing is deleted from SubFolder1. This image shows what happened:
Now let's select the Recursively option:
Now run the pipeline again:
Now all the files are deleted from Folder1 and SubFolder1, along with the folders themselves. This image shows that everything has been deleted:
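The difference between the two runs can be reproduced on a local folder. The sketch below mirrors the behavior observed in this article rather than ADF's actual implementation: with `recursive=False` only the files directly inside the folder are removed, and with `recursive=True` the folder is removed together with everything under it. `delete_folder` and `make_sample` are hypothetical helpers, and the six file names are made up.

```python
import shutil
import tempfile
from pathlib import Path


def delete_folder(folder: Path, recursive: bool) -> None:
    """Mimic the two behaviours observed in the article (local sketch).

    recursive=False: delete only the files directly inside `folder`;
                     subfolders and the folder itself are kept.
    recursive=True:  delete `folder` together with all files and subfolders.
    A missing folder is ignored rather than raising an error.
    """
    if not folder.exists():
        return
    if recursive:
        shutil.rmtree(folder)
    else:
        # Take a snapshot of the entries before deleting any of them.
        for entry in list(folder.iterdir()):
            if entry.is_file():
                entry.unlink()


def make_sample(root: Path) -> Path:
    """Build Folder1/ with six files plus SubFolder1/ holding copies of them."""
    folder1 = root / "Folder1"
    sub = folder1 / "SubFolder1"
    sub.mkdir(parents=True)
    for name in ["File1.csv", "File2.csv", "File3.csv", "File1.txt", "File2.txt", "File3.txt"]:
        (folder1 / name).write_text("sample")
        (sub / name).write_text("sample")
    return folder1


with tempfile.TemporaryDirectory() as tmp:
    folder1 = make_sample(Path(tmp))
    delete_folder(folder1, recursive=False)
    print([p.name for p in folder1.iterdir()])            # ['SubFolder1']
    print(len(list((folder1 / "SubFolder1").iterdir())))  # 6
    delete_folder(folder1, recursive=True)
    print(folder1.exists())                                # False
```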
If a file or folder does not exist
If the file or folder specified in the Delete activity's dataset does not exist, the activity still runs and does not return an error.
Logging in Delete Activity
The Delete activity has an option to log the files and folders it deletes. Let's look at the options below:
- Enable logging - Must be checked to enable logging
- Linked Service - Linked service to store the log file. This must be Azure Storage, Azure Data Lake Storage Gen1, or Azure Data Lake Storage Gen2
- Folder path - (Optional) Path to store the log file. If we don't specify the path, the service will create a container.
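In the pipeline JSON, these three logging options appear to map to an `enableLogging` flag plus a `logStorageSettings` object. The sketch below adds them to a minimal Delete activity dict; the property names are our reading of the ADF schema and should be verified against the official documentation, and `with_logging`, `Folder1Dataset`, `LogStorageLinkedService`, and `logs/delete` are hypothetical names used only for illustration.

```python
import json
from typing import Optional


def with_logging(activity: dict, linked_service: str, log_path: Optional[str] = None) -> dict:
    """Return a copy of a Delete activity dict with logging enabled (sketch only)."""
    type_properties = dict(activity["typeProperties"])  # copy so the input is not mutated
    log_settings: dict = {
        "linkedServiceName": {"referenceName": linked_service, "type": "LinkedServiceReference"}
    }
    if log_path is not None:
        # Folder path is optional; without it, the service creates a container for the log.
        log_settings["path"] = log_path
    type_properties["enableLogging"] = True
    type_properties["logStorageSettings"] = log_settings
    return {**activity, "typeProperties": type_properties}


# Hypothetical minimal activity pointing at a hypothetical dataset.
activity = {
    "name": "DeleteActivity",
    "type": "Delete",
    "typeProperties": {
        "dataset": {"referenceName": "Folder1Dataset", "type": "DatasetReference"},
    },
}
print(json.dumps(with_logging(activity, "LogStorageLinkedService", "logs/delete"), indent=2))
```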
The log file has four columns:
- Name - File or folder name.
- Category - Whether it is a folder or a file.
- Status - Deletion status.
- Error - Error message, if any.
Let's look at some sample log files after running the Delete activity. In the log file below, the folder was deleted and there is no error message.
In the next example, the file did not exist, so the log shows the error message File does not exist.
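The two log examples can be imitated locally: delete each path, never raise for a missing one, and record a row with the four columns described above. This is only a sketch of the idea. The status values "Deleted" and "Failed", recording a missing entry under the File category, and the CSV layout are assumptions, not the exact format ADF writes, and `delete_with_log` is a hypothetical helper.

```python
import csv
import io
import shutil
import tempfile
from pathlib import Path


def delete_with_log(paths: list[Path]) -> str:
    """Delete each path and return a CSV log with Name, Category, Status, Error columns.

    Missing paths do not raise; they are logged with an error message instead.
    Status values and the CSV layout are assumptions for illustration.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Name", "Category", "Status", "Error"])
    for path in paths:
        if path.is_dir():
            shutil.rmtree(path)
            writer.writerow([str(path), "Folder", "Deleted", ""])
        elif path.is_file():
            path.unlink()
            writer.writerow([str(path), "File", "Deleted", ""])
        else:
            # Assumed: a missing entry is recorded as a file with an error message.
            writer.writerow([str(path), "File", "Failed", "File does not exist"])
    return out.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    folder1 = Path(tmp, "Folder1")
    folder1.mkdir()
    (folder1 / "File1.csv").write_text("sample")
    # One folder that exists and one hypothetical file that does not.
    print(delete_with_log([folder1, Path(tmp, "File9.csv")]))
```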
Conclusion
In this article, I shared different ways of using the Delete activity in Azure Data Factory. We can use this activity to clean up unnecessary files and folders in cloud and on-premises locations. Please note that once a file is deleted, it cannot be recovered unless soft delete is enabled on the storage account.