Introduction
Today we will walk through a real-time scenario: extracting file names from a source path and then using them in any subsequent activity driven by that output. This is useful when we need to extract file names, transform or copy data from CSV, Excel, or flat files in blob storage, or even maintain a table that records where the data came from.
As a first step, I have created an Azure Blob Storage account and added a few files that can be used in this demo.
Activity 1 - Get Metadata
Create a new pipeline in Azure Data Factory.
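If you switch to the pipeline's JSON (Code) view at this point, the empty pipeline looks roughly like the sketch below. The pipeline name is just a placeholder I am using for this walkthrough; the activities we add in the next steps will appear inside the activities array.

{
    "name": "GetFileNames_CopyFiles",
    "properties": {
        "activities": []
    }
}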
Next, in the newly created pipeline, add the 'Get Metadata' activity from the list of available activities. This activity pulls the metadata of the files stored in the blob container, and its output can be consumed by subsequent activities.
I have dragged the Get Metadata activity onto the canvas and renamed it File_Name.
Create a linked service to the blob storage, and a dataset that points to it, if you have not done so already.
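For reference, a minimal Azure Blob Storage linked service and a dataset pointing at the source folder look roughly like the JSON below in the Code editor. The names AzureBlobStorage_LS and SourceBlobFolder, the container 'source', the folder 'input', and the connection string are all placeholders, and I am assuming a Binary dataset since we only need the folder listing and a straight file copy; a DelimitedText dataset works the same way.

{
    "name": "AzureBlobStorage_LS",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        }
    }
}

{
    "name": "SourceBlobFolder",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage_LS",
            "type": "LinkedServiceReference"
        },
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "source",
                "folderPath": "input"
            }
        }
    }
}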
Once the dataset is selected, add a new entry to the Field list. Selecting Child Items from the dropdown lets you list the contents of the storage folder; the child items are returned in the JSON output of the Get Metadata activity, which is the important part here. There are other field options available as well, which you can use based on your requirement.
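In JSON form, the Get Metadata activity with the Child Items field selected looks roughly like this, SourceBlobFolder being the placeholder dataset from above:

{
    "name": "File_Name",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "SourceBlobFolder",
            "type": "DatasetReference"
        },
        "fieldList": [ "childItems" ]
    }
}

Its output then contains a childItems array of name/type pairs, one entry per file in the folder, along these lines (the file names here are just examples):

{
    "childItems": [
        { "name": "file1.csv", "type": "File" },
        { "name": "file2.csv", "type": "File" }
    ]
}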
Once that is done, it is time to use a ForEach activity to loop through each file name and copy the corresponding file to the output. Make sure to tick the Sequential checkbox so that the files are iterated one by one.
Items is where you pass the file names as an array; the ForEach loop then takes over to iterate over and process them.
Use the childItems output as that array to loop through the file names, with the expression below.
@activity('File_Name').output.childItems
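Put together, the ForEach activity in the Code view would look something like the sketch below. ForEach_File is a placeholder name, the Sequential checkbox maps to isSequential, and the Copy activity we build next goes inside the inner activities array.

{
    "name": "ForEach_File",
    "type": "ForEach",
    "dependsOn": [
        {
            "activity": "File_Name",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "isSequential": true,
        "items": {
            "value": "@activity('File_Name').output.childItems",
            "type": "Expression"
        },
        "activities": []
    }
}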
Now select the Activities tab in the ForEach activity and click the edit icon. This is where you specify the activities to be performed inside the loop. In this demo, we will copy the files to our destination location.
The Copy activity has Source and Sink tabs, which are self-explanatory: they point to the source and the destination. However, you cannot simply select the dataset we already created as the source, because this second part of the demo copies files whose names come dynamically from the output of step one. Instead, create a new dataset, select the source Azure Blob Storage location up to the folder only, and leave the file name field to be parameterized dynamically.
Activity 2 - ForEach - File Copy
Now move on to the 'Parameters' tab of this new dataset and create a parameter called FileName, which can then be referenced in the file path on the 'Connection' tab.
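As a rough sketch, the parameterized source dataset could look like the JSON below. SourceBlobFile, the container, and the folder path are placeholders, and I am again assuming a Binary dataset; the important parts are the FileName parameter and the @dataset().FileName expression used as the file name in the location.

{
    "name": "SourceBlobFile",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage_LS",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "FileName": { "type": "string" }
        },
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "source",
                "folderPath": "input",
                "fileName": {
                    "value": "@dataset().FileName",
                    "type": "Expression"
                }
            }
        }
    }
}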
Going back to the pipeline, you can see that the newly created FileName parameter is now visible in the dataset properties of the Copy activity's source. Set its value to the dynamic expression @item().name so that each iteration receives the current file name from the Get Metadata output.
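Inside the ForEach, the Copy activity then passes the current item's name into that dataset parameter. Here is a sketch of what the activity JSON could look like, with Copy_File and DestinationBlobFolder as placeholder names and Binary source/sink settings assumed:

{
    "name": "Copy_File",
    "type": "Copy",
    "inputs": [
        {
            "referenceName": "SourceBlobFile",
            "type": "DatasetReference",
            "parameters": {
                "FileName": {
                    "value": "@item().name",
                    "type": "Expression"
                }
            }
        }
    ],
    "outputs": [
        {
            "referenceName": "DestinationBlobFolder",
            "type": "DatasetReference"
        }
    ],
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": { "type": "AzureBlobStorageReadSettings" }
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": { "type": "AzureBlobStorageWriteSettings" }
        }
    }
}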
With the source configuration complete, move on to configuring the Sink, which is the destination folder. Point it to a folder in the Azure Blob Storage account, or type a folder name and it will be created automatically if it does not exist. Make sure to set Import schema to 'None', otherwise the copy might throw an error.
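The sink dataset itself only needs to point at the destination container and folder; since no file name is set on it, the copy keeps the source file names. A minimal sketch, with DestinationBlobFolder, the container, and the folder path as placeholders:

{
    "name": "DestinationBlobFolder",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage_LS",
            "type": "LinkedServiceReference"
        },
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "destination",
                "folderPath": "output"
            }
        }
    }
}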
It is now time to test our pipeline. Validate it, run it, and view the Output tab for the results.
We can see that the pipeline ran successfully, copying the files iteratively. Check the Azure Blob Storage destination folder, where all the files have been copied from source to destination.
Summary
The 'Activity 2' of this article, the file copy, is just one example of a subsequent activity based on the output of the 'Get Metadata' activity. I chose a file copy for this demo, but you can plug in any activity of your choice.