Introduction
In this article, we will look at our first hands-on exercise in Azure Data Factory by carrying out a simple file copy from our local machine to Blob Storage. The steps are given below with explanations and screenshots.
Create a storage account
After creating a storage account, create a container that will hold the data we are going to work on. In simple terms, a container is like a folder inside a larger directory and is useful for keeping data segregated.
Create a container using the ‘Containers’ option on the storage account overview page.
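If you prefer scripting this step, the container can also be created with the azure-storage-blob Python SDK. This is only a minimal sketch: the connection string placeholder and the container name `adf-demo` are assumptions for this walkthrough.

```python
from azure.storage.blob import BlobServiceClient

# Connection string copied from the storage account's "Access keys" blade (placeholder values).
CONN_STR = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"

service = BlobServiceClient.from_connection_string(CONN_STR)

# Create the container that will hold the input and output folders.
container = service.create_container("adf-demo")
print("Created container:", container.container_name)
```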
I am going to create an input folder and upload the file we want copied. The file I have chosen is a CSV containing 13,303 rows of sample data with names and addresses.
Sample data will look like this…
The folder is created with a file inside it.
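The upload can likewise be scripted. The sketch below assumes the same `adf-demo` container and a hypothetical local file named `sample-addresses.csv`; note that "folders" in blob storage are virtual, so the input folder is simply a prefix on the blob name.

```python
from azure.storage.blob import BlobServiceClient

CONN_STR = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
service = BlobServiceClient.from_connection_string(CONN_STR)

# "Folders" in blob storage are virtual: 'input/' is just a prefix on the blob name.
blob = service.get_blob_client(container="adf-demo", blob="input/sample-addresses.csv")

# Upload the local CSV file into the input folder.
with open("sample-addresses.csv", "rb") as data:
    blob.upload_blob(data, overwrite=True)

print("Uploaded:", blob.blob_name)
```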
Once created, we can open the file and view its contents with the built-in editor. Please note that the size limit for previewing a file through the ‘Edit’ tab is 2.1 MB.
There is also a ‘Preview’ button that displays the data in tabular format, in case you wish to see it laid out as a table rather than as raw CSV. At this point we have created a storage account, created an input folder, and placed a file inside it, so our input is ready.
Create a new Data Factory resource
I am simply creating the Data Factory resource with default parameters, so the Git configuration and advanced tabs can be skipped.
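If you would rather provision the factory from code, a rough equivalent with the azure-mgmt-datafactory management SDK is sketched below. The subscription ID, resource group, factory name, and region are all placeholders/assumptions, and the exact SDK surface may vary by version.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RG_NAME = "adf-demo-rg"                 # assumed resource group
DF_NAME = "adf-demo-factory"            # assumed factory name

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Create the factory with default settings (no Git configuration), mirroring the portal steps.
factory = adf_client.factories.create_or_update(RG_NAME, DF_NAME, Factory(location="eastus"))
print("Provisioning state:", factory.provisioning_state)
```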
After clicking ‘Azure Data Factory Studio’, a new browser tab opens alongside the Azure portal; this is where we will carry out the remaining steps.
Switch to Edit mode (the pencil icon on the left side) in Data Factory Studio.
As a first step, we must create a linked service, through which the connection is made between the source and the destination. I am going to select Azure Blob Storage, since that is where our CSV file lives.
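Continuing the management-SDK sketch from above (reusing `adf_client`, `RG_NAME`, `DF_NAME`, and `CONN_STR`), a roughly equivalent linked service definition might look like this; the linked service name is an assumption.

```python
from azure.mgmt.datafactory.models import AzureBlobStorageLinkedService, LinkedServiceResource

# Linked service pointing at the storage account that holds our container.
blob_ls = AzureBlobStorageLinkedService(connection_string=CONN_STR)
adf_client.linked_services.create_or_update(
    RG_NAME, DF_NAME, "BlobStorageLinkedService", LinkedServiceResource(properties=blob_ls)
)
```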
After the linked service has been created, go back to Edit mode to create the input dataset. I have selected Azure Blob Storage and Delimited text (since ours is a CSV file) as the storage and structure options respectively.
In the next step, use the browse option to locate the input file. The dataset name can be anything of our choice, for reference.
A similar step has to be carried out for the output dataset, specifying the output folder and file name where the copied data will be placed. One thing to note: you cannot browse for the output folder/file, as they do not exist yet; naming them here is enough for them to be created.
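For reference, a rough programmatic equivalent of the two datasets, continuing the same sketch, is shown below. The dataset names, container, folder paths, and file names are assumptions, and the exact model parameter names may differ between SDK versions.

```python
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation,
    DatasetResource,
    DelimitedTextDataset,
    LinkedServiceReference,
)

ls_ref = LinkedServiceReference(type="LinkedServiceReference", reference_name="BlobStorageLinkedService")

# Input dataset: points at the existing file in the input folder.
input_ds = DelimitedTextDataset(
    linked_service_name=ls_ref,
    location=AzureBlobStorageLocation(
        container="adf-demo", folder_path="input", file_name="sample-addresses.csv"
    ),
    column_delimiter=",",
    first_row_as_header=True,
)
adf_client.datasets.create_or_update(RG_NAME, DF_NAME, "InputDataset", DatasetResource(properties=input_ds))

# Output dataset: the folder and file do not exist yet; naming them is enough for them to be created.
output_ds = DelimitedTextDataset(
    linked_service_name=ls_ref,
    location=AzureBlobStorageLocation(
        container="adf-demo", folder_path="output", file_name="copied-addresses.csv"
    ),
    column_delimiter=",",
    first_row_as_header=True,
)
adf_client.datasets.create_or_update(RG_NAME, DF_NAME, "OutputDataset", DatasetResource(properties=output_ds))
```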
Now that we have created both the input and output datasets, and the linked service that connects them, let us move on to create the pipeline by clicking ‘New pipeline’.
Set the output dataset from the ‘Sink’ tab of the Copy activity.
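Under the hood, this pipeline is just a single Copy activity wiring the input dataset (Source) to the output dataset (Sink). A hedged sketch of the same thing via the management SDK, continuing from the snippets above, with assumed names:

```python
from azure.mgmt.datafactory.models import (
    CopyActivity,
    DatasetReference,
    DelimitedTextSink,
    DelimitedTextSource,
    PipelineResource,
)

# One Copy activity: read from the input dataset (Source) and write to the output dataset (Sink).
copy_activity = CopyActivity(
    name="CopyInputToOutput",
    inputs=[DatasetReference(type="DatasetReference", reference_name="InputDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="OutputDataset")],
    source=DelimitedTextSource(),
    sink=DelimitedTextSink(),
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(RG_NAME, DF_NAME, "CopyCsvPipeline", pipeline)
```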
Now we are all set to publish the pipeline, but before that let's do some quick pre-checks: validation and debugging. The Validate option checks for errors or missed configurations, and a Debug run lets us confirm that the data movement actually happens.
A Debug run is enough to complete your task if it is a one-time job and you don't plan to use the pipeline again, whereas you have to publish the pipeline if you want to reuse it or schedule it for future runs.
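Once published, a run can also be triggered and monitored programmatically, continuing the same sketch; `CopyCsvPipeline` is the assumed pipeline name from the previous snippet.

```python
import time

# Trigger a run of the published pipeline (the portal's Debug run is the interactive equivalent).
run = adf_client.pipelines.create_run(RG_NAME, DF_NAME, "CopyCsvPipeline", parameters={})

# Poll the run status until it leaves the Queued/InProgress states.
while True:
    pipeline_run = adf_client.pipeline_runs.get(RG_NAME, DF_NAME, run.run_id)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(10)

print("Pipeline run finished with status:", pipeline_run.status)
```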
Now that my debug run has completed successfully, let's go to the storage container to check whether the output file has been created.
We can see that the rows from our sample input file have all been copied into the output folder.
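As a quick sanity check outside the portal, the output blob can be downloaded and its rows counted with a few lines of Python; the container and output file name are the assumed values from the earlier sketches.

```python
import csv
import io

from azure.storage.blob import BlobServiceClient

CONN_STR = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
service = BlobServiceClient.from_connection_string(CONN_STR)

# Download the copied file and count its data rows (excluding the header).
blob = service.get_blob_client(container="adf-demo", blob="output/copied-addresses.csv")
text = blob.download_blob().readall().decode("utf-8")
rows = list(csv.reader(io.StringIO(text)))

print("Data rows in output file:", len(rows) - 1)
```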
Point to note
When you move a file this way, ADF copies the contents of the file from the source to the destination rather than moving the file as a whole, so there is no way to preserve the original timestamp of the file.
Conclusion
This is the very first basic step for anyone who wishes to get started with Azure Data Factory. We will look into real-time and more complex tasks in future posts.