Introduction
In this article, we will look at our first hands-on exercise in Azure Data Factory by carrying out a simple file copy from our local machine to Blob Storage. The steps are given below with explanations and screenshots.
Create a storage account
After creating a storage account, create a container that will hold the data we are going to work on. In simple terms, a container is like a folder inside a larger directory and is useful for keeping data segregated.
Create a container using the ‘Containers’ option on the storage account overview page.
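If you prefer scripting this step, the container can also be created with the azure-storage-blob Python SDK. This is only a minimal sketch: the connection string placeholder and the container name `adf-demo` are assumptions for this walkthrough.

```python
from azure.storage.blob import BlobServiceClient

# Connection string copied from the storage account's "Access keys" blade (placeholder values).
CONN_STR = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"

service = BlobServiceClient.from_connection_string(CONN_STR)

# Create the container that will hold the input and output folders.
container = service.create_container("adf-demo")
print("Created container:", container.container_name)
```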
I am going to create an input folder and upload the file we want copied. The file I have chosen is a CSV containing 13,303 rows of sample data with names and addresses.
Sample data will look like this…
The folder is created with a file inside it.
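The upload can likewise be scripted. The sketch below assumes the same `adf-demo` container and a hypothetical local file named `sample-addresses.csv`; note that "folders" in blob storage are virtual, so the input folder is simply a prefix on the blob name.

```python
from azure.storage.blob import BlobServiceClient

CONN_STR = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
service = BlobServiceClient.from_connection_string(CONN_STR)

# "Folders" in blob storage are virtual: 'input/' is just a prefix on the blob name.
blob = service.get_blob_client(container="adf-demo", blob="input/sample-addresses.csv")

# Upload the local CSV file into the input folder.
with open("sample-addresses.csv", "rb") as data:
    blob.upload_blob(data, overwrite=True)

print("Uploaded:", blob.blob_name)
```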
Once created, we can open the file and view its contents with the built-in editor. Please note that the size limit for previewing a file through the ‘Edit’ tab is 2.1 MB.
There is also a ‘Preview’ button that displays the data in tabular format, in case you wish to see it laid out as a table rather than as raw CSV. At this point we have created a storage account, created an input folder, and placed a file inside it, so our input is ready.
Create a new Data Factory resource
I am simply creating the Data Factory resource with default parameters, so the Git configuration and advanced tabs can be skipped.
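If you would rather provision the factory from code, a rough equivalent with the azure-mgmt-datafactory management SDK is sketched below. The subscription ID, resource group, factory name, and region are all placeholders/assumptions, and the exact SDK surface may vary by version.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RG_NAME = "adf-demo-rg"                 # assumed resource group
DF_NAME = "adf-demo-factory"            # assumed factory name

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Create the factory with default settings (no Git configuration), mirroring the portal steps.
factory = adf_client.factories.create_or_update(RG_NAME, DF_NAME, Factory(location="eastus"))
print("Provisioning state:", factory.provisioning_state)
```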
After clicking ‘Azure Data Factory Studio’, a new browser tab opens alongside the Azure portal; this is where we will carry out the remaining steps.
Switch to Edit mode (the pencil icon on the left side) in Data Factory Studio.
As a first step, we must create a linked service, through which the connection is made between the source and the destination. I am going to select Azure Blob Storage, since that is where our CSV file lives.
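Continuing the management-SDK sketch from above (reusing `adf_client`, `RG_NAME`, `DF_NAME`, and `CONN_STR`), a roughly equivalent linked service definition might look like this; the linked service name is an assumption.

```python
from azure.mgmt.datafactory.models import AzureBlobStorageLinkedService, LinkedServiceResource

# Linked service pointing at the storage account that holds our container.
blob_ls = AzureBlobStorageLinkedService(connection_string=CONN_STR)
adf_client.linked_services.create_or_update(
    RG_NAME, DF_NAME, "BlobStorageLinkedService", LinkedServiceResource(properties=blob_ls)
)
```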
After the linked service has been created, go back to Edit mode to create the input dataset. I have selected Azure Blob Storage and Delimited text (since ours is a CSV file) as the storage and structure options respectively.
In the next step, use the browse option to locate the input file. The dataset name can be anything of our choice, for reference.
A similar step has to be carried out for the output dataset, specifying the output folder and file name where the copied data will be placed. One thing to note: you cannot browse for the output folder/file, as they do not exist yet; naming them here is enough for them to be created.
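For reference, a rough programmatic equivalent of the two datasets, continuing the same sketch, is shown below. The dataset names, container, folder paths, and file names are assumptions, and the exact model parameter names may differ between SDK versions.

```python
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation,
    DatasetResource,
    DelimitedTextDataset,
    LinkedServiceReference,
)

ls_ref = LinkedServiceReference(type="LinkedServiceReference", reference_name="BlobStorageLinkedService")

# Input dataset: points at the existing file in the input folder.
input_ds = DelimitedTextDataset(
    linked_service_name=ls_ref,
    location=AzureBlobStorageLocation(
        container="adf-demo", folder_path="input", file_name="sample-addresses.csv"
    ),
    column_delimiter=",",
    first_row_as_header=True,
)
adf_client.datasets.create_or_update(RG_NAME, DF_NAME, "InputDataset", DatasetResource(properties=input_ds))

# Output dataset: the folder and file do not exist yet; naming them is enough for them to be created.
output_ds = DelimitedTextDataset(
    linked_service_name=ls_ref,
    location=AzureBlobStorageLocation(
        container="adf-demo", folder_path="output", file_name="copied-addresses.csv"
    ),
    column_delimiter=",",
    first_row_as_header=True,
)
adf_client.datasets.create_or_update(RG_NAME, DF_NAME, "OutputDataset", DatasetResource(properties=output_ds))
```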
Now that we have created both the input and output datasets, and the linked service that connects them, let us move on to create the pipeline by clicking ‘New pipeline’.
Set the output dataset from the ‘Sink’ tab of the Copy activity.
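Under the hood, this pipeline is just a single Copy activity wiring the input dataset (Source) to the output dataset (Sink). A hedged sketch of the same thing via the management SDK, continuing from the snippets above, with assumed names:

```python
from azure.mgmt.datafactory.models import (
    CopyActivity,
    DatasetReference,
    DelimitedTextSink,
    DelimitedTextSource,
    PipelineResource,
)

# One Copy activity: read from the input dataset (Source) and write to the output dataset (Sink).
copy_activity = CopyActivity(
    name="CopyInputToOutput",
    inputs=[DatasetReference(type="DatasetReference", reference_name="InputDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="OutputDataset")],
    source=DelimitedTextSource(),
    sink=DelimitedTextSink(),
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(RG_NAME, DF_NAME, "CopyCsvPipeline", pipeline)
```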
Now we are all set to publish the pipeline, but before that let's do some quick pre-checks: validation and debugging. The Validate option checks for errors or missed configurations, and a Debug run lets us confirm that the data movement actually happens.
A Debug run is enough to complete your task if it is a one-time job and you don't plan to use the pipeline again, whereas you have to publish the pipeline if you want to reuse it or schedule it for future runs.
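Once published, a run can also be triggered and monitored programmatically, continuing the same sketch; `CopyCsvPipeline` is the assumed pipeline name from the previous snippet.

```python
import time

# Trigger a run of the published pipeline (the portal's Debug run is the interactive equivalent).
run = adf_client.pipelines.create_run(RG_NAME, DF_NAME, "CopyCsvPipeline", parameters={})

# Poll the run status until it leaves the Queued/InProgress states.
while True:
    pipeline_run = adf_client.pipeline_runs.get(RG_NAME, DF_NAME, run.run_id)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(10)

print("Pipeline run finished with status:", pipeline_run.status)
```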
Now that my debug run has completed successfully, let's go to the storage container to check whether the output file has been created.
We can see that the rows from our sample input file have all been copied into the output folder.
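As a quick sanity check outside the portal, the output blob can be downloaded and its rows counted with a few lines of Python; the container and output file name are the assumed values from the earlier sketches.

```python
import csv
import io

from azure.storage.blob import BlobServiceClient

CONN_STR = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
service = BlobServiceClient.from_connection_string(CONN_STR)

# Download the copied file and count its data rows (excluding the header).
blob = service.get_blob_client(container="adf-demo", blob="output/copied-addresses.csv")
text = blob.download_blob().readall().decode("utf-8")
rows = list(csv.reader(io.StringIO(text)))

print("Data rows in output file:", len(rows) - 1)
```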
Point to note
When you move a file this way, ADF copies the contents of the file from the source to the destination rather than moving the file as a whole, so there is no way to preserve the original timestamp of the file.
Conclusion
This is the very first basic step for anyone who wishes to get started with Azure Data Factory. We will look into real-time and more complex tasks in future posts.