Azure Data Factory is a service that has been part of the Azure ecosystem for a while, but with version 2 of the service, Azure is reclaiming the integration space. In simple words, Data Factory can be described as SSIS in the cloud (this does not do justice to SSIS, which is a much more mature tool than Data Factory, but the idea is the same).
Since Microsoft is leaning towards a cloud-first strategy, it is moving most of its flagship products to the cloud, such as Office, SQL Server, and now SSIS in the form of Data Factory.
In this short demo, I will show you how you can use Data Factory to process JSON files and load them into SQL Azure.
- Log in to the Azure portal and create a new Data Factory.
- Click Author & Monitor; this will open the Data Factory user interface.
- Select the Author menu item to open the designer.
- Create a source and a destination dataset. In our demo, the source is a JSON file that we will place in Blob storage, and the destination is a SQL Azure table.
- I have created a simple user.json file with sample user data (see the sample after these steps) and uploaded it to Blob storage.
- I have also created a table in the SQL Azure database with the same fields as the JSON file.
- Now, let us create the source dataset. From the long list of dataset options, we will select Azure Blob storage, as our user.json file is stored there.
- For the linked service, we will provide the details needed to connect to the Blob storage account where the user.json file is placed.
- Set up the Connection tab; the column names and JSON expressions will be auto-generated.
- Go to the Schema tab and click Import Schema. This exposes the schema to the copy activity in the pipeline we will create later (a sketch of the resulting source dataset definition appears after these steps).
- Next, we will create the destination dataset. In the same way as the source dataset, we will select SQL Azure.
- Configure the destination dataset; i.e., set the connection details so that Data Factory can insert the data extracted from the JSON file into the SQL Azure table.
- On the Connection tab, select the target table in the database where Data Factory will push the data. Go to the Schema tab to import the table schema. The copy activity will use this to map the source schema we imported earlier to the table schema (a sketch of the sink dataset definition appears after these steps).
- Create a new pipeline with a Copy activity.
- Configure the source for the Copy activity.
- Configure the destination (sink) for the Copy activity.
- Configure the mapping for the Copy activity (the pipeline sketch after these steps shows source, sink, and mapping together).
- Click Publish to save the configuration.
- Once publishing is complete, click the Debug button to run the pipeline. During execution, Data Factory will read the JSON, buffer the data, and insert the records in batches into the user table.
- Once the run completes, a green tick will appear against the Run ID.
- Now, we can check the table in the SQL Azure database to validate the data. You can see that the data from the JSON file has been copied into the SQL Azure table.
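For reference, below is a minimal sketch of what the user.json file might contain. The field names (id, name, email) are illustrative assumptions, not the actual fields from the demo.

```json
[
  { "id": 1, "name": "John Doe", "email": "john.doe@example.com" },
  { "id": 2, "name": "Jane Roe", "email": "jane.roe@example.com" }
]
```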
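Behind the designer, every dataset is saved as a JSON definition, which you can view in the UI. A rough sketch of the source dataset might look like the following; the dataset name, linked service name, and container are assumptions for illustration.

```json
{
  "name": "UserJsonSource",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorageLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "Json",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "input",
        "fileName": "user.json"
      }
    }
  }
}
```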
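Similarly, a sketch of the sink dataset pointing at the SQL Azure table; again, the dataset, linked service, and table names are hypothetical.

```json
{
  "name": "UserSqlSink",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureSqlDatabaseLinkedService",
      "type": "LinkedServiceReference"
    },
    "type": "AzureSqlTable",
    "typeProperties": {
      "tableName": "[dbo].[User]"
    }
  }
}
```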
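Finally, a sketch of the pipeline with the Copy activity that ties the two datasets together. The mapping section assumes the illustrative id/name/email fields shown above; in the UI this mapping is what you configure on the Mapping tab, and it simply pairs a JSON path from the source with a column name in the sink.

```json
{
  "name": "CopyUserJsonToSql",
  "properties": {
    "activities": [
      {
        "name": "CopyUserData",
        "type": "Copy",
        "inputs": [ { "referenceName": "UserJsonSource", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "UserSqlSink", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "JsonSource" },
          "sink": { "type": "AzureSqlSink" },
          "translator": {
            "type": "TabularTranslator",
            "mappings": [
              { "source": { "path": "$['id']" }, "sink": { "name": "Id" } },
              { "source": { "path": "$['name']" }, "sink": { "name": "Name" } },
              { "source": { "path": "$['email']" }, "sink": { "name": "Email" } }
            ]
          }
        }
      }
    ]
  }
}
```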
Summary
Data Factory is an awesome tool for running ETL from a wide range of sources, such as JSON, CSV, and flat files, to a wide range of destinations, such as SQL Azure, Cosmos DB, AWS S3, Azure Table storage, Hadoop, and many more. I highly recommend considering Data Factory for any ETL use case. There are also plenty of options for error reporting and notification.