How To Get Started With Azure Data Factory

Data is often called the new oil, but it is more than that. The projections and insights generated from data can make or break a company’s prospects. Every organization faces challenges in some or all of the following activities:

  • Acquiring the data / Procurement
  • Storing and archiving the data / Warehousing
  • Transforming into insights / ETL

These three are basic and very important responsibilities of any database/BI team in a company. The data arrives from disparate sources, so the team must make sure it is integrated and meaningfully transformed. The visual insights obtained after transformation help management decide on strategies and set achievable goals.

How useful is it in a real-world scenario?

For example, HSK Ltd is one of the largest grocery retailers. The company collects terabytes of data produced by purchases in its stores and wants to analyze that data to gain insights into customer preferences, demographics, and behavior. This helps it offer products suited to the target audience, which can drive business and make customers happy. To do this, the company needs to cross-reference data such as customer details, which are stored in an on-premises data store, with the log data collected in a cloud data store. To gain insights, it must then process the joined data, publish the transformed data using other Azure services such as Azure Synapse and HDInsight, and build a report on top of it. The whole workflow can also be scheduled to run daily.
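To make the cross-referencing step concrete, here is a minimal sketch in plain Python (it is not Azure Data Factory code) that joins customer records with purchase logs on a shared customer ID and then aggregates spend. The record layouts, field names, and sample values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical customer details, standing in for the on-premises data store.
customers = [
    {"customer_id": 1, "age_group": "25-34", "city": "Pune"},
    {"customer_id": 2, "age_group": "35-44", "city": "Chennai"},
]

# Hypothetical purchase log rows, standing in for the cloud data store.
purchase_logs = [
    {"customer_id": 1, "product": "rice", "amount": 12.5},
    {"customer_id": 2, "product": "tea", "amount": 4.0},
    {"customer_id": 1, "product": "milk", "amount": 2.5},
    {"customer_id": 3, "product": "salt", "amount": 1.0},  # no matching customer
]


def join_logs_with_customers(logs, customer_rows):
    """Inner-join log rows to customer rows on customer_id."""
    by_id = {c["customer_id"]: c for c in customer_rows}
    joined_rows = []
    for row in logs:
        customer = by_id.get(row["customer_id"])
        if customer is not None:
            joined_rows.append({**customer, **row})
    return joined_rows


def spend_by_age_group(joined_rows):
    """Total spend per age group: a very simple 'insight'."""
    totals = defaultdict(float)
    for row in joined_rows:
        totals[row["age_group"]] += row["amount"]
    return dict(totals)


joined = join_logs_with_customers(purchase_logs, customers)
print(spend_by_age_group(joined))
```

In the scenario above, the same kind of join would be configured as part of a pipeline rather than written by hand; the sketch only shows what "cross-referencing" means for the data.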

Azure Data Factory (ADF) can take care of all of this easily. Best of all, it can be done without writing any code, because ADF is a serverless, code-free ETL service.

[Image: How to get started with Azure Data Factory. Source: Microsoft docs]

ADF Functions

Collect

The first step in building such a system is acquiring data from the different sources so that it can be processed.

Transform

Once the data is available in a cloud data store, transformation begins. You can either write code or use Azure tools to build and maintain data flows.

Monitor

This step tracks the scheduled activities and pipelines. Built-in monitoring support is available through Azure Monitor, PowerShell, Azure Monitor logs, and more.
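The three functions above can be pictured with a small plain-Python sketch. It is not how Azure Data Factory is implemented or used; `collect`, `transform`, `run_step`, and the sample rows are hypothetical names and data, used only to show the idea of running steps in order while recording a status and duration for each one.

```python
import time

run_log = []  # one entry per executed step, inspected afterwards


def collect():
    """Stand-in for collecting raw records from a source."""
    return [" Apple ", "banana", "", "Cherry "]


def transform(rows):
    """Stand-in for a transformation: trim, drop blanks, lowercase."""
    return [row.strip().lower() for row in rows if row.strip()]


def run_step(name, func, *args):
    """Run one step and record its status and duration in run_log."""
    started = time.perf_counter()
    status = "Failed"
    try:
        result = func(*args)
        status = "Succeeded"
        return result
    finally:
        # Runs on success and on failure, so every step leaves a log entry.
        run_log.append(
            {"step": name, "status": status, "seconds": time.perf_counter() - started}
        )


raw = run_step("collect", collect)
clean = run_step("transform", transform, raw)

print(clean)
for entry in run_log:
    print(f"{entry['step']}: {entry['status']} in {entry['seconds']:.6f}s")
```

The point of the sketch is that monitoring is a by-product of every run: each step leaves a record behind whether it succeeds or fails.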

Top Level Concepts

Pipelines

A pipeline is a logical grouping of activities that together perform a task.

Activities

In simple words, an activity is a processing step that we configure to complete a task; activities are the actual actions we want performed. An activity can take datasets as input and produce the desired outcome as an output dataset.

Datasets

A dataset points to the data we want to use as the input or output of an activity.

Linked services

Linked services are similar to connection strings. They hold the connection information that Azure Data Factory needs to connect to external sources.

Triggers

Triggers act like a scheduler: they determine when a pipeline execution is started.
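To show how these five concepts relate to one another, here is a small conceptual model in Python. These classes are hypothetical and written only for this article; they are not the Azure Data Factory SDK or its JSON schema, and every name in the example (`OnPremSql`, `CloudStore`, `DailyCustomerInsights`, and so on) is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkedService:
    """Connection information for an external store (like a connection string)."""
    name: str
    connection_info: str  # placeholder text only, never a real secret


@dataclass
class Dataset:
    """Points at the data an activity reads or writes."""
    name: str
    linked_service: LinkedService
    location: str


@dataclass
class Activity:
    """One processing step with input and output datasets."""
    name: str
    kind: str
    inputs: List[Dataset] = field(default_factory=list)
    outputs: List[Dataset] = field(default_factory=list)


@dataclass
class Pipeline:
    """A group of activities that together perform a task."""
    name: str
    activities: List[Activity] = field(default_factory=list)


@dataclass
class Trigger:
    """Decides when a pipeline run starts."""
    name: str
    schedule: str
    pipeline: Pipeline


def describe(trigger: Trigger) -> List[str]:
    """Return one line per activity, tracing data from source to sink."""
    lines = []
    for activity in trigger.pipeline.activities:
        sources = ", ".join(f"{d.name}@{d.linked_service.name}" for d in activity.inputs)
        sinks = ", ".join(f"{d.name}@{d.linked_service.name}" for d in activity.outputs)
        lines.append(
            f"[{trigger.schedule}] {trigger.pipeline.name}/{activity.name}: {sources} -> {sinks}"
        )
    return lines


# Hypothetical wiring for the grocery scenario: copy customer details to the cloud.
on_prem_sql = LinkedService("OnPremSql", "<on-premises connection info>")
cloud_store = LinkedService("CloudStore", "<cloud storage connection info>")

customers = Dataset("CustomerDetails", on_prem_sql, "dbo.Customers")
staged_customers = Dataset("StagedCustomers", cloud_store, "staging/customers/")

copy_customers = Activity("CopyCustomers", "copy", [customers], [staged_customers])
pipeline = Pipeline("DailyCustomerInsights", [copy_customers])
daily = Trigger("DailyRun", "daily", pipeline)

for line in describe(daily):
    print(line)
```

Reading the model from the bottom up gives the dependency chain: a trigger starts a pipeline, a pipeline groups activities, an activity reads and writes datasets, and each dataset reaches its store through a linked service.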

Conclusion

This was an introduction to Azure Data Factory. In upcoming articles, we will look at more practical, real-world activities.