Azure Data Factory (ADF) 🤠

Prerequisite Knowledge

 
Before we start with the understanding of Azure Data Factory, we should have:

Background

 
Almost all organizations store their data in database systems because that data is valuable. It may be raw, organized, or unorganized. Raw and unorganized data is difficult, and sometimes impossible, for data scientists to draw insights from when supporting business decisions.
 
Different applications may use the same or different database management systems, with the same or different data models. In large enterprise applications, it is important to integrate these disparate data systems, transform or transfer the data, and load all or a subset of it into another system. The refined data can then feed business intelligence (BI), which helps businesses decide on strategy and adds value to business goals.
 
Azure Data Factory is a managed cloud service that's built for these complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
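To make the difference between ETL and ELT concrete, here is a minimal sketch in plain Python. It does not use ADF at all. The sample rows, table names and function names are assumptions made up for illustration, and an in-memory SQLite database stands in for the target system.

```python
import sqlite3

# Hypothetical raw rows from a source system; amounts arrive as text.
RAW_ORDERS = [
    {"order_id": 1, "region": "east", "amount": "120.50"},
    {"order_id": 2, "region": "west", "amount": "80.00"},
    {"order_id": 3, "region": "east", "amount": "19.50"},
]


def etl(rows):
    """ETL: transform in application code, then load only the refined result."""
    totals = {}
    for row in rows:  # the transform step runs before anything is loaded
        totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
    db.executemany("INSERT INTO sales_by_region VALUES (?, ?)", sorted(totals.items()))
    return db.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()


def elt(rows):
    """ELT: load the raw rows first, then transform inside the target with SQL."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount TEXT)")
    db.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :region, :amount)", rows
    )
    return db.execute(
        "SELECT region, SUM(CAST(amount AS REAL)) FROM raw_orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    print(etl(RAW_ORDERS))
    print(elt(RAW_ORDERS))
```

Both functions produce the same totals per region. The only difference is where the transformation runs: in the integration code before loading (ETL), or inside the target store after loading (ELT).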
 
 
Image Source: Microsoft Docs
 

Introduction and how it works

  • Azure Data Factory (ADF) is a Microsoft Azure service listed under the ‘Integration’ category.
  • It provides services for integrating different database systems.
  • ADF is similar to SSIS (SQL Server Integration Services): it is used to extract, transform and load (ETL) data.
  • ADF can transform structured, semi-structured and unstructured data.
 
Image Source: Microsoft Docs
  • ADF can connect to cloud data sources as well as on-premises data sources through a data management gateway (called the self-hosted integration runtime in the current version of ADF).
  • Once the data is connected and loaded, we can process and transform it using Hive, Pig, or custom C# activities.

    • The first version of ADF had no drag-and-drop designer like SSIS; the current version provides a visual authoring interface in the browser.

  • A set of processing activities can be combined into a pipeline (also called a workflow), and the pipeline can be scheduled as needed.
  • We can view pipeline activities and their data right away on dashboards in the Azure portal.
  • The dashboard shows a visual layout of each pipeline and its data inputs and outputs.
  • With the dashboards, we can see data relationships, dependencies, and how data is being processed at the backend.
  • We can monitor execution using Azure Monitor logs and APIs, PowerShell, and the health panels in the portal.
  • We can create a data factory with any of the following tools:
    • Using Azure portal
    • PowerShell
    • Visual Studio – Azure .NET SDK
    • REST API
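The idea of activities grouped into a pipeline, with dependencies between them, can be sketched as follows. The dictionary loosely mirrors the JSON shape of an ADF pipeline definition (`activities`, `dependsOn`, dataset references), but it is trimmed: a real Copy activity also needs source and sink settings, for example. The pipeline, activity and dataset names are hypothetical, and `run_order` is an illustrative helper, not part of any ADF SDK.

```python
import json

# Hypothetical pipeline; all names below are made up for illustration.
PIPELINE = {
    "name": "CopyThenTransformPipeline",
    "properties": {
        "activities": [
            {
                "name": "TransformWithHive",
                "type": "HDInsightHive",
                "dependsOn": [
                    {"activity": "CopyRawOrders", "dependencyConditions": ["Succeeded"]}
                ],
            },
            {
                "name": "CopyRawOrders",
                "type": "Copy",
                "inputs": [{"referenceName": "RawOrdersBlob", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "StagedOrders", "type": "DatasetReference"}],
                "dependsOn": [],
            },
        ]
    },
}


def run_order(pipeline):
    """Return activity names in an order that respects every dependsOn entry."""
    pending = {
        activity["name"]: {dep["activity"] for dep in activity.get("dependsOn", [])}
        for activity in pipeline["properties"]["activities"]
    }
    ordered = []
    while pending:
        # An activity is ready once all of its dependencies have already run.
        ready = sorted(name for name, deps in pending.items() if deps <= set(ordered))
        if not ready:
            raise ValueError("circular or unknown dependency")
        ordered.extend(ready)
        for name in ready:
            del pending[name]
    return ordered


if __name__ == "__main__":
    print(json.dumps(PIPELINE, indent=2))  # the JSON that would be submitted
    print(run_order(PIPELINE))
```

Although the Hive step is listed first, `run_order` places the copy step ahead of it because of the `dependsOn` entry. This is the kind of relationship the portal dashboard draws between activities.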

Azure Data Factory Tangible Benefits

  • Integrate structured, semi-structured and unstructured data with the cloud platform.
  • Perform ETL and ELT either code-free or with custom business rules.
  • Cost-efficient, fully managed, serverless cloud data integration tool that scales on demand.
  • Connects to and integrates with cloud, on-premises and software-as-a-service (SaaS) platforms.
  • The SSIS integration runtime moves SSIS ETL workloads into the cloud with minimal effort.
  • Reduce overhead cost: take advantage of existing SSIS investments by moving SSIS workloads to the cloud with negligible effort.
  • Well suited to complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
  • ADF has prebuilt connectors for reading data from, and writing data to, many data stores.
  • Use the visual interface, or write your own code in Python, .NET or ARM templates, to build pipelines.
  • We can integrate Azure DevOps with ADF for source control and release pipelines, and use Azure Monitor for visual monitoring and alerts.
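As a rough illustration of the code-first route, the sketch below builds (but does not send) the Azure Resource Manager REST request that creates or updates a data factory. The URL layout and the `2018-06-01` API version follow the public ADF REST documentation as best I recall, so verify both before relying on them. The helper names, subscription ID, resource group and factory name are all placeholders, and authentication (a bearer token from Azure AD) is deliberately left out.

```python
from urllib.parse import quote

API_VERSION = "2018-06-01"  # assumed ADF REST API version; check the current docs


def factory_url(subscription_id, resource_group, factory_name):
    """Build the management URL for a data factory resource."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{quote(factory_name, safe='')}"
        f"?api-version={API_VERSION}"
    )


def create_factory_request(subscription_id, resource_group, factory_name, location):
    """Describe the PUT request that would create or update the factory."""
    return {
        "method": "PUT",
        "url": factory_url(subscription_id, resource_group, factory_name),
        "body": {"location": location},
    }


if __name__ == "__main__":
    # Placeholder values only; nothing is sent over the network.
    request = create_factory_request(
        "00000000-0000-0000-0000-000000000000", "demo-rg", "demo-adf", "eastus"
    )
    print(request["method"], request["url"])
    print(request["body"])
```

In practice the Azure SDK for Python or PowerShell wraps this call, so building the request by hand is mainly useful for understanding what those tools do underneath.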

Reference Links

  • https://azure.microsoft.com/en-in/resources/videos/azure-data-factory-overview/
  • https://azure.microsoft.com/en-in/services/data-factory/
  • https://docs.microsoft.com/en-us/azure/data-factory/introduction
  • https://www.jamesserra.com/archive/2014/11/what-is-azure-data-factory/
  • https://blog.5nine.com/what-is-azure-data-factory-and-how-can-it-help
  • https://azure.microsoft.com/en-in/services/devops/

Conclusion

 
In this article, we learned about Azure Data Factory, how it works, and the services it offers. Keep learning!