Introduction
Azure Data Factory is a fully managed cloud service built for complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects.
What is Azure Data Factory all about?
Azure Data Factory is the platform that addresses these data scenarios. It is a cloud-based ETL and data integration service that lets you create data-driven workflows to orchestrate data movement and transform data at scale. With Azure Data Factory, pipelines (scheduled data-driven workflows) can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
To open a data factory that has already been created, go directly to adf.azure.com, or first sign in at portal.azure.com and navigate to the data factory as shown in the image below.
Let's explain this with an example.
For instance, imagine an organization that collects petabytes of logs produced in the cloud. The organization wants to analyze these logs to gain insights into customer preferences, demographics, and usage behavior. It also wants to identify up-sell and cross-sell opportunities, develop compelling new features, drive business growth, and provide a better experience to its customers.
How can these logs be analyzed? To analyze them, the organization needs to use reference data such as customer information, product information, and marketing campaign information that sits in an on-premises data store. The organization wants to combine this reference data with the additional log data it holds in a cloud data store. To extract insights, it wants to process the joined data with a Spark cluster in the cloud and publish the transformed data to a cloud data warehouse such as Azure Synapse Analytics, so it can easily build a report on top of it. The organization wants to automate this workflow, and monitor and manage it on a daily schedule. It also wants to execute the workflow when files land in a blob store container.
Data Integration service
Data integration involves collecting data from one or more sources. The data may then be transformed and cleansed, or enriched with additional data and prepared. Finally, the combined data is stored in a data platform service that handles the kind of analysis you want to perform.
ADF can automate this process in a pattern known as Extract, Transform, and Load (ETL).
So, ETL stands for Extract, Transform, Load. Let's look at what each stage means.
What is Extract?
In the extraction process, data engineers define the data and its source. The data source identifies connection details such as the subscription, the resource group, and identity information, for example a secret or a key.
How can data be defined?
The data itself can be defined by using a set of files, a database query, or an Azure Blob storage container name for blob storage.
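As a rough illustration, here is a minimal sketch of the extract side, assuming the azure-mgmt-datafactory Python SDK: a linked service holds the connection details (account, secret/key) and a dataset defines the data. The subscription, resource group, factory name, connection string, and folder path are all placeholders.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureBlobStorageLinkedService,
    DatasetResource, AzureBlobDataset, LinkedServiceReference,
)

# Placeholder identifiers -- substitute your own subscription, resource group, and factory.
subscription_id = "<subscription-id>"
rg_name = "my-resource-group"
df_name = "my-data-factory"

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# The source connection details (account, secret or key) live in a linked service.
source_ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(
        connection_string="<storage-connection-string>"))
client.linked_services.create_or_update(rg_name, df_name, "SourceStorageLS", source_ls)

# The data itself is defined as a dataset -- here, log files under a blob folder.
source_ds = DatasetResource(
    properties=AzureBlobDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="SourceStorageLS"),
        folder_path="logs/raw"))
client.datasets.create_or_update(rg_name, df_name, "RawLogsDataset", source_ds)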
What is meant by Transform?
Data transformation tasks can include joining, splitting, adding, deriving, removing, or pivoting columns. Fields are mapped between the data source and the data destination.
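For the field-mapping part, a Copy activity can carry a translator. Here is a hedged sketch of such a mapping, expressed as the plain dictionary structure that the SDK accepts for the translator property; the column names are made up for illustration.

# A hypothetical source-to-sink field mapping for a Copy activity's translator.
field_mapping = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "cust_id"},   "sink": {"name": "CustomerId"}},
        {"source": {"name": "cust_name"}, "sink": {"name": "CustomerName"}},
    ],
}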
What is Load?
During load, Azure destinations can take data structured as a file, JavaScript Object Notation (JSON), or blob. The ETL job can be tested in a test environment and then moved to a production environment to load the production system.
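As a load-side counterpart to the extract sketch above, here is a hedged example of registering a destination (sink) dataset; it reuses the hypothetical client, rg_name, df_name, and SourceStorageLS linked service defined earlier, and the output folder name is illustrative.

from azure.mgmt.datafactory.models import (
    DatasetResource, AzureBlobDataset, LinkedServiceReference,
)

# Destination (sink) dataset: transformed output written back to blob storage as files.
sink_ds = DatasetResource(
    properties=AzureBlobDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="SourceStorageLS"),
        folder_path="logs/curated"))
client.datasets.create_or_update(rg_name, df_name, "CuratedLogsDataset", sink_ds)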
ETL tools
Azure Data Factory provides roughly 100 enterprise connectors and robust resources for both code-based and code-free users to meet their data transformation and data movement needs.
Orchestration
In some cases, ADF instructs another service to execute the actual work on its behalf, for example Azure Databricks to perform a transformation query. ADF merely orchestrates the execution of the query and then sets up the pipelines to move the data to the destination or to the next step.
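To make orchestration concrete, here is a minimal sketch, again assuming the azure-mgmt-datafactory Python SDK and the client and names from the earlier sketches, of a pipeline that asks Azure Databricks to run a transformation notebook. The Databricks linked service name and the notebook path are hypothetical, and the linked service is assumed to exist already.

from azure.mgmt.datafactory.models import (
    PipelineResource, DatabricksNotebookActivity, LinkedServiceReference,
)

# ADF only orchestrates here: the actual transformation runs on the Databricks cluster.
transform = DatabricksNotebookActivity(
    name="TransformLogs",
    notebook_path="/Shared/transform_logs",   # illustrative notebook path
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="DatabricksLS"))

pipeline = PipelineResource(activities=[transform])
client.pipelines.create_or_update(rg_name, df_name, "OrchestrationPipeline", pipeline)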
Copy Activity in Azure Data Factory
In ADF, we can use the Copy activity to copy data between data stores located on-premises and in the cloud. After the data is copied, we can use other activities to further transform and analyze it. We can also use the Copy activity to publish transformation and analysis results for business intelligence (BI) and application consumption.
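Here is a minimal sketch of a Copy activity, assuming the azure-mgmt-datafactory Python SDK and the RawLogsDataset and CuratedLogsDataset datasets registered in the earlier sketches; the pipeline name is made up.

from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference, BlobSource, BlobSink,
)

# Copy data from the source dataset to the sink dataset (both blob-backed here).
copy_logs = CopyActivity(
    name="CopyRawLogs",
    inputs=[DatasetReference(type="DatasetReference", reference_name="RawLogsDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="CuratedLogsDataset")],
    source=BlobSource(),
    sink=BlobSink())

client.pipelines.create_or_update(
    rg_name, df_name, "CopyLogsPipeline", PipelineResource(activities=[copy_logs]))

# Kick off a run; the run_id is used for monitoring in the next section.
run = client.pipelines.create_run(rg_name, df_name, "CopyLogsPipeline", parameters={})
print(run.run_id)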
Monitor Copy Activity
After creating and publishing a pipeline in ADF, we can associate it with a trigger. We can monitor all of our pipeline runs natively in the ADF user experience. To monitor the Copy activity run, go to the Data Factory Author & Monitor UI. The Monitor tab shows a list of pipeline runs; click the pipeline name link to get to the list of activity runs within that pipeline run.
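Besides the Monitor tab in the UI, the same run can be inspected programmatically. A sketch, again assuming the azure-mgmt-datafactory SDK and the run_id returned by create_run in the Copy activity sketch above:

from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import RunFilterParameters

# Check the overall pipeline run status.
pipeline_run = client.pipeline_runs.get(rg_name, df_name, run.run_id)
print(pipeline_run.status)   # e.g. InProgress, Succeeded, Failed

# Drill into the individual activity runs (the Copy activity in this case).
filter_params = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow() + timedelta(days=1))
activity_runs = client.activity_runs.query_by_pipeline_run(
    rg_name, df_name, run.run_id, filter_params)
for activity in activity_runs.value:
    print(activity.activity_name, activity.status)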
Delete Activity In Azure Data Factory
Back up your files before deleting them with the Delete activity, in case you want to restore them later. Data Factory needs write permissions to delete files or folders from the storage store.
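A hedged sketch of a Delete activity that removes the files described by a dataset, again using the azure-mgmt-datafactory SDK; RawLogsDataset is the dataset from the earlier extract sketch, and the factory's identity is assumed to have write permission on the store.

from azure.mgmt.datafactory.models import (
    PipelineResource, DeleteActivity, DatasetReference,
)

# Delete the files described by the dataset; back them up first, because the
# deletion is not reversible and ADF needs write permission on the store.
cleanup = DeleteActivity(
    name="DeleteRawLogs",
    dataset=DatasetReference(type="DatasetReference", reference_name="RawLogsDataset"),
    recursive=True)

client.pipelines.create_or_update(
    rg_name, df_name, "CleanupPipeline", PipelineResource(activities=[cleanup]))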
How does Azure Data Factory (ADF) work?
Connect and Collect
Enterprises have data of different kinds: structured, unstructured, and semi-structured. The first step is to collect all of the data from the different sources and then move it to a centralized location for subsequent processing. We can use the Copy activity in a data pipeline to move data from both cloud and on-premises data stores to a centralized data store in the cloud.
Transform and Enrich
Once the data is available in a centralized data store in the cloud, transform or process the collected data by using ADF mapping data flows. ADF also supports external activities for executing transformations on compute services such as Spark, HDInsight Hadoop, Machine Learning, and Data Lake Analytics.
CI/CD and Publish
ADF offers full support for CI/CD of our data pipelines using GitHub and Azure DevOps. After the raw data has been refined, load the data into Azure SQL Database, Azure Synapse Analytics, or Azure Cosmos DB.
Monitor
ADF has built-in support for pipeline monitoring via Azure Monitor, PowerShell, the API, Azure Monitor logs, and health panels in the Azure portal.
Pipeline
A pipeline is a logical grouping of activities that performs a unit of work. Together, the activities in a pipeline execute a task.
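To make the grouping concrete, here is a sketch of a pipeline in which the Delete activity runs only after the Copy activity succeeds; it reuses the copy_logs and cleanup activity objects from the earlier sketches, and the dependency is the part being illustrated.

from azure.mgmt.datafactory.models import PipelineResource, ActivityDependency

# Group the activities into one pipeline: copy first, then delete only on success.
cleanup.depends_on = [
    ActivityDependency(activity="CopyRawLogs", dependency_conditions=["Succeeded"])]

staged_pipeline = PipelineResource(activities=[copy_logs, cleanup])
client.pipelines.create_or_update(rg_name, df_name, "CopyThenCleanup", staged_pipeline)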
Summary
I hope you now understand what Azure Data Factory is and how it works. In the next article, I will show you how to create an Azure Data Factory and how to copy data from a source to a destination through it. So stay with us. Thanks for reading, and have a good day.