Integrating Azure DevOps Git with Azure Data Factory

Integrating Azure DevOps Git with Azure Data Factory (ADF) enables version control, collaborative development, and continuous integration/continuous deployment (CI/CD) for your data pipelines. In this article, we'll walk you through the steps required to set up and use Azure DevOps Git with Azure Data Factory.

Integrating Azure DevOps Git with ADF offers numerous benefits.

  1. Version Control: Manage changes to your data factory pipelines, datasets, and other resources with ease.
  2. Collaboration: Multiple team members can work on the same data factory in parallel on separate branches, then merge their changes through pull requests.
  3. CI/CD: Automate the deployment of data factory resources through CI/CD pipelines, ensuring consistent and reliable releases.
  4. Change Tracking: Track who made changes, what changes were made, and why, providing better accountability and traceability.

Prerequisites

Before you start, ensure you have the following.

  1. An Azure subscription and access to an Azure DevOps organization.
  2. A Data Factory instance created in the Azure portal (Step 1 shows how to create one).
  3. An Azure DevOps project with a Git repository.

Step-by-step guide
 

Step 1. Create a new Azure Data Factory

  • Navigate to the Azure portal.
  • In the left-hand menu, select Create a resource.
  • Search for Data Factory and select Azure Data Factory from the results.
  • Click Create and fill in the required details, such as Subscription, Resource Group, Region, and a globally unique Name for the data factory.
  • After filling out the details, click Review + Create and then Create to provision your Data Factory.
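If you prefer to define the factory as code rather than through the portal, the resource from Step 1 can also be described in an ARM template. The following is a minimal Python sketch that only builds the template JSON and deploys nothing. The factory name, region, and the system-assigned managed identity are illustrative assumptions; apiVersion 2018-06-01 is the Data Factory V2 management API version.

```python
import json


def data_factory_arm_resource(name: str, location: str) -> dict:
    """Return a minimal ARM resource definition for an Azure Data Factory (V2)."""
    return {
        "type": "Microsoft.DataFactory/factories",
        "apiVersion": "2018-06-01",
        "name": name,  # must be globally unique
        "location": location,
        # Assumption: enable a system-assigned managed identity for the factory.
        "identity": {"type": "SystemAssigned"},
        "properties": {},
    }


def arm_template(resources: list) -> dict:
    """Wrap resources in a bare resource-group deployment template."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": resources,
    }


if __name__ == "__main__":
    # Placeholder name and region; replace with your own values.
    template = arm_template([data_factory_arm_resource("adf-demo-factory", "westeurope")])
    print(json.dumps(template, indent=2))
```

You could save the printed JSON as template.json and deploy it with az deployment group create --resource-group <your-rg> --template-file template.json.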

Step 2. Create or select an Azure DevOps project

  • Navigate to Azure DevOps.
  • Select your organization and create a new project or choose an existing one.
  • Inside your project, create a new Git repository if you don't already have one.
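Creating the repository in Step 2 can also be scripted against the Azure DevOps REST API (Git Repositories - Create). The sketch below only builds the HTTP request and does not send it. The organization, project name, project ID, repository name, and personal access token (PAT) are placeholders, and api-version 7.0 is an assumption you should check against your organization.

```python
import base64
import json
import urllib.parse
import urllib.request

API_VERSION = "7.0"  # assumption: a GA Azure DevOps REST API version


def build_create_repo_request(organization: str, project: str, project_id: str,
                              repo_name: str, pat: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a Git repository."""
    url = (
        f"https://dev.azure.com/{urllib.parse.quote(organization)}"
        f"/{urllib.parse.quote(project)}/_apis/git/repositories"
        f"?api-version={API_VERSION}"
    )
    body = {"name": repo_name, "project": {"id": project_id}}
    # Azure DevOps accepts a PAT as Basic auth with an empty user name.
    token = base64.b64encode(f":{pat}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_repo_request(
        "contoso", "DataPlatform", "<project-guid>", "adf-repo", "<your-pat>"
    )
    print(req.get_method(), req.full_url)
```

To actually create the repository, you would pass the request to urllib.request.urlopen using a real PAT with the Code (Read & write) scope.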

Step 3. Configure Git Integration in Azure Data Factory

  • Open your Azure Data Factory instance in the Azure portal and click Launch Studio.
  • In the Data Factory UI, go to Manage in the left-hand menu and then select Git configuration under the Source control section.
  • Click Configure, select Azure DevOps Git as the repository type, and click Continue.
  • Authenticate with your Azure DevOps account and select the organization, project, and repository you want to connect to.
  • Specify the collaboration branch (usually main or master) and, optionally, the root folder in the repository where the ADF artifacts will be stored. The publish branch defaults to adf_publish.
  • Click Apply to save the configuration.
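The same Git configuration can be applied outside the Studio UI through the Data Factory management REST API's configureFactoryRepo operation. The sketch below only assembles the URL and JSON body for that call and sends nothing. The subscription ID, region, resource group, factory name, and Azure DevOps organization, project, and repository names are placeholders, and the defaults for collaboration branch and root folder mirror the values discussed above.

```python
import json
from typing import Optional, Tuple

API_VERSION = "2018-06-01"  # Data Factory V2 management API version


def build_configure_repo_call(
    subscription_id: str,
    location: str,
    resource_group: str,
    factory_name: str,
    organization: str,
    project: str,
    repository: str,
    collaboration_branch: str = "main",
    root_folder: str = "/",
    tenant_id: Optional[str] = None,
) -> Tuple[str, dict]:
    """Return (url, body) for the configureFactoryRepo call; nothing is sent."""
    factory_resource_id = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
    )
    url = (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/providers/Microsoft.DataFactory/locations/{location}"
        f"/configureFactoryRepo?api-version={API_VERSION}"
    )
    repo_configuration = {
        "type": "FactoryVSTSConfiguration",  # Azure DevOps Git
        "accountName": organization,  # Azure DevOps organization name
        "projectName": project,
        "repositoryName": repository,
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,
    }
    if tenant_id:
        # Only needed when the repo lives in a different Microsoft Entra tenant.
        repo_configuration["tenantId"] = tenant_id
    body = {"factoryResourceId": factory_resource_id, "repoConfiguration": repo_configuration}
    return url, body


if __name__ == "__main__":
    url, body = build_configure_repo_call(
        "<subscription-id>", "westeurope", "rg-data", "adf-demo-factory",
        "contoso", "DataPlatform", "adf-repo",
    )
    print(url)
    print(json.dumps(body, indent=2))
```

One way to send it is az rest --method post --url "<url>" --body @body.json, which authenticates with your current Azure CLI login.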

Step 4. Save and Publish Data Factory Changes

  • In the ADF UI, make some changes or create a new pipeline, dataset, or other ADF resource.
  • Click Save all in the ADF UI toolbar. This commits your changes to the current working branch of the configured Git repository.
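  • Once your changes are in the collaboration branch, click Publish. This deploys them to the live Data Factory service and commits the generated ARM templates to the publish branch of the Azure DevOps repository.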
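When you save, each ADF resource is stored in the repository as a JSON file, grouped into a folder per resource type (for example pipeline or dataset) under the root folder chosen in Step 3. The sketch below illustrates that layout with a hypothetical artifact_path helper and a trivial pipeline containing one Wait activity. The folder list is partial, and the pipeline and folder names are made up for illustration.

```python
import json
import posixpath

# Partial list of per-type folders ADF writes into the repo (assumption: default layout).
KNOWN_FOLDERS = {"pipeline", "dataset", "linkedService", "trigger", "dataflow", "integrationRuntime"}


def artifact_path(root_folder: str, artifact_type: str, name: str) -> str:
    """Hypothetical helper: repo path where a saved artifact's JSON file lands."""
    if artifact_type not in KNOWN_FOLDERS:
        raise ValueError(f"unknown artifact type: {artifact_type}")
    return posixpath.join("/", root_folder.strip("/"), artifact_type, f"{name}.json")


def wait_pipeline(name: str, seconds: int = 1) -> dict:
    """A minimal pipeline definition with a single Wait activity."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {"name": "Wait1", "type": "Wait", "typeProperties": {"waitTimeInSeconds": seconds}}
            ],
            "annotations": [],
        },
    }


if __name__ == "__main__":
    print(artifact_path("/", "pipeline", "DemoPipeline"))  # /pipeline/DemoPipeline.json
    print(json.dumps(wait_pipeline("DemoPipeline"), indent=2))
```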

Step 5. Validate the changes in Azure DevOps

  • Go to Repos in Azure DevOps and confirm that the changes from Azure Data Factory were committed to your branch.
  • After you publish, a branch named adf_publish should also appear. It contains the ARM templates generated from your data factory, which are helpful for deploying ADF to other environments.
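The check on adf_publish can also be done from a local clone. The sketch below assumes the publish branch keeps a folder named after the factory containing ARMTemplateForFactory.json and ARMTemplateParametersForFactory.json; list_branch_files shells out to git ls-tree, and missing_arm_templates is a hypothetical helper that reports which of those two files are absent.

```python
import subprocess
from typing import Iterable, List


def list_branch_files(branch: str = "origin/adf_publish") -> List[str]:
    """List every file path on a branch of the current local clone (requires git)."""
    result = subprocess.run(
        ["git", "ls-tree", "-r", "--name-only", branch],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def missing_arm_templates(paths: Iterable[str], factory_name: str) -> List[str]:
    """Return the expected ARM template files that are absent from `paths`."""
    expected = {
        f"{factory_name}/ARMTemplateForFactory.json",
        f"{factory_name}/ARMTemplateParametersForFactory.json",
    }
    return sorted(expected - set(paths))
```

Inside a clone of the repository, after git fetch, an empty list from missing_arm_templates(list_branch_files(), "your-factory-name") suggests that publishing produced both templates.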

Conclusion

Integrating Azure DevOps Git with Azure Data Factory provides a robust framework for managing and deploying your data pipelines. By following these steps, you gain version control, collaboration, and a foundation for automating deployments of your data integration and ETL processes. Because every change is tracked in Git and published as ARM templates, your data workflows are also easier to review, reproduce, and promote across environments.

