How To Deploy Data Pipelines In Azure Data Factory Using CI/CD

Introduction

In this article, we will explore how to deploy Azure Data Factory's data pipelines using CI/CD. Continuous integration (CI) enables us to build and test our code as soon as it is ready. Continuous deployment (CD) provides a way to deploy our changes to different environments. The combined process allows data pipelines to be deployed automatically into the Azure Data Factory live mode.

Challenges

In our earlier article, we integrated Azure Data Factory with an Azure DevOps Git repository. With this integration, we have to click the 'Publish' button in ADF to deploy data pipeline changes to the publish branch and to live mode. The goal of CI/CD is to automate this process and deploy changes to live mode as soon as they are ready for testing.

Let's go over the Azure DevOps pipeline concepts first:

  1. Stages: Stages are used to group major actions together. In our case, we have 2 stages: a CI stage and a CD stage.
  2. Jobs: Each stage contains one or more jobs. Each job has to be self-contained because jobs can be sent to different build servers for execution.
  3. Steps: A step performs an individual action. Each job contains one or more steps.
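
To make the hierarchy concrete, here is a minimal sketch of how stages, jobs, and steps nest in a pipeline YAML file. The stage and job names (Build, Deploy, BuildJob, DeployJob) are placeholders chosen for illustration and are not the names used later in this article.

```yaml
# Sketch only: stage and job names are illustrative placeholders.
pool:
  vmImage: ubuntu-latest

stages:
- stage: Build              # CI stage
  jobs:
  - job: BuildJob           # each job is sent to a single build server
    steps:
    - script: echo "build and test here"
      displayName: 'Build step'

- stage: Deploy             # CD stage
  dependsOn: Build          # with the default condition, runs only if Build succeeds
  jobs:
  - job: DeployJob
    steps:
    - script: echo "deploy here"
      displayName: 'Deploy step'
```

Because each job may land on a different build server, anything the Deploy job needs from the Build job has to be passed along explicitly, for example as a pipeline artifact.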

Tutorial
 

1. Create service connection

Before we dive into the DevOps pipeline, we need to create a service connection so that the pipeline can make changes to Azure Data Factory. I have linked the Azure article here.

2. Create DevOps pipeline

  1. In Azure DevOps, click the Pipelines menu, then click 'New Pipeline'. The 'New pipeline' wizard appears.
  2. Under the 'Connect' section, select 'Azure Repos Git'.
  3. Under the 'Select' section, select our 'Demo' repository.
  4. Under the 'Configure your pipeline' section, select 'Starter pipeline'.
  5. Under the 'Review' section, we can see the sample code for a starter pipeline:

    1. trigger: Any change to the 'main' branch of our repo starts a new build.
    2. steps: There are 2 actions. The first prints 'Hello, world!' and the second prints 'Add other tasks to build, test, and deploy your project.'
    3. Click 'Save and run' and commit our changes.

  6. After we save our changes, a build runs and completes successfully.
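
For reference, the starter pipeline generated by the wizard typically looks like the following. The exact contents can vary with the version of the wizard, so treat this as an approximation rather than the exact file; the display names in particular are an assumption.

```yaml
# Approximation of the wizard's starter pipeline; details may differ.
trigger:
- main                      # any change to 'main' starts a new build

pool:
  vmImage: ubuntu-latest    # Microsoft-hosted Linux build server

steps:
- script: echo Hello, world!
  displayName: 'Run a one-line script'

- script: |
    echo Add other tasks to build, test, and deploy your project.
  displayName: 'Run a multi-line script'
```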

Define the variables

Before we start on the Build stage, let's remove all the steps and create 'variables' in the DevOps pipeline. The variables will be used throughout this pipeline.

We have defined 5 variables. The values for the first 4 variables can be found on our Azure Data Factory page in the Azure Portal.

The adfName, resourceGroupName, and subscriptionId variables are combined to create a fifth variable called adfResourceId. This is how Azure uniquely identifies our ADF instance. The main benefit of defining the first 4 variables is that we can reuse this source code for any new project that requires Azure Data Factory simply by changing their values.

trigger:
- main

pool:
  vmImage: ubuntu-latest

variables: 
  - name: adfName
    value: sb-dp-adf-project1

  - name: resourceGroupName
    value: sandbox-dataPlatform-project1

  - name: adfLocation
    value: Canada Central

  - name: subscriptionId
    value: [redacted]

  - name: adfResourceId
    value: /subscriptions/$(subscriptionId)/resourceGroups/$(resourceGroupName)/providers/Microsoft.DataFactory/factories/$(adfName)
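
One way to confirm that adfResourceId resolves as expected is to temporarily add a step that prints it. This is a debugging sketch and not part of the original pipeline; the display name is arbitrary.

```yaml
# Temporary debugging step (not part of the original pipeline).
steps:
- script: echo "$(adfResourceId)"
  displayName: 'Print ADF resource ID'
```

Remove this block before adding the stages in the next section, because a pipeline cannot mix top-level steps with stages.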

Implement Build (CI) stage

In the Build stage, our goal is to retrieve the files from the 'main' branch of the git repository and generate the ARM templates for the Deployment stage.

This stage consists of 5 main processes:

  1. Define the Build stage and the job name. Recall that stages contain jobs and jobs contain steps.
    stages:
    - stage: Build_Adf_Arm_Stage
      jobs:
      - job: Build_Adf_Arm_Template
        displayName: 'ADF - ARM template'
        steps:
  2. Next, we need to install the dependencies. Azure provides a tool called ADFUtilities. This package is used to validate the source code and create the deployment template. To install this package, we first need Node.js and the npm package manager. 'Build.Repository.LocalPath' is a predefined variable provided by Azure DevOps that indicates where the source code is located on the build server.

    # Install Node.js, which also provides npm.
    # The article uses 10.x; newer releases of the package may need a newer version.
    - task: NodeTool@0
      displayName: 'Install Node.js'
      inputs:
        versionSpec: '10.x'
    # Install the packages listed in build/package.json
    - task: Npm@1
      displayName: 'Install npm package'
      inputs:
        command: 'install'
        workingDir: '$(Build.Repository.LocalPath)/build/'
        verbose: true
    
  3. Validate our source code against our Azure Data Factory. To do this, we call the 'validate' function. By default, our code is checked out into the '$(Build.Repository.LocalPath)' folder, and we store our ADF source code under the adf-code directory. The working directory is where ADFUtilities is installed.
    - task: Npm@1
      displayName: 'Validate Source code'
      inputs:
        command: 'custom'
        workingDir: '$(Build.Repository.LocalPath)/build/'
        customCommand: 'run build validate $(Build.Repository.LocalPath)/adf-code $(adfResourceId)'
  4. Generate the ARM template from the source code using our Azure Data Factory. We use the 'export' function and write the ARM template to the 'armTemplate' folder inside the working directory.
    - task: Npm@1
      displayName: 'Generate ARM template'
      inputs:
        command: 'custom'
        workingDir: '$(Build.Repository.LocalPath)/build/'
        customCommand: 'run build export $(Build.Repository.LocalPath)/adf-code $(adfResourceId) "armTemplate"'
  5. Finally, publish the ARM template as a DevOps pipeline artifact. This creates an artifact named 'sb-dp-adf-project1-armTemplate', which can be downloaded as a zip file.
    - task: PublishPipelineArtifact@1
      displayName: 'Publish ARM template'
      inputs:
        targetPath: '$(Build.Repository.LocalPath)/build/armTemplate'
        artifact: '$(adfName)-armTemplate'
        publishLocation: 'pipeline'

Add the package.json file

Before we can save and run the pipeline, we have to create a 'package.json' file. This file contains the instructions for obtaining the ADFUtilities package:

  1. In the repository, create a 'build' folder.
  2. Inside the folder, create a 'package.json' file.
  3. npm uses this JSON file to find the ADFUtilities package, which is published on npm as @microsoft/azure-data-factory-utilities.
  4. Save and commit the change to the 'main' branch.

Test Build/CI process

Now, we can save the changes to our DevOps pipeline and run the pipeline. If this is the first time this project uses the service connection, you might need to grant permission.

After the build is completed, we can review the artifact created. 

The ARM template files created are the same as those in the publish branch.

With this artifact created, we can start on the deployment stage.

Implement Deploy to live mode (CD) stage

We need to define a new stage for the deployment process. In this example, we will define the Dev deployment stage. We will be retrieving the pipeline artifact and deploying it into the Live mode of Azure Data Factory.

  1. Define the 'Deploy_Dev_Stage' and ensure it only runs if the CI Build stage succeeds. For the deployment, we use a 'deployment' job, which allows us to define the deployment strategy. For this example, we use the 'runOnce' strategy, which runs the deployment steps a single time.
    - stage: Deploy_Dev_Stage
      displayName: Deploy Dev Stage
      dependsOn: Build_Adf_Arm_Stage
      jobs:
      - deployment: Deploy_Dev
        displayName: 'Deployment - DEV'
        environment: DEV
        strategy:
          runOnce:
            deploy:
              steps:
  2. Download the artifact. Since each job can run on a different build server, we need to download the artifact to ensure our files exist. We extract the artifact files under the targetPath. 'Pipeline.Workspace' is a predefined variable provided by Azure DevOps.

    - task: DownloadPipelineArtifact@2
      displayName: 'Download Build Artifacts - ADF ARM templates'
      inputs:
        artifactName: '$(adfName)-armTemplate'
        targetPath: '$(Pipeline.Workspace)/$(adfName)-armTemplate'
  3. Override the parameters and deploy the ARM template. The parameters we override come primarily from the Linked services in Azure Data Factory. This enables us to deploy the same artifact into Development, QA and Production environments by supplying the connection information specific to each one. It is also important to set the deploymentMode to 'Incremental', so that resources already in the resource group but not in the template are left unchanged.

    - task: AzureResourceManagerTemplateDeployment@3
      displayName: 'Deploying to Dev RG task'
      inputs:
        deploymentScope: 'Resource Group'
        azureResourceManagerConnection: 'myServiceConnection'
        subscriptionId: '$(subscriptionId)'
        action: 'Create Or Update Resource Group'
        resourceGroupName: '$(resourceGroupName)'
        location: '$(adfLocation)'
        templateLocation: 'Linked artifact'
        csmFile: '$(Pipeline.Workspace)/$(adfName)-armTemplate/ARMTemplateForFactory.json'
        csmParametersFile: '$(Pipeline.Workspace)/$(adfName)-armTemplate/ARMTemplateParametersForFactory.json'
        overrideParameters: '-factoryName "$(adfName)" -LS_SalesDatabase_connectionString "Integrated Security=False;Encrypt=True;Connection Timeout=30;Data Source=mySalesDatabase.database.windows.net;Initial Catalog=mySalesDb;User ID=myDataUser" -LS_DataLake_properties_typeProperties_url "https://myDataLake.dfs.core.windows.net" -LS_KeyVault_properties_typeProperties_baseUrl "https://myKeyVault.vault.azure.net/"'
        deploymentMode: 'Incremental'
  4. Run the pipeline and verify the result. We can see that the Deploy Dev Stage ran and completed successfully.

Summary

By creating an Azure DevOps pipeline, we have successfully automated our data pipeline deployment process. This approach eliminates the need to manually click the 'Publish' button. In the future, we can extend this solution to deploy the artifact into QA and Production environments.
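
As a rough sketch of that extension, a QA stage could reuse the same artifact and tasks as the Dev stage with different target values. Everything specific to QA below (the stage, job, and environment names, the qaAdfName and qaResourceGroupName variables and their values) is a hypothetical placeholder, and a real QA deployment would also need its own Linked service overrides.

```yaml
# Sketch only: all QA names and values are hypothetical placeholders.
- stage: Deploy_QA_Stage
  displayName: Deploy QA Stage
  dependsOn: Deploy_Dev_Stage            # run after the Dev deployment succeeds
  variables:
    qaAdfName: my-qa-adf                 # hypothetical QA factory name
    qaResourceGroupName: my-qa-rg        # hypothetical QA resource group
  jobs:
  - deployment: Deploy_QA
    displayName: 'Deployment - QA'
    environment: QA                      # hypothetical environment name
    strategy:
      runOnce:
        deploy:
          steps:
          - task: DownloadPipelineArtifact@2
            displayName: 'Download Build Artifacts - ADF ARM templates'
            inputs:
              artifactName: '$(adfName)-armTemplate'
              targetPath: '$(Pipeline.Workspace)/$(adfName)-armTemplate'
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: 'Deploying to QA RG task'
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: 'myServiceConnection'
              subscriptionId: '$(subscriptionId)'
              action: 'Create Or Update Resource Group'
              resourceGroupName: '$(qaResourceGroupName)'
              location: '$(adfLocation)'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/$(adfName)-armTemplate/ARMTemplateForFactory.json'
              csmParametersFile: '$(Pipeline.Workspace)/$(adfName)-armTemplate/ARMTemplateParametersForFactory.json'
              # Add the QA Linked service overrides here as well.
              overrideParameters: '-factoryName "$(qaAdfName)"'
              deploymentMode: 'Incremental'
```

The artifact name still uses adfName because the artifact is built once in the CI stage; only the deployment target changes between environments.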

You can obtain the source code from my GitHub repo.

Happy Learning.
