Creating Pipeline In Azure Data Factory And Monitoring

Introduction

This article is a continuation of my Azure Data Factory series, in which we created an Azure Data Factory account, set up security roles on Azure Data Lake Store, and created datasets in the Data Factory account with an HDInsight cluster. In this article, we will create a pipeline in the Azure Data Factory account to copy data from one data store to another.

Note

Read my previous articles (linked below) to create a Data Factory account. You can work on this demo only if you have already created the datasets in the Data Factory.

Links

  1. Security Roles on Files for Azure Data Lake Store - Part One
  2. Security Roles on Files for Azure Data Lake Store - Part Two
  3. Creating Linked Services in Azure Data Factory
  4. Creating Input and Output Datasets in Azure Data Factory

Now, follow the steps below.

Step 1

Go to the “New data store” blade, click “More”, and then select “New pipeline”.

Pipeline

Step 2

A code editor opens with a starter snippet, as shown below.

Pipeline

Step 3

Now, copy and paste the following JSON into the code editor.

Note

Replace <storageaccountname> in the code below with the name of your own storage account.

{
    "name": "MyFirstPipeline",
    "properties": {
        "description": "My first Azure Data Factory pipeline",
        "activities": [{
            "type": "HDInsightHive",
            "typeProperties": {
                "scriptPath": "adfgetstarted/script/partitionweblogs.hql",
                "scriptLinkedService": "AzureStorageLinkedService",
                "defines": {
                    "inputtable": "wasb://adfgetstarted@<storageaccountname>.blob.core.windows.net/inputdata",
                    "partitionedtable": "wasb://adfgetstarted@<storageaccountname>.blob.core.windows.net/partitioneddata"
                }
            },
            "inputs": [{
                "name": "AzureBlobInput"
            }],
            "outputs": [{
                "name": "AzureBlobOutput"
            }],
            "policy": {
                "concurrency": 1,
                "retry": 3
            },
            "scheduler": {
                "frequency": "Month",
                "interval": 1
            },
            "name": "RunSampleHiveActivity",
            "linkedServiceName": "HDInsightOnDemandLinkedService"
        }],
        "start": "2016-04-01T00:00:00Z",
        "end": "2016-04-02T00:00:00Z",
        "isPaused": false
    }
}
Pipeline

With this JSON, we are creating a pipeline that consists of a single activity which uses Hive to process data on an HDInsight cluster. The "start" and "end" properties define the period for which the pipeline is active, and the "inputs" and "outputs" arrays refer to the datasets we created in the previous article.
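
For reference, below is a minimal sketch of what the AzureBlobInput dataset referenced by the pipeline might look like, assuming the standard getting-started layout; the folder path, file format, and availability shown here are assumptions, so use the definition you actually deployed in the earlier datasets article.

{
    "name": "AzureBlobInput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "folderPath": "adfgetstarted/inputdata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "external": true,
        "availability": {
            "frequency": "Month",
            "interval": 1
        },
        "policy": {}
    }
}

Note how the dataset's monthly availability lines up with the activity's monthly scheduler, which Data Factory (v1) expects for the datasets a pipeline consumes and produces.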

Step 4

Click on “Deploy” now.

Pipeline

Once it is deployed, you can find “MyFirstPipeline” in Azure Data Factory under Pipelines.

Pipeline

Step 5

Now, let's work on monitoring the pipeline in Azure Data Factory. Go to the home page of the Azure Data Factory account and click on the diagram, as shown below.

Pipeline

Step 6

Here, we can find an overall diagram of the datasets and pipelines that we have created in the Azure Data Factory account.

Pipeline

Step 7

We can also right-click on the “MyFirstPipeline” pipeline and open it in the diagram view to find its activities.

Pipeline

Step 8

We can find the Hive Activity in the Pipeline here.

Pipeline

Step 9

Click on Data Factory to move back.

Pipeline

Similarly, we can monitor the AzureBlobInput and AzureBlobOutput datasets, too.
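
For reference, here is a similar minimal sketch of the AzureBlobOutput dataset, again assuming the standard getting-started layout; the folder path and format are assumptions, so use your actual definition from the earlier article. Its monthly availability is what defines the slices you see when monitoring this dataset.

{
    "name": "AzureBlobOutput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "folderPath": "adfgetstarted/partitioneddata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        }
    }
}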

Note


In my next article, we will work on the same scenario using Visual Studio with Azure Data Factory.