Introduction
This article walks you through creating an Azure Data Factory and adding linked services to it.
Why do we need a Data Factory?
A Data Factory lets us create pipelines that copy and move data from one data store to another.
Requirements
- Microsoft Azure account
Follow the steps given below
Step 1. Log in to the Azure portal using the link given below. You will see the home screen of the Azure portal.
Link - https://portal.azure.com
Step 2. Click New --> Databases --> Data Factory.
You will get a new blade now for configuring your new Data Factory.
Fill in the name of the Data Factory, the Subscription, the Resource Group, and the Location, and choose whether to pin it to the dashboard. Click Create once the details are filled in.
Your Azure Data Factory will be deployed now.
Once the deployment finishes, your Data Factory appears on the Azure dashboard.
Need for Linked Services
Before we start working with pipelines, the data factory needs a few supporting entities. First, we will create Linked Services, which connect external data stores and compute resources to the data factory and define its inputs and outputs. Later, we will create the pipeline.
Specifically, we will link an Azure storage account and an Azure HDInsight cluster to our Azure Data Factory. The storage account will hold the input and output data for the pipeline.
Step 3. Open the Data Factory you just created and click "Author and deploy".
Now, click "New data store" and choose "Azure Storage".
The Draft-1 editor now opens with a JSON template for creating the Azure Storage Linked Service.
Note. You should already have a storage account created to configure this connection string.
Step 4. In the connection string given below, replace the placeholders with your storage account credentials.
"connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
Click "Deploy" once, as the connection string for the storage is configured.
Once the Linked Service is deployed, we can find the Draft-1 editor, which will be unavailable on the pane and we can see AzureStorageLinkedService on the left side of the Data Factory pane.
Step 5. Next, we will add an on-demand Azure HDInsight cluster as a Linked Service to the Data Factory.
Move to the Data Factory editor and click "More" at the top right of the pane.
Click "New compute" here.
Select "On-demand HDInsight cluster".
Step 6. Copy the code snippet given below and paste it into the Drafts/Draft-1 editor.
{
    "name": "HDInsightOnDemandLinkedService",
    "properties": {
        "type": "HDInsightOnDemand",
        "typeProperties": {
            "version": "3.2",
            "clusterSize": 1,
            "timeToLive": "00:05:00",
            "linkedServiceName": "AzureStorageLinkedService"
        }
    }
}
The JSON above defines the properties version (the HDInsight version), clusterSize (the number of worker nodes), timeToLive (how long an idle cluster is kept alive before it is deleted), and linkedServiceName (the storage linked service the cluster uses). Once the code is pasted into the editor, click Deploy.
Now, you can find two entries under Linked Services: AzureStorageLinkedService and HDInsightOnDemandLinkedService.
Follow my next article to work on pipelines in Azure Data Factory.