Follow the steps below.
Step 1
Here, we will create the input and output datasets for Hive processing.
Log in to the Azure portal, go to your Azure Data Factory account, and open the Data Factory editor.
Note
This Data Factory account is where we have the Storage account configured, the Azure Linked Services, and an Azure HDInsight cluster.
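For reference, the Storage linked service that the datasets in this walkthrough point to (AzureStorageLinkedService) is typically defined with a JSON snippet along the lines of the sketch below; the connection string values are placeholders, not values taken from this setup.
{
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "description": "Azure Storage linked service used by the input and output datasets",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
        }
    }
}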
Step 2
Click on "Author and deploy".
Step 3
In the New Data Store blade, click on More - New Dataset - Azure Blob Storage.
We will get a blade with the code snippet, as shown below.
Step 4
Copy the code given below and paste it into the code editor pane.
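A minimal sketch of the input dataset, assuming the sample file lives in the same adfgetstarted container used by the output dataset later in this article, might look like the following; the fileName input.log and folderPath adfgetstarted/inputdata values are assumptions, so adjust them to your own blob layout.
{
    "name": "AzureBlobInput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "fileName": "input.log",
            "folderPath": "adfgetstarted/inputdata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        },
        "external": true
    }
}
Setting external to true marks the data as produced outside the Data Factory, which is why the input dataset carries this property while the output dataset does not.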
Step 5
Click on "Deploy" once the code is copied.
The dataset will be deployed and we will get a notification, as shown below.
The JSON properties defined above cover the type, linkedServiceName, fileName, format type, columnDelimiter, frequency/interval, and external properties.
Now, we can find the dataset in the left pane of the Data Factory blade.
Step 6
Let's create an output dataset now.
Go to the New data store blade - More - New Dataset - Azure Blob Storage.
We will get a blade with a JSON code snippet, as shown below.
Replace the default code with the JSON snippet given below.
{
    "name": "AzureBlobOutput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "folderPath": "adfgetstarted/partitioneddata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        }
    }
}
Step 7
Click on "Deploy" once the JSON response is copied.
Here goes our Data Factory deployed with the new entity.
In Datasets, we can find two entries as AzureBlobInput and AzureBlobOutput, as shown below.
Follow my upcoming articles to create a pipeline and to monitor it.