Select "Create a resource" and choose Analytics -> Data Factory.
Give a valid name to the Azure Data Factory and choose a resource group. If you don’t have an existing resource group, please create a new one.
Azure Data Factory will be created shortly.
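If you prefer to script this step instead of clicking through the portal, the sketch below shows one possible way to create a Data Factory with the azure-mgmt-datafactory Python package. The subscription ID, resource group, factory name, and region are placeholders, not values from this walkthrough.

```python
# A minimal sketch (not the portal flow above): creating a Data Factory
# with the azure-mgmt-datafactory package. All names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<your-subscription-id>"   # placeholder
resource_group = "<your-resource-group>"     # placeholder
factory_name = "<your-data-factory-name>"    # placeholder

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Create (or update) the Data Factory in the chosen region.
factory = client.factories.create_or_update(
    resource_group,
    factory_name,
    Factory(location="eastus"),
)
print(factory.provisioning_state)
```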
Step 2 - Store Data in Blob Storage
We can now upload a sample CSV file to Blob Storage.
Go to the Storage Account and click "Storage Explorer" (currently, it is in preview mode).
Right-click "Blob Containers" and you will see the "Create Blob Container" option in the context menu. Just click it.
Please choose a valid container name (container names must be lowercase) and set the Public access level to "Container" so that we can access this container later from our Azure Data Factory.
Open the container and upload a sample CSV file to it. I will upload an employee data CSV file that contains only 3 records. Please click the "Upload" button to proceed.
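If you would rather script the container creation and upload, here is a minimal sketch using the azure-storage-blob Python package. The connection string, container name, and the three employee rows are hypothetical placeholders for illustration.

```python
# A minimal sketch using the azure-storage-blob package (v12). The
# connection string, container name, and sample rows are placeholders.
from azure.storage.blob import BlobServiceClient

connection_string = "<your-storage-account-connection-string>"  # placeholder
container_name = "employeedata"                                 # placeholder (must be lowercase)

service = BlobServiceClient.from_connection_string(connection_string)

# Create the container with "Container" public access, matching the
# setting chosen in the portal above.
container = service.create_container(container_name, public_access="container")

# Hypothetical employee CSV with a header row and 3 records, mirroring
# the sample file used in this walkthrough.
csv_content = (
    "EmployeeId,Name,Department\n"
    "1,John,Finance\n"
    "2,Jane,HR\n"
    "3,Peter,IT\n"
)
container.upload_blob("employees.csv", csv_content, overwrite=True)
```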
Step 3 - Create a new Database and Collection in Azure Cosmos DB Account
Open the Cosmos DB account and click "Data Explorer".
Click the "New Database" button, give a database name, and choose a Throughput value (it is not mandatory; you can simply ignore it).
You can add a new collection to this database by right-clicking the database and choosing the "New Collection" option.
Give the collection a name and specify a partition key as well. The partition key determines how documents are distributed across partitions, and it plays a role loosely comparable to a primary key in a SQL Server database.
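To make the partition key idea concrete, here is a minimal sketch that creates the same kind of database and collection with the azure-cosmos Python package. The endpoint, key, database name, collection name, and partition key path are placeholders.

```python
# A minimal sketch using the azure-cosmos package (v4). The endpoint,
# key, and all names below are placeholders -- adjust them to your account.
from azure.cosmos import CosmosClient, PartitionKey

endpoint = "https://<your-account>.documents.azure.com:443/"  # placeholder
key = "<your-cosmos-db-primary-key>"                          # placeholder

client = CosmosClient(endpoint, credential=key)

# Create the database; throughput is optional, as noted above.
database = client.create_database_if_not_exists("EmployeeDB")

# Create the collection (container) with a partition key. Every document
# carries this property, and Cosmos DB uses it to distribute documents
# across partitions.
container = database.create_container_if_not_exists(
    id="Employees",
    partition_key=PartitionKey(path="/EmployeeId"),
)
```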
Step 4 - Create a Pipeline in Azure Data Factory
We have already created the Azure Data Factory. We can create a "Copy Data" pipeline now. Please open the Azure Data Factory and click the "Author & Monitor" button.
It will open the ADF dashboard. Choose the "Create Pipeline" option.
We can create a new activity now. In the filter box, please type "Copy"; it will show the "Copy Data" option under the Move & Transform tab. You can drag this activity to the work area as I did.
We can rename the activity in the "General" tab. I have also given a short description.
In the Source tab, you can select the source dataset. Please click the "New" button. It will list all the data sources available in ADF. Currently, Microsoft supports more than 70 data sources.
As our source dataset is Blob storage, please choose it and click the "Finish" button.
Choose the "Connection" tab and click the "New" button to create a new linked service for the source dataset.
Select your Azure subscription, choose the storage account we created earlier, and click the "Finish" button.
In the Connection settings, browse the blob storage and choose the container we created earlier.
You can ignore the file name. It will automatically pick the file from the container.
In our CSV, the first row contains the column names, so select the "Column names in the first row" option.
Select the Copy activity and choose the Sink tab, which is for the destination dataset.
Please click the "New" button to choose the destination dataset, select Azure Cosmos DB as the destination data source, and click the "Finish" button.
Choose the Connection tab and click the "New" button to create a new linked service for Cosmos DB.
Choose your Azure subscription from the dropdown list, select the Cosmos DB account name and database name, and click the "Finish" button.
You can choose the collection name from the list. (We already created this collection in the Cosmos DB account.)
We have now created the source and sink datasets in our pipeline. We can validate the pipeline and datasets before publishing them.
Any validation errors will be shown here.
Our validation was successful. We can now publish the changes to ADF.
It will take some time to publish all the changes.
After a successful publish, we can Trigger the pipeline.
Click the "Trigger" button and choose "Trigger Now". It will open a window; choose the "Finish" button.
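If you prefer to trigger the pipeline from code rather than the portal, the sketch below uses the azure-mgmt-datafactory Python package to start a run and poll its status. All names are placeholders and must match whatever you actually published.

```python
# A minimal sketch: triggering a published pipeline run with
# azure-mgmt-datafactory. All names below are placeholders.
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<your-subscription-id>"   # placeholder
resource_group = "<your-resource-group>"     # placeholder
factory_name = "<your-data-factory-name>"    # placeholder
pipeline_name = "<your-pipeline-name>"       # placeholder

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Start a pipeline run -- the programmatic equivalent of "Trigger Now".
run = client.pipelines.create_run(resource_group, factory_name, pipeline_name)

# Poll until the run leaves the Queued/InProgress states, then report.
while True:
    pipeline_run = client.pipeline_runs.get(resource_group, factory_name, run.run_id)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(10)
print(pipeline_run.status)  # e.g. "Succeeded"
```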
We will be notified with a message that the pipeline succeeded.
Our data integration is now complete. We can open Cosmos DB to check the data copied from Blob Storage. You can see that there are three records (documents) available in Cosmos DB; as I mentioned earlier, my CSV file contains 3 records.
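To double-check the result from code, here is a minimal sketch that reads the documents back with the azure-cosmos Python package, reusing the placeholder names from the earlier sketch.

```python
# A minimal sketch: reading the copied documents back from Cosmos DB.
# Endpoint, key, and names are the same placeholders used earlier.
from azure.cosmos import CosmosClient

endpoint = "https://<your-account>.documents.azure.com:443/"  # placeholder
key = "<your-cosmos-db-primary-key>"                          # placeholder

client = CosmosClient(endpoint, credential=key)
container = client.get_database_client("EmployeeDB").get_container_client("Employees")

# Each CSV row should now be a separate JSON document in the collection.
items = list(container.read_all_items())
print(f"Found {len(items)} documents")  # expect 3 for the sample file
for item in items:
    print(item)
```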
You can download the ARM (Azure Resource Manager) template for this ADF for future use. The ARM template contains the pipeline and dataset details.
Normally, there are two ARM template files available for each ADF: "arm_template.json" and "arm_template_parameters.json".
arm_template.json