In this tutorial, we will create a Cosmos DB service using the SQL API and query the data in our existing Azure Databricks Spark cluster using a Scala notebook. We will use the Cosmos DB Spark Connector for this.
Step 1 - Cosmos DB creation with sample data.
Please follow these simple steps to create the Cosmos DB service.
We must create one database and one collection, and add some documents (records) to that collection.
I have added a total of five documents to this sample collection.
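For illustration, one of the sample documents might look like the snippet below. This is a hypothetical record; your field names and values will differ. Cosmos DB requires an "id" property on every document.

{
  "id": "1",
  "name": "Item One",
  "category": "sample"
}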
Step 2 - Cosmos DB Spark Connector library creation.
We will go to our existing Azure Databricks cluster and add the Cosmos DB Spark connector library. This is an open-source library written in Java and Scala, created by Microsoft employees and other contributors. (Scala combines object-oriented and functional programming in one concise, high-level language. Scala's static types help avoid bugs in complex applications, and its JVM runtime lets you build high-performance systems with easy access to huge ecosystems of libraries.)
The entire source code for this connector can be found on GitHub.
Please go to the Azure Databricks dashboard and click the Import Library button.
You can browse to the library file on your local system (the JAR we downloaded from the Maven repository link).
After successfully uploading the JAR file, click the Create Library button.
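For reference, the connector is published on Maven Central under the group com.microsoft.azure, with artifact names like azure-cosmosdb-spark_2.4.0_2.11, where the suffixes indicate the Spark and Scala versions. Pick the variant that matches your cluster's runtime.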
Step 3 - Querying the Cosmos DB data using Scala notebook.
Create a notebook from the dashboard (New Notebook button).
We created this notebook with the Scala language. Azure Databricks also supports Python, R, and SQL.
Import the required libraries into our notebook using the command below, and press Shift+Enter to execute it. This shortcut runs the current command.
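Assuming you uploaded the azure-cosmosdb-spark connector, the imports would look something like this sketch (the package names follow the connector's documented layout; adjust them to the JAR version you uploaded):

import org.apache.spark.sql.SparkSession
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark._
import com.microsoft.azure.cosmosdb.spark.config.Config

With the imports in place, a minimal read might look like the following sketch. The endpoint, key, database, and collection values are placeholders that you must replace with your own account settings, and the spark session is the one Databricks provides in every notebook.

// Connection settings for the Cosmos DB account (placeholder values - use your own)
val readConfig = Config(Map(
  "Endpoint"     -> "https://<your-account>.documents.azure.com:443/",
  "Masterkey"    -> "<your-master-key>",
  "Database"     -> "<your-database>",
  "Collection"   -> "<your-collection>",
  "query_custom" -> "SELECT * FROM c" // optional custom SQL API query
))

// Load the collection as a Spark DataFrame through the connector
val documents = spark.read.cosmosDB(readConfig)
documents.show()

Running the last cell should print the five sample documents we added in Step 1 as a tabular DataFrame.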