Why You Should Learn Azure Cosmos DB

Do you know which database is behind ChatGPT’s success story and is positioned as a leader for the AI era? It can also seamlessly accommodate all your operational data models, including relational, document, vector, key-value, graph, and table. Welcome to the world of Cosmos DB.

Note: We use Cosmos DB and Azure Cosmos DB interchangeably; both names refer to the same service. Azure Cosmos DB is a fully managed, multi-model database service designed specifically for Microsoft’s Azure ecosystem. As such, it cannot be hosted outside of the Azure environment.

What is Cosmos DB?

Owned and maintained by Microsoft, Azure Cosmos DB is a globally distributed, multi-model database service.

By globally distributed, we mean that the database can replicate data across multiple Azure regions, making it possible to build applications that are globally available and resilient to regional failures. It also provides multi-master replication (called multi-region writes in current Azure documentation), which allows write operations to be handled in multiple regions simultaneously.

Further, by multi-model, we mean its support for various data models, including:

  • Document: JSON documents, similar to MongoDB.
  • Graph: Using the Gremlin graph traversal language.
  • Key-value: Simple key-value pairs.
  • Column-Family: Similar to Cassandra.

Advantages of Cosmos DB

Below is a list of the main advantages of Cosmos DB. These features are available out of the box, and each one has a few parameters that we can configure.

  • Scalability: Cosmos DB can automatically scale throughput and storage based on the application’s needs. This means it can handle large volumes of data and high request loads seamlessly.
    • Configuration Needed: Set the desired performance level in Request Units (RUs) and configure auto-scaling policies if required.
  • Performance: It guarantees low latency for read and write operations (typically in the single-digit millisecond range) and provides features such as automatic indexing of data without requiring schema or secondary indexes.
    • Configuration Needed: Adjust indexing policies, partitioning strategies, and possibly network configurations to optimize performance based on your specific workload.
  • Integrated with Other Azure Services: It integrates well with other Azure services like Azure Functions, Azure Kubernetes Service (AKS), and Azure Machine Learning, enabling the development of complex, intelligent applications.
    • Configuration Needed: Set up and configure integrations with services like Azure Functions, AKS, or Azure Machine Learning based on your application requirements.
  • Open Source SDKs and APIs: Cosmos DB provides SDKs for various programming languages (including JavaScript, .NET, Java, Python, and more) and supports APIs for SQL, MongoDB, Cassandra, Gremlin, and Table storage, making it flexible and easy to use in different environments.
    • Configuration Needed: Install and configure the appropriate SDKs and ensure your application calls the correct APIs.
  • Real-Time Analytics: It supports real-time analytics with features like built-in vector search capabilities, which enhances its ability to handle and query large datasets efficiently.
    • Configuration Needed: Configure indexing and analytics features to leverage real-time insights based on your data.
  • Security: Cosmos DB offers several security features, such as encryption at rest, role-based access control (RBAC), and VNET integration to ensure secure access and data protection.
    • Configuration Needed: Set up role-based access control (RBAC), configure VNET integration, and manage other security settings as per your requirements.
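
To make the "adjust indexing policies" item above a little more concrete, here is a minimal sketch of what an indexing policy document can look like, written as a Python dictionary and printed as JSON. The property name "description" is a hypothetical example and not something from this article; the idea is simply that everything is indexed by default and one large text property is excluded to reduce write cost.

```python
import json

# A sketch of an indexing policy: index every path by default ("/*"),
# but skip one property. "description" is a hypothetical property name
# chosen only for illustration.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/description/?"}],
}

# Serialize to JSON, the form in which a policy like this is typically stored.
policy_json = json.dumps(indexing_policy, indent=2)
print(policy_json)
```

Whether excluding a path actually helps depends on the workload: properties that are never filtered or sorted on are the usual candidates.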

Request Units

Every operation (create, read, update, delete, query) consumes a certain number of Request Units based on the complexity and size of the operation.

Let’s say you have a collection of documents in Cosmos DB, and you want to perform operations on this collection. Each operation consumes resources such as CPU, memory, and I/O, and Request Units express that combined cost as a single number. Here are some basic operations and how they might consume RUs.

  1. Read Operation
    • Operation: Reading a small JSON document.
    • RU Cost: Suppose reading a 1 KB document costs 1 RU.
    • Example: If you read 1000 such documents, it will cost you 1000 RUs.
  2. Write Operation
    • Operation: Writing a small JSON document.
    • RU Cost: Writing a 1 KB document may cost around 5 RUs.
    • Example: If you write 100 such documents, it will cost you 500 RUs.
  3. Query Operation
    • Operation: Executing a simple query to retrieve documents based on a filter.
    • RU Cost: Suppose executing a simple query that returns 10 documents costs 10 RUs.
    • Example: If you run this query 100 times, it will cost you 1000 RUs.

Putting It All Together

Imagine you have an application that performs the following operations in one second:

  • Reads 500 documents (1 RU per read): 500 RUs.
  • Writes 50 documents (5 RUs per write): 250 RUs.
  • Executes 20 queries (10 RUs per query): 200 RUs.

Total RU Consumption

  • Reads: 500 RUs
  • Writes: 250 RUs
  • Queries: 200 RUs

Total RUs consumed per second: 950 RUs.
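
The same arithmetic can be sketched in a few lines of Python. The per-operation costs are the illustrative figures used in this section (1, 5, and 10 RUs), not measured values; the function name and constants are introduced here only for the example.

```python
# Illustrative per-operation costs taken from the examples in this section.
# Real costs depend on item size, indexing, and query shape.
RU_PER_READ = 1
RU_PER_WRITE = 5
RU_PER_QUERY = 10


def rus_per_second(reads: int, writes: int, queries: int) -> int:
    """Sum the Request Units consumed by one second of mixed traffic."""
    return reads * RU_PER_READ + writes * RU_PER_WRITE + queries * RU_PER_QUERY


total = rus_per_second(reads=500, writes=50, queries=20)
print(total)  # 950
```

An estimate like this is a reasonable starting point for choosing provisioned throughput: with a workload of about 950 RUs per second, a container provisioned below that level would likely see some requests rate-limited.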

As of now, the following free options are available.

APIs in Azure Cosmos DB

Azure Cosmos DB offers multiple database APIs to cater to different types of applications and workloads, allowing developers to use the data models and query languages they are familiar with. Here are the primary APIs supported by Azure Cosmos DB, along with their purposes:

  1. SQL API
    • Purpose: This is the core API for interacting with Azure Cosmos DB (named API for NoSQL in current Azure documentation) and is designed for document-oriented data. It uses a SQL-like query language to interact with JSON documents.
    • Use Cases: Ideal for applications requiring rich querying capabilities over JSON data, such as web and mobile applications, content management systems, and more.
  2. MongoDB API
    • Purpose: This API allows developers to use MongoDB drivers to interact with Cosmos DB, providing compatibility with MongoDB applications.
    • Use Cases: Suitable for applications that are already using MongoDB and want to leverage Cosmos DB’s global distribution and scalability without changing their codebase.
  3. Cassandra API
    • Purpose: This API provides wire protocol compatibility with Apache Cassandra, allowing users to use existing Cassandra tools and SDKs.
    • Use Cases: Ideal for applications that use Cassandra’s wide-column store model, such as IoT, time-series data, and other applications requiring high write throughput.
  4. Gremlin API
    • Purpose: This API supports graph-based data models using the Gremlin query language, which is part of the Apache TinkerPop project.
    • Use Cases: Best for applications that involve complex relationships and graph traversal, such as social networks, recommendation engines, and fraud detection systems.
  5. Table API
    • Purpose: This API is designed for key-value storage and is compatible with Azure Table Storage.
    • Use Cases: Suitable for applications requiring simple key-value storage with a need for high availability and scalability, such as configuration stores and session data.

Key Features and Benefits

  • Flexibility: Multiple APIs allow developers to choose the data model and query language that best fits their application’s requirements.
  • Compatibility: Developers can migrate existing applications using MongoDB, Cassandra, or Gremlin to Cosmos DB with minimal changes.
  • Scalability: All APIs benefit from Cosmos DB’s underlying infrastructure, providing global distribution, automatic scaling, and low-latency access.
  • Integrated Features: Regardless of the API used, developers can leverage Cosmos DB’s features like multi-master replication, multiple consistency levels, and comprehensive monitoring and diagnostics tools.

Example Scenarios

  1. SQL API: A mobile app needing advanced querying capabilities over user profiles and activity logs.
  2. MongoDB API: A content management system initially built on MongoDB wanting to scale globally using Cosmos DB.
  3. Cassandra API: An IoT platform collecting and analyzing time-series data from sensors distributed worldwide.
  4. Gremlin API: A social networking application analyzing and visualizing user connections and interactions.
  5. Table API: A web application storing user sessions and configuration settings.

In simpler terms, here is what working with two of these APIs looks like.

SQL API

If you come from a SQL Server background, you would use the SQL API, which allows you to use a SQL-like query language to interact with JSON documents stored in Cosmos DB. For example, see the query below.

SELECT * 
FROM c 
WHERE c.property = 'value'

This is a SQL-like query used to fetch data from a Cosmos DB container using the SQL API.
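
From application code, the same query would usually be sent with a bound parameter instead of a literal value. The sketch below assumes the `query_items` method of a container client from the azure-cosmos Python SDK, with its `query`, `parameters`, and `enable_cross_partition_query` arguments. The `FakeContainer` class is a hypothetical in-memory stand-in, added only so the example runs without an Azure account, and `find_by_property` is a helper name invented for this illustration.

```python
from typing import Any, Dict, List

# The query from above, with the literal replaced by a named parameter.
QUERY = "SELECT * FROM c WHERE c.property = @value"


def find_by_property(container: Any, value: str) -> List[Dict[str, Any]]:
    """Run the parameterized query and collect the matching items."""
    items = container.query_items(
        query=QUERY,
        parameters=[{"name": "@value", "value": value}],
        enable_cross_partition_query=True,
    )
    return list(items)


class FakeContainer:
    """Hypothetical stand-in for a real container client (illustration only)."""

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self._docs = docs

    def query_items(self, query, parameters, enable_cross_partition_query=False):
        # Only mimics the single equality filter used in QUERY.
        wanted = parameters[0]["value"]
        return (d for d in self._docs if d.get("property") == wanted)


container = FakeContainer([
    {"id": "1", "property": "value"},
    {"id": "2", "property": "other"},
])
print(find_by_property(container, "value"))  # [{'id': '1', 'property': 'value'}]
```

In a real application, the container object would come from the SDK client rather than the stand-in, and the helper itself would stay the same.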

MongoDB API

If you are familiar with MongoDB, you can use the MongoDB API to interact with Cosmos DB as if it were a MongoDB database. This means using MongoDB drivers and MongoDB query language.

db.collection.find({ 
  property: 'value' 
})

This is a MongoDB query used to fetch data from a Cosmos DB container using the MongoDB API.
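
The equivalent call from Python might look like the sketch below, which assumes a pymongo-style collection object exposing `find(filter)`. `FakeCollection` is a hypothetical in-memory stand-in so the example runs without a database connection, and `find_docs_by_property` is a helper name used only here.

```python
from typing import Any, Dict, List


def find_docs_by_property(collection: Any, value: str) -> List[Dict[str, Any]]:
    """Fetch documents whose "property" field equals the given value."""
    return list(collection.find({"property": value}))


class FakeCollection:
    """Hypothetical stand-in for a MongoDB collection (illustration only)."""

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self._docs = docs

    def find(self, filter: Dict[str, Any]):
        # Only mimics simple equality filters such as {"property": "value"}.
        return (
            d for d in self._docs
            if all(d.get(k) == v for k, v in filter.items())
        )


collection = FakeCollection([
    {"_id": 1, "property": "value"},
    {"_id": 2, "property": "other"},
])
print(find_docs_by_property(collection, "value"))  # [{'_id': 1, 'property': 'value'}]
```

Note that the filter is a MongoDB-style document, not a SQL string; the two APIs do not share a query language.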

Please note that even if you are a developer experienced with SQL Server, once you choose the MongoDB API in Cosmos DB you need to write your queries in MongoDB syntax, not SQL syntax. Cosmos DB does not automatically convert SQL queries into MongoDB queries or vice versa. You select the API that fits the model and language you wish to use, and you write your operations in the corresponding query language.

So, when we create a Cosmos DB account in Azure, we select the API that matches our skills and data model. Remember that this choice is fixed for the account and cannot be changed later.

Conclusion

The rise of AI-powered applications has added another layer of complexity (in addition to the demand for being highly responsive and always online), as many of these applications integrate multiple data stores. For instance, some organizations develop applications that concurrently connect to MongoDB, Postgres, Redis, and Gremlin. These databases vary in implementation workflows and operational performance, making it more challenging to scale applications. Azure Cosmos DB streamlines and accelerates your application development by serving as a single database solution. In the era of AI and cloud, we should not ignore the importance of Azure Cosmos DB offerings. Learning and leveraging its capabilities can greatly benefit your application in the future.