Deep Dive Into Azure Cosmos DB

Introduction


Azure Cosmos DB is a globally distributed, multi-model database service. It is a NoSQL database designed for unstructured and semi-structured data, which is typically stored in non-normalized form. It offers automatic scaling, high availability, and low latency. Azure Cosmos DB is the successor to Microsoft's DocumentDB service. It scales horizontally and supports multiple APIs and multiple data models. "Globally distributed" means that it is available in Azure regions around the world (multiple geo locations) and lets us replicate data to as many data centers and regions as we wish.
 
 

Why do we require a globally distributed database?


Let's take an example to explain the scenario. Our database runs in the cloud in a US region, and we have customers across the world, for example in India and the UK. The main problem with our application is data latency. It is not the same for all customers: those who access the application from outside the US are the ones who suffer. Cosmos DB can resolve this problem by letting us store the application data in multiple data centers close to our customers. The data is replicated to each data center we select.
 

Key benefits of Azure Cosmos DB


Turnkey global distribution

Cosmos DB enables us to build highly available and responsive applications. It replicates our data to every Azure region added to our Cosmos account, and we can add or remove regions at any time.
 
Low latency

As mentioned earlier, Cosmos DB replicates our data to all Azure regions that are added to our Cosmos account. As a result, users can interact with the copy of the data stored nearest to them. Cosmos DB guarantees latencies of less than 10 ms at the 99th percentile for both read and write operations. This feature enables us to create highly responsive apps.
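As a rough illustration of reading from the nearest region, the sketch below builds a ConnectionPolicy with a preferred-region list for the Microsoft.Azure.DocumentDB SDK that is used later in this article. The class name RegionAwareClient and the region names are assumptions made for the example, and the regions must already be added to the Cosmos account. As far as I understand, endpoint discovery has to stay enabled (its default) for the preferred list to take effect.

```csharp
using System;
using Microsoft.Azure.Documents.Client;

// Hypothetical helper, not part of the SDK.
public static class RegionAwareClient
{
    // Builds a policy that prefers the listed regions, in the given order.
    public static ConnectionPolicy BuildPolicy(params string[] preferredRegions)
    {
        var policy = new ConnectionPolicy(); // EnableEndpointDiscovery defaults to true
        foreach (string region in preferredRegions)
        {
            policy.PreferredLocations.Add(region);
        }
        return policy;
    }

    // Example for users in India: try Central India first, then UK South.
    // Both region names are illustrative.
    public static DocumentClient CreateForIndia(string endpoint, string authKey)
    {
        ConnectionPolicy policy = BuildPolicy("Central India", "UK South");
        return new DocumentClient(new Uri(endpoint), authKey, policy);
    }
}
```

An application deployed in several regions would typically pass a different list per deployment, so that each instance talks to its closest replica.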
 
High availability

Cosmos DB offers a 99.99% availability SLA for read and write operations, and 99.999% read availability for accounts that span multiple regions. We can also configure regional failover for our Cosmos DB account, so the application stays available even if a region goes down.
 
No schema or index management

Keeping schemas and indexes in sync across a globally distributed application is painful. Cosmos DB is a schema-agnostic NoSQL database that, by default, automatically indexes all data, so there is no schema or index management to deal with. As a result, there is no application downtime when migrating schemas.
 
Elastic (Horizontal) scalability of throughput and storage

We can scale elastically from thousands to millions of requests per second with a single API call and pay only for the throughput we provision. Throughput can also be adjusted to match our requirements, which may vary by both time and geography.
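To make the idea of changing throughput from code more concrete, here is a minimal sketch using the Microsoft.Azure.DocumentDB SDK that appears later in this article. It assumes throughput is provisioned on the collection itself (not shared at database level); the class and method names are made up for the example.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Hypothetical helper, not part of the SDK.
public static class ThroughputScaler
{
    // Replaces the provisioned throughput (RU/s) of an existing collection.
    public static async Task SetThroughputAsync(
        DocumentClient client, string databaseId, string collectionId, int requestUnits)
    {
        // Read the collection to obtain its self link.
        DocumentCollection collection = await client.ReadDocumentCollectionAsync(
            UriFactory.CreateDocumentCollectionUri(databaseId, collectionId));

        // The throughput of a collection is stored in its associated offer.
        Offer currentOffer = client.CreateOfferQuery()
            .Where(o => o.ResourceLink == collection.SelfLink)
            .AsEnumerable()
            .Single();

        // OfferV2 carries the new throughput value; replacing the offer applies it.
        await client.ReplaceOfferAsync(new OfferV2(currentOffer, requestUnits));
    }
}
```

A scheduled job could call SetThroughputAsync with a higher value before an expected peak and a lower one afterwards.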
 
Secure by default and enterprise ready

It is certified for many compliance standards. All data in Cosmos DB is encrypted both at rest and in transit. It also provides row-level authorization and follows strict security standards.
 
Multi-model support

Azure Cosmos DB natively supports multiple data models, such as documents, key-value, graph, and column-family. Internally it uses an atom-record-sequence (ARS) based type system, and every data model is translated to that ARS representation.
 

Multi-API support

Azure Cosmos DB supports multiple APIs for interacting with the data, regardless of the data model. This lets developers choose their preferred technology for building the application. SDKs for languages such as Java, .NET, Node.js, and Python cover almost all of these APIs. Cosmos DB supports the following APIs.
  • SQL API
  • MongoDB API
  • Table API
  • Gremlin API
  • Cassandra API
 
Multiple consistency choices

In a replicated database, consistency describes whether a read returns the same, most recent data no matter which replica serves it. With weak consistency, a read may return stale data or see updates in a different order than they were written. Cosmos DB provides the following five consistency levels, and we can select the one that fits our business requirement. Stronger consistency comes at the cost of availability, throughput, and latency.

Strong

It offers the highest consistency, at the cost of higher latency and lower throughput. Reads are guaranteed to return the most recent committed version of an item.
 
Bounded staleness

Reads may lag behind writes, but only by a bound that we configure, such as a time interval. The closer that bound is to zero, the more the behavior resembles strong consistency. This level is mainly relevant when the account has more than one region.
 
Session

It is the default consistency level. Within a single client session, reads always see that session's own writes, while other sessions may see the latest data only after a lag. It is the most popular level because it provides better throughput.
 
Consistent Prefix

It preserves the order in which data is replicated to the replicas. Replicas may lag behind, but reads never see writes out of order.
 
Eventual

There is no ordering guarantee for reads. A read does not wait for data to be committed on all replicas, so this level provides the highest performance. It is the opposite of strong consistency and provides the weakest guarantees.
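The consistency levels above also surface in the .NET SDK (Microsoft.Azure.DocumentDB) that is used later in this article. The following sketch shows two ways a client might ask for a level: once for the whole client and once for a single request. The class and method names are assumptions for the example. To my knowledge a client can only relax the account's default level, not strengthen it.

```csharp
using System;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Hypothetical helper, not part of the SDK.
public static class ConsistencyExamples
{
    // Creates a client whose requests all use the given consistency level.
    public static DocumentClient CreateClient(string endpoint, string authKey, ConsistencyLevel level)
    {
        return new DocumentClient(new Uri(endpoint), authKey, new ConnectionPolicy(), level);
    }

    // Builds options for a single read that tolerates stale data
    // in exchange for lower latency.
    public static RequestOptions EventualReadOptions(string partitionKeyValue)
    {
        return new RequestOptions
        {
            PartitionKey = new PartitionKey(partitionKeyValue),
            ConsistencyLevel = ConsistencyLevel.Eventual
        };
    }
}
```

The options object can be passed to a read call such as ReadDocumentAsync, so one part of an application can trade freshness for speed while the rest keeps the account default.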
 
 

Create an Azure Cosmos DB account

 
The following steps create an Azure Cosmos DB account in the Azure portal. If you already have an Azure Cosmos DB account to use, skip these steps.

First of all, log in to the Azure portal. Select Create a resource > Databases > Azure Cosmos DB. Alternatively, you can select "Azure Cosmos DB" from the left panel.
 
 
You need the following information for your Azure Cosmos DB account.
  • Subscription - Select the Azure subscription that you want to use for the Azure Cosmos account.
  • Resource Group - Select an existing resource group or create a new one.
  • Account Name - Enter a unique account name. This name identifies our Azure Cosmos account.
  • API - Select the API. Five are offered: Core (SQL), MongoDB, Gremlin, Azure Table, and Cassandra.
  • Location - Select a geographic location to host your Azure Cosmos DB account.
  • Geo-Redundancy - This option enables global distribution.
  • Multiple-region Write - This option enables write capabilities in multiple regions.
Click the "Review + create" button to create the Azure Cosmos DB account. Optionally, you can fill in the Network and Tags sections first. It takes a few minutes to create and deploy the account.
 
 
The next step is to create a collection (also called a container). There are several ways to create a Cosmos DB container: the Azure portal, the Azure CLI, or one of the supported SDKs. In this article, I demonstrate how to create a container with the SQL API.

Select any existing Cosmos DB account that uses the Core (SQL) API. Go to the "Data Explorer" pane and select "New Container". The container can be placed in a new database or in an existing one. Enter the container id, partition key, and throughput, and then click "OK".
 
 
Now, we can see the "CosmosDbTest" database and, under it, the "Products" collection. Expanding the "Products" collection shows "Items", which contains all the existing data. We can add a new item by clicking the "New Item" button and entering the data (in JSON format) in the right pane.
 
 
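As a hierarchy, the resources nest as follows: the Cosmos DB account contains databases (here "CosmosDbTest"), each database contains containers or collections (here "Products"), and each container holds items (JSON documents).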
 

How to Configure Default Consistency


As explained above, the default consistency level for Cosmos DB is Session. We can change it by clicking "Default consistency" in the left menu. The five consistency levels are shown on the right side; select one and click "Save".
 
 

Create a database and collection through .NET Core code

 
We can also create a database and collection from .NET code. To connect to our Azure Cosmos DB account, we need the account URL and a key. Both can be found under the "Keys" setting, which lists two kinds of keys: read-write keys and read-only keys.
 
 
To access the Azure Cosmos DB account from .NET code, we need to install the "Microsoft.Azure.DocumentDB" NuGet package (for .NET Core projects, the corresponding package is "Microsoft.Azure.DocumentDB.Core").
 
 
We can create a database with the CreateDatabaseAsync method of the DocumentClient class. The following code first checks whether the database exists by calling ReadDatabaseAsync. If that call throws a DocumentClientException with the NotFound status code, the database does not exist, so it is created.
    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents;
    using Microsoft.Azure.Documents.Client;

    // Place this method inside a class of your application.
    public async Task CreateDatabaseIfNotExistsAsync(string endpoint, string authKey)
    {
        string databaseId = "CosmosDbTest";
        var client = new DocumentClient(new Uri(endpoint), authKey, new ConnectionPolicy { EnableEndpointDiscovery = false });
        try
        {
            // Succeeds only if the database already exists.
            await client.ReadDatabaseAsync(UriFactory.CreateDatabaseUri(databaseId));
        }
        catch (DocumentClientException e)
        {
            if (e.StatusCode == System.Net.HttpStatusCode.NotFound)
            {
                // 404: the database is missing, so create it.
                await client.CreateDatabaseAsync(new Database { Id = databaseId });
            }
            else
            {
                // Any other failure (for example, authorization) is rethrown.
                throw;
            }
        }
    }
We can create a collection with the CreateDocumentCollectionAsync method of the DocumentClient class. The following code first checks whether the collection exists in the database by calling ReadDocumentCollectionAsync. If that call throws a DocumentClientException with the NotFound status code, the collection does not exist in the given database, so it is created.
    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents;
    using Microsoft.Azure.Documents.Client;

    // Place this method inside a class of your application.
    private static async Task CreateCollectionIfNotExistsAsync(string endpoint, string authKey, string databaseId)
    {
        string collectionId = "Products";
        var client = new DocumentClient(new Uri(endpoint), authKey, new ConnectionPolicy { EnableEndpointDiscovery = false });
        try
        {
            // Succeeds only if the collection already exists.
            await client.ReadDocumentCollectionAsync(UriFactory.CreateDocumentCollectionUri(databaseId, collectionId));
        }
        catch (DocumentClientException e)
        {
            if (e.StatusCode == System.Net.HttpStatusCode.NotFound)
            {
                // 404: create the collection with 1000 RU/s of throughput.
                await client.CreateDocumentCollectionAsync(
                    UriFactory.CreateDatabaseUri(databaseId),
                    new DocumentCollection { Id = collectionId },
                    new RequestOptions { OfferThroughput = 1000 });
            }
            else
            {
                // Any other failure is rethrown.
                throw;
            }
        }
    }
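As an alternative to the hand-written existence checks, version 2.x of the Microsoft.Azure.DocumentDB SDK also offers CreateDatabaseIfNotExistsAsync and CreateDocumentCollectionIfNotExistsAsync. The sketch below uses them and also defines a partition key on the collection. The class name CosmosSetup and the partition key path "/category" are assumptions for the example; the path should match the property that is used as the partition key when reading items.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Hypothetical helper, not part of the SDK.
public static class CosmosSetup
{
    // Describes a collection with a partition key path such as "/category".
    public static DocumentCollection BuildCollectionDefinition(string collectionId, string partitionKeyPath)
    {
        var collection = new DocumentCollection { Id = collectionId };
        collection.PartitionKey.Paths.Add(partitionKeyPath);
        return collection;
    }

    // Creates the database and the collection only when they are missing.
    public static async Task EnsureCreatedAsync(DocumentClient client, string databaseId, string collectionId)
    {
        await client.CreateDatabaseIfNotExistsAsync(new Database { Id = databaseId });

        await client.CreateDocumentCollectionIfNotExistsAsync(
            UriFactory.CreateDatabaseUri(databaseId),
            BuildCollectionDefinition(collectionId, "/category"), // assumed path
            new RequestOptions { OfferThroughput = 1000 });
    }
}
```

Calling EnsureCreatedAsync(client, "CosmosDbTest", "Products") once at application startup would be one way to use it.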

Create document


A document is user-defined JSON content, and every document must have an id property. We can create a document with the CreateDocumentAsync method of the DocumentClient class and read it back with ReadDocumentAsync, which throws an exception when the document does not exist in the collection. The ReplaceDocumentAsync method updates a document, and DeleteDocumentAsync deletes it from the database.
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents;
    using Microsoft.Azure.Documents.Client;

    // These helpers belong in a generic class declared with "where T : class"
    // (so that null can be returned). DatabaseId, CollectionId, and client are
    // fields of that class, initialized elsewhere.

    // Read
    public static async Task<T> GetItemAsync(string id, string category)
    {
        try
        {
            // The partition key value (here the category) is required for the read.
            Document document = await client.ReadDocumentAsync(UriFactory.CreateDocumentUri(DatabaseId, CollectionId, id), new RequestOptions { PartitionKey = new PartitionKey(category) });
            return (T)(dynamic)document;
        }
        catch (DocumentClientException e)
        {
            if (e.StatusCode == System.Net.HttpStatusCode.NotFound)
            {
                // The document does not exist.
                return null;
            }
            else
            {
                throw;
            }
        }
    }

    // Create
    public static async Task<Document> CreateItemAsync(T item)
    {
        return await client.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri(DatabaseId, CollectionId), item);
    }

    // Update
    public static async Task<Document> UpdateItemAsync(string id, T item)
    {
        return await client.ReplaceDocumentAsync(UriFactory.CreateDocumentUri(DatabaseId, CollectionId, id), item);
    }

    // Delete
    // On a partitioned collection, the partition key is needed for deletes as well,
    // so the category is passed here just as in GetItemAsync.
    public static async Task DeleteItemAsync(string id, string category)
    {
        await client.DeleteDocumentAsync(UriFactory.CreateDocumentUri(DatabaseId, CollectionId, id), new RequestOptions { PartitionKey = new PartitionKey(category) });
    }
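To show what a type used as T might look like, here is a minimal sketch of an item class. The Product class and its properties are hypothetical; only the lowercase "id" property is required by Cosmos DB, and "category" is included because the read helper uses a category value as the partition key. The JsonProperty attributes come from Newtonsoft.Json, which the DocumentDB SDK uses for serialization.

```csharp
using Newtonsoft.Json;

// Hypothetical item type for the "Products" collection.
public class Product
{
    // Cosmos DB expects the identifier in a lowercase "id" property.
    [JsonProperty(PropertyName = "id")]
    public string Id { get; set; }

    [JsonProperty(PropertyName = "name")]
    public string Name { get; set; }

    // Assumed to be the partition key property of the collection.
    [JsonProperty(PropertyName = "category")]
    public string Category { get; set; }

    [JsonProperty(PropertyName = "price")]
    public decimal Price { get; set; }
}

public static class ProductSample
{
    // Returns the JSON that would be stored for the given product.
    public static string ToJson(Product product)
    {
        return JsonConvert.SerializeObject(product);
    }
}
```

An instance such as new Product { Id = "1", Name = "Pen", Category = "Stationery", Price = 1.5m } could then be passed to CreateItemAsync and later fetched with GetItemAsync("1", "Stationery").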

Azure Cosmos Emulator for local development and testing


The Azure Cosmos Emulator helps us test against the Azure Cosmos DB service during the development phase. It runs in the local environment, so we can develop and test our application locally without an Azure subscription and without paying for Cosmos DB consumption during development. Once we are satisfied with the data structure and our code, we can switch to an Azure Cosmos account in the cloud. The emulator is available as a free download from Microsoft. Currently, its "Data Explorer" view supports only the SQL API. We can migrate data between the emulator and the Azure Cosmos DB service by using the Azure Cosmos DB Data Migration Tool.
 
By default, the emulator listens on port 8081 on localhost. We can also start and stop the emulator from the command line.
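Connecting to the emulator from code works like connecting to a cloud account, only with the local endpoint. The sketch below assumes the default port 8081 and reads the emulator's well-known key from an environment variable; the class name and the variable name COSMOS_EMULATOR_KEY are made up for the example.

```csharp
using System;
using Microsoft.Azure.Documents.Client;

// Hypothetical helper, not part of the SDK.
public static class EmulatorConnection
{
    // Default local endpoint of the emulator (HTTPS on port 8081).
    public static readonly Uri DefaultEndpoint = new Uri("https://localhost:8081/");

    // Creates a client for the local emulator. The key is the emulator's
    // well-known master key, supplied through an environment variable
    // (the variable name is an assumption).
    public static DocumentClient Create()
    {
        string key = Environment.GetEnvironmentVariable("COSMOS_EMULATOR_KEY");
        if (string.IsNullOrEmpty(key))
        {
            throw new InvalidOperationException(
                "Set COSMOS_EMULATOR_KEY to the emulator's well-known key.");
        }
        return new DocumentClient(DefaultEndpoint, key);
    }
}
```

Switching to the cloud account later should then only require a different endpoint and key, which is why keeping both in configuration is a common choice.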
 
 

Limitations of Azure Cosmos Emulator

  • It supports clients for the SQL API; other APIs such as MongoDB, Table, Graph, and Cassandra are not fully supported.
  • It is not a scalable service, so it does not support a large number of containers.
  • It runs in a single region only and does not offer the different consistency levels of the cloud service.
  • It supports only a single fixed account and a well-known master key.
  • With the emulator, we can create up to 25 fixed-size containers or 5 unlimited containers.

Summary


Azure Cosmos DB is a globally distributed, high-performance, multi-model database service. It is a NoSQL database designed for unstructured and semi-structured data. It scales horizontally and supports multiple APIs, such as SQL, MongoDB, Table, Gremlin, and Cassandra, as well as multiple data models, such as documents, key-value, graph, and column-family. It provides 99.99% availability for both read and write operations.
 
You can view/download source from here.

