In this article, we will see how Cosmos DB partitions are created and used to achieve scaling.
In Cosmos DB, databases are containers for collections, and collections are where we keep all the documents. Each collection is assigned a performance level, i.e., the throughput (request units per second) provisioned for that collection. In my previous series of articles on Cosmos DB, we created a database and a collection for which we selected Fixed (10 GB) as the storage capacity. Once your documents exceed 10 GB, you can distribute them among multiple collections, each of which can store up to 10 GB. This is one way of scaling the workload in Cosmos DB. Another way is to create partitions. Now, let's see how a partitioned collection is created and how it helps to scale internally.
Log in to the Azure portal and open your Cosmos DB account. You will find your previously created collection there, but it can't be used for this demo, because partitioning requires a different storage capacity option to be selected when the collection is created. So, click 'Add Collection' to create a new partitioned collection.
To create a partitioned collection, you must select 'Unlimited' as the storage capacity and provide a partition key. The next question is what the partition key should be.
- Every document should contain the partition key property. In our case, we have defined 'Operator' as the partition key.
- Each logical partition (all documents sharing the same partition key value) has a hard limit of 10 GB, which means the documents having the same partition key value can't exceed 10 GB in total. So, choose the partition key carefully.
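Because every document must carry the partition key, it can help to check documents before inserting them. The sketch below assumes the partition key is given as a JSON path such as `/Operator` (the form the portal asks for); the helper name `partition_key_value` is hypothetical and not part of any Azure SDK.

```python
from typing import Any


def partition_key_value(doc: dict[str, Any], path: str = "/Operator") -> Any:
    """Hypothetical helper: resolve a partition key path against a document.

    Raises KeyError when the document has no value at the path, i.e. when
    it would not carry the partition key.
    """
    current: Any = doc
    # Walk each path segment, e.g. "/Address/City" -> ["Address", "City"].
    for segment in path.strip("/").split("/"):
        if not isinstance(current, dict) or segment not in current:
            raise KeyError(f"document {doc.get('id')!r} has no value at {path}")
        current = current[segment]
    return current


sample = {"id": "3d3e9785-66ff-4560-8a16-99fa80c69401", "Operator": "JIO"}
print(partition_key_value(sample))  # JIO
```

Validating on the client side only catches missing keys early; it does not replace the service's own checks.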
Sample Document
```json
{
  "id": "3d3e9785-66ff-4560-8a16-99fa80c69401",
  "Operator": "JIO",
  "Provider": "CyberPlat",
  "Region": "Telangana",
  "Mobile": "9769496026",
  "Amount": "500",
  "Status": "Pending",
  "ExecutionTime": "500 ms",
  "CreatedDate": "2018-06-23T22:01:44.3895961+05:30"
}
```
How are documents stored?
Each document is uniquely identified by the combination of its partition key value and its id. The partition key value defines the logical partition that a document belongs to. Cosmos DB creates as many physical partitions as needed to serve the throughput, i.e., the request units per second (RU/s) that we provision while creating the collection. If the provisioned RU/s exceed what a single physical partition can serve, Cosmos DB provisions the number of physical partitions obtained by dividing the provisioned RU/s by the serving capacity of one physical partition, rounded up.
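The arithmetic above, plus the (partition key, id) addressing, can be sketched as follows. The per-partition capacity of 10,000 RU/s is an assumption for illustration only; check the current Azure Cosmos DB documentation for the real figure. The function name is hypothetical.

```python
import math

# Assumption for illustration: RU/s that one physical partition can serve.
RU_PER_PHYSICAL_PARTITION = 10_000


def physical_partitions_needed(provisioned_ru: int,
                               ru_per_partition: int = RU_PER_PHYSICAL_PARTITION) -> int:
    """Divide the provisioned RU/s by one partition's capacity, rounding up."""
    if provisioned_ru <= 0:
        raise ValueError("provisioned throughput must be positive")
    return math.ceil(provisioned_ru / ru_per_partition)


# Documents are addressed by the pair (partition key value, id), so the same
# id may exist under two different partition key values.
store: dict[tuple[str, str], dict] = {}
store[("JIO", "1")] = {"id": "1", "Operator": "JIO"}
store[("Vodafone", "1")] = {"id": "1", "Operator": "Vodafone"}

print(physical_partitions_needed(25_000))  # 3
print(len(store))  # 2
```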
Cosmos DB hashes the partition key values and spreads the logical partitions evenly across the physical partitions. So, on average, each physical partition stores (number of distinct partition key values) / (number of physical partitions) logical partitions. When a physical partition reaches its storage limit, Cosmos DB seamlessly splits it into two new physical partitions. A single logical partition is never divided between physical partitions, which is why the per-key limit below is a hard limit.
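A minimal model of this behaviour, assuming a range-based layout where each physical partition owns a contiguous slice of a hash space. MD5 is a stand-in chosen only because it is deterministic and in the standard library; Cosmos DB uses its own hash function, and the class and function names here are hypothetical.

```python
import bisect
import hashlib

HASH_SPACE = 2 ** 32  # size of the illustrative hash space


def key_hash(partition_key: str) -> int:
    # Stand-in hash, not the one Cosmos DB actually uses.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class PhysicalPartitions:
    """Each physical partition owns a contiguous range of the hash space."""

    def __init__(self, count: int) -> None:
        step = HASH_SPACE // count
        # Sorted start offsets; partition i owns [starts[i], starts[i + 1]).
        self.starts = [i * step for i in range(count)]

    def owner(self, partition_key: str) -> int:
        # The last range whose start is <= the key's hash owns the key.
        return bisect.bisect_right(self.starts, key_hash(partition_key)) - 1

    def split(self, index: int) -> None:
        # Split one partition's range into two halves. Every key still maps
        # to exactly one range, so a logical partition is never divided.
        end = self.starts[index + 1] if index + 1 < len(self.starts) else HASH_SPACE
        midpoint = (self.starts[index] + end) // 2
        self.starts.insert(index + 1, midpoint)


operators = ["JIO", "Vodafone", "Airtel", "BSNL", "Idea", "MTNL"]
partitions = PhysicalPartitions(2)
print({op: partitions.owner(op) for op in operators})
# Average logical partitions per physical partition: 6 / 2
print(len(operators) / len(partitions.starts))  # 3.0
partitions.split(0)
print(len(partitions.starts))  # 3
```

Splitting only re-divides hash ranges; all documents with one partition key value always move together.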
In simple terms, just remember the points below.
- Each document size can't be more than 2 MB.
- The size of documents having the same partition key can't be more than 10 GB, it's a hard limit.
- If you can't identify a suitable partition key, choose a property whose value is unique for every document (for example, a random value such as the id). Each logical partition then holds a single document, so you will never reach the 10 GB limit.
In our demo scenario, we have defined "Operator" as the partition key. If we try to store more than 10 GB of data with the same partition key value (e.g. Vodafone), Cosmos DB returns the following message:
'Partition key reached maximum size of 10 GB'
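The two size rules can be modelled with a toy in-memory collection. This is an assumption-laden sketch, not the Cosmos DB SDK: the document size is approximated by its UTF-8 JSON length, the class names are hypothetical, and the limits are parameters only so the behaviour can be demonstrated with tiny numbers.

```python
import json
from collections import defaultdict

MAX_DOCUMENT_BYTES = 2 * 1024 * 1024            # 2 MB per document
MAX_LOGICAL_PARTITION_BYTES = 10 * 1024 ** 3    # 10 GB per partition key value


class PartitionKeyLimitError(Exception):
    """Raised when a partition key value would exceed its storage limit."""


class PartitionedCollection:
    """Toy model of the per-document and per-key limits (hypothetical)."""

    def __init__(self, key: str = "Operator",
                 max_doc: int = MAX_DOCUMENT_BYTES,
                 max_partition: int = MAX_LOGICAL_PARTITION_BYTES) -> None:
        self.key = key
        self.max_doc = max_doc
        self.max_partition = max_partition
        self.used: dict[str, int] = defaultdict(int)  # bytes per key value

    def insert(self, doc: dict) -> None:
        size = len(json.dumps(doc).encode("utf-8"))  # approximate stored size
        if size > self.max_doc:
            raise ValueError(f"document is {size} bytes; the limit is {self.max_doc}")
        value = doc[self.key]
        if self.used[value] + size > self.max_partition:
            # Message modeled on the one quoted above.
            raise PartitionKeyLimitError(
                f"Partition key {value!r} reached maximum size of {self.max_partition} bytes")
        self.used[value] += size


demo = PartitionedCollection(max_partition=400)  # tiny limit for the demo
stored = 0
try:
    for i in range(100):
        demo.insert({"id": str(i), "Operator": "Vodafone", "Amount": "500"})
        stored += 1
except PartitionKeyLimitError as err:
    print(f"stopped after {stored} documents: {err}")
```

Other partition key values (e.g. JIO) are unaffected, which is why spreading data over many key values avoids the limit.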
We can check storage consumption as a graph under the Metrics link. The graph below shows the storage of collections without partitions.
The following graph represents the storage for each partition.
Clicking a spike shows the partition key and the storage space it consumes.
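To spot heavy partition keys before they approach the limit, a rough client-side tally can complement the portal graph. This sketch is an assumption: it counts only the JSON size of each document, while the real metric also includes index and system overhead; the function name is hypothetical.

```python
import json
from collections import Counter


def storage_by_partition_key(docs: list[dict], key: str = "Operator") -> Counter:
    """Approximate bytes per partition key value (JSON size only)."""
    totals: Counter = Counter()
    for doc in docs:
        totals[doc[key]] += len(json.dumps(doc).encode("utf-8"))
    return totals


docs = [
    {"id": "1", "Operator": "JIO", "Amount": "500"},
    {"id": "2", "Operator": "JIO", "Amount": "250"},
    {"id": "3", "Operator": "Vodafone", "Amount": "100"},
]
# Largest partition key values first, like the tallest spikes in the graph.
for value, size in storage_by_partition_key(docs).most_common():
    print(value, size)
```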
I hope this article helps you define a partition key when creating your Cosmos DB collection.