Azure Cosmos DB is a new globally distributed database. It can easily be scaled out (horizontal partitioning), and it is widely available around the world.
In this article series, I'll introduce Cosmos DB with code samples for the following APIs:
- Table API
- SQL API
- API for MongoDB
- Cassandra API
- Gremlin API
Azure Cosmos DB is inspired by Dr. Leslie Lamport's work. I want to thank Microsoft for this excellent choice and for the fantastic new distributed database system.
Before we begin, I want to make one thing very clear: Azure Cosmos DB is not meant to replace MS SQL Server. MS SQL Server is a relational DBMS, whereas Cosmos DB provides native support for NoSQL, including Cassandra, MongoDB, Gremlin, Spark, SQL, etc. So, they complement each other rather than replace each other.
As you can see in Figure 1, relational databases dominated from 1990 to 2000. The main problem with the old-style software architectures (Layered Architecture, Service-Oriented Architecture) that used a relational database as a back end was vertical scalability, that is, scaling up (the fixed-schema problem was less important). Scaling an application up was very expensive and sometimes hard. That was the main reason that helped ignite the NoSQL and Microservices revolution. Since then, the picture has become clearer, and as Martin Fowler has said, polyglot persistence is the future of database design.
Figure -1- Evolution of Database Systems
Note
NewSQL is as important as NoSQL. I recommend that you read about it, or search for Michael Stonebraker on Google and YouTube.
What does it mean that Azure Cosmos DB is primarily NoSQL?
To answer the question, we first have to know what NoSQL is.
NoSQL is a class of database management systems (DBMS) that does not follow all of the rules of a relational DBMS. Its typical characteristics are:
- Not using the relational model
- Running well on clusters
- Mostly open-source
- Schema-less
- Different Data Models
The point mentioned above, "Running well on clusters", is easy to say but hard to apply. To understand the cluster problem, we have to understand distributed computing and distributed data stores. So, let us take a look at Eric Brewer's CAP theorem.
CAP theorem
CAP stands for Consistency (C), Availability (A), and Partition Tolerance (P). When you design an application with a distributed database, you must choose between those three guarantees, and the theorem says you can have at most two of the three.
The CAP theorem states that no distributed system can guarantee C, A, and P at the same time; instead, there are always trade-offs between them.
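To make the trade-off concrete, here is a minimal Python sketch of a two-replica store during a network partition. It is a toy model and not Cosmos DB code; the names `Replica`, `write_cp`, and `write_ap` are invented for this illustration.

```python
class Replica:
    """One copy of the data in a toy two-node cluster."""

    def __init__(self):
        self.value = None


def write_cp(local, remote, value, partitioned):
    """Prefer consistency: refuse the write when the peer is unreachable."""
    if partitioned:
        return False  # not available, but the replicas never diverge
    local.value = value
    remote.value = value
    return True


def write_ap(local, remote, value, partitioned):
    """Prefer availability: always accept, replicate only when possible."""
    local.value = value
    if not partitioned:
        remote.value = value
    return True  # available, but the replicas may now disagree


# Same write, same partition, two different design choices.
a, b = Replica(), Replica()
cp_ok = write_cp(a, b, "v1", partitioned=True)
cp_state = (a.value, b.value)

ap_ok = write_ap(a, b, "v1", partitioned=True)
ap_state = (a.value, b.value)
```

In the first case the write is rejected and both replicas stay identical; in the second the write succeeds, but the two replicas hold different values until the partition heals.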
In the next post, I will write about the BASE Concept.
Cosmos DB
If you are comfortable with the CAP theorem, you will know that there are always trade-offs. Azure Cosmos DB has five consistency models so that you can decide for yourself what you deem most important and what you are willing to sacrifice.
The currently available consistency levels are:
Strong
With strong consistency, you are always guaranteed to read the latest version of an item, similar to read committed isolation in SQL Server. You can only ever see data that is durably committed. Strong consistency is scoped to a single region.
Bounded-staleness
In bounded-staleness consistency, reads may lag behind writes, but only by a bounded amount (a number of versions or a time interval). It guarantees global order and is not scoped to a single region.
Session
Session is the most popular consistency level, since it provides consistency guarantees within a client session (a client always reads its own writes) and also has better throughput.
Consistent Prefix
The global order is preserved, and the prefix order is guaranteed. A user will never see writes in a different order than that in which they were written.
Eventual
Eventual consistency is like asynchronous replication. It guarantees that all changes will be replicated eventually, and as such, it also has the lowest latency because it does not need to wait on any commits.
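The difference between some of these levels can be sketched with a toy model: a primary with an ordered write log and a replica that has applied only a prefix of it. This is an illustration under simplifying assumptions, not how Cosmos DB is implemented; `ReplicatedLog` and `max_lag` are hypothetical names.

```python
class ReplicatedLog:
    """Toy model: a primary's write log and a replica that trails behind it."""

    def __init__(self, max_lag):
        self.log = []           # writes in commit order on the primary
        self.applied = 0        # number of writes the replica has applied
        self.max_lag = max_lag  # bounded staleness: maximum writes the replica may trail

    def write(self, value):
        self.log.append(value)
        # Bounded staleness: force the replica to catch up once it trails too far.
        if len(self.log) - self.applied > self.max_lag:
            self.applied = len(self.log) - self.max_lag

    def read_strong(self):
        # Strong: always the latest committed write.
        return self.log[-1] if self.log else None

    def read_replica(self):
        # Consistent prefix: the replica exposes log[:applied], never a reordering.
        return self.log[:self.applied]


store = ReplicatedLog(max_lag=2)
for value in ["a", "b", "c", "d"]:
    store.write(value)

latest = store.read_strong()
replica_view = store.read_replica()
lag = len(store.log) - store.applied
```

Here the strong read sees the last write, while the replica sees an in-order prefix that trails by no more than `max_lag` writes.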
Cosmos Table API
Table API belongs to the key-value database family, with a schema-less design for rapid development and auto-scaling. Table API is based on structured NoSQL data stored in the cloud and is well suited to global distribution scenarios.
Scenarios to use Cosmos Table API
User data, devices, IoT, structured data.
Figure -2- Key-value database. In this case, the key is an integer and the value is a sequence of bytes.
Table Structure
Account
Allows you to access Azure Cosmos DB and the Table API.
Table
A table is a collection of entities. You can compare it to a table in a relational database.
Entities
An entity is a set of properties, similar to a row in a relational database.
Properties
A property is a name-value pair. It is like a dictionary; the property name is the dictionary key. Each entity has three system properties that specify a partition key, a row key, and a timestamp.
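The account/table/entity/property hierarchy can be modelled in a few lines of plain Python. This is only an in-memory sketch to show the shape of the data, not the Table API SDK; the `Entity` and `Table` classes and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Entity:
    """A set of properties plus the three system properties."""
    partition_key: str
    row_key: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    properties: dict = field(default_factory=dict)  # name -> value, like a dictionary


class Table:
    """A collection of entities, addressed by (partition key, row key)."""

    def __init__(self, name):
        self.name = name
        self._entities = {}

    def upsert(self, entity):
        self._entities[(entity.partition_key, entity.row_key)] = entity

    def get(self, partition_key, row_key):
        return self._entities.get((partition_key, row_key))


users = Table("Users")
users.upsert(Entity("Germany", "user-001", properties={"Name": "Anna", "Age": 31}))
# Schema-less: a second entity in the same table may carry different properties.
users.upsert(Entity("Germany", "user-002", properties={"Email": "ben@example.com"}))

found = users.get("Germany", "user-001")
missing = users.get("Germany", "user-999")
```

The two entities live in the same table even though their property names differ, which is what schema-less means in practice.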
Features
- No limits on the number of tables, the number of rows, or the table size
- Dynamic load balancing
- NoSQL: schema-less entities with strong consistency
- Best for key/value lookups on partition key and row key
- Entity group transactions for atomic batching
- Guaranteed high availability
- Automatic secondary indexing
Entity Group Transaction
An entity group transaction groups the entity changes in a batch operation and then commits the changes together. Either all changes are committed successfully, or all fail. This operation can be executed under one condition: the entities must belong to the same partition. I have demonstrated this below with a code sample.
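Independently of the article's Table API sample, the all-or-nothing rule and the same-partition condition can be sketched in plain Python. This is an in-memory toy, not the real service; `Table`, `execute_batch`, and `BatchError` are hypothetical names.

```python
import copy


class BatchError(Exception):
    """Raised when a batch is rejected; no change is applied in that case."""


class Table:
    def __init__(self):
        self.rows = {}  # (partition_key, row_key) -> properties

    def execute_batch(self, operations):
        """operations: list of (action, partition_key, row_key, properties)."""
        partitions = {op[1] for op in operations}
        if len(partitions) > 1:
            raise BatchError("all entities in a batch must share one partition key")
        staged = copy.deepcopy(self.rows)  # work on a copy so a failure changes nothing
        for action, pk, rk, props in operations:
            key = (pk, rk)
            if action == "insert":
                if key in staged:
                    raise BatchError(f"entity {key} already exists")
                staged[key] = props
            elif action == "delete":
                if key not in staged:
                    raise BatchError(f"entity {key} not found")
                del staged[key]
            else:
                raise BatchError(f"unknown action {action!r}")
        self.rows = staged  # commit: all changes become visible together


table = Table()
table.execute_batch([
    ("insert", "sales", "1", {"Name": "A"}),
    ("insert", "sales", "2", {"Name": "B"}),
])

rolled_back = False
try:
    # The second operation fails, so the first one must not be applied either.
    table.execute_batch([
        ("insert", "sales", "3", {"Name": "C"}),
        ("insert", "sales", "1", {"Name": "duplicate"}),
    ])
except BatchError:
    rolled_back = True

cross_partition_rejected = False
try:
    table.execute_batch([
        ("insert", "sales", "4", {}),
        ("insert", "marketing", "1", {}),
    ])
except BatchError:
    cross_partition_rejected = True
```

Staging the changes on a copy and swapping it in at the end is the simplest way to get atomic behavior in a sketch; a real store would use its own commit mechanism.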
Concurrencies
Pessimistic Concurrency
Locks the entity so that one call can write, and blocks the other calls until the write has finished.
Optimistic Concurrency
The caller receives a notification about concurrent changes to the entity and can decide which behavior is correct. Optimistic concurrency is the Azure default. I have simulated the optimistic concurrency problem below in the code sample.
Last write wins
Whichever write arrives last overwrites the current row, without any concurrency check.
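As a rough illustration of the difference between optimistic concurrency and last write wins, the following Python sketch keeps a version number next to each value and checks it on update. The integer version plays the role of an ETag here; `VersionedStore`, `replace`, and `if_match` are names assumed for this sketch, not the service API.

```python
class ConflictError(Exception):
    """Raised when the entity changed since the caller read it."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def insert(self, key, value):
        self._data[key] = (1, value)

    def read(self, key):
        return self._data[key]  # (version, value)

    def replace(self, key, value, if_match=None):
        """if_match=None means last write wins; otherwise the version must still match."""
        version, _ = self._data[key]
        if if_match is not None and if_match != version:
            raise ConflictError("entity was modified by another caller")
        self._data[key] = (version + 1, value)


store = VersionedStore()
store.insert("user-1", {"Name": "Anna"})

# Two callers read the same version of the entity.
version_a, _ = store.read("user-1")
version_b, _ = store.read("user-1")

# Caller A updates first and succeeds.
store.replace("user-1", {"Name": "Anna A."}, if_match=version_a)

# Caller B still holds the old version, so the optimistic check fails.
conflict = False
try:
    store.replace("user-1", {"Name": "Anna B."}, if_match=version_b)
except ConflictError:
    conflict = True

# Last write wins: no version check, the update always goes through.
store.replace("user-1", {"Name": "Anna C."})
final_version, final_value = store.read("user-1")
```

With the version check, caller B is told about the conflict and can re-read and retry; without it, the later write silently replaces the earlier one.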
Cosmos API Code Sample
Pre-Installation
As I said before, each entity has a partition key and a row key. The partition key and the row key are used to create the clustered index, so please choose them carefully. Those keys are the glue for good design and excellent performance, and I highly recommend that you follow Microsoft's design guidelines.
Entities with the same partition key are put on a single tablet server, and the row key is used to identify the entity itself within that partition.
I have defined a domain entity object “User” which needs to be persisted.
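The effect of the two keys on placement and lookup can be sketched as follows. The stable hash used to pick a server is an assumption made for this illustration, as are the names `server_for` and `Cluster`; the real placement strategy is managed by the service.

```python
import zlib


def server_for(partition_key, server_count):
    """Map a partition key to a server index with a stable hash (assumed for this sketch)."""
    return zlib.crc32(partition_key.encode("utf-8")) % server_count


class Cluster:
    def __init__(self, server_count):
        # One dict per server: partition_key -> {row_key -> properties}
        self.servers = [dict() for _ in range(server_count)]

    def put(self, partition_key, row_key, properties):
        server = self.servers[server_for(partition_key, len(self.servers))]
        server.setdefault(partition_key, {})[row_key] = properties

    def get(self, partition_key, row_key):
        # The partition key selects the server; the row key selects the entity.
        server = self.servers[server_for(partition_key, len(self.servers))]
        return server.get(partition_key, {}).get(row_key)


cluster = Cluster(server_count=4)
cluster.put("Germany", "user-001", {"Name": "Anna"})
cluster.put("Germany", "user-002", {"Name": "Ben"})
cluster.put("France", "user-001", {"Name": "Chloe"})

# Every entity with partition key "Germany" lives on exactly one server.
germany_servers = [i for i, s in enumerate(cluster.servers) if "Germany" in s]
```

A lookup that supplies both keys touches one server only, which is why point lookups on partition key plus row key are the cheapest access pattern, and why a badly chosen partition key can concentrate load on a single server.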