Introduction
Welcome back to the MongoDB article series, Part 7. In the previous article, we discussed advanced index concepts in MongoDB, including how to create an index and the different index types such as compound, sparse, and unique indexes. In this article, we will discuss the replication process in MongoDB.
Replication is one of the most important features in MongoDB, and managing it is probably one of the most important tasks for a MongoDB administrator. With the help of replication, we can keep data available even when a server fails. In this article, we will discuss how replication works in MongoDB and how to implement it on a MongoDB server.
What is Replication in MongoDB?
In MongoDB, replication is the process of synchronizing data across multiple servers. It provides data redundancy and increases data availability. Because the same data is kept on more than one MongoDB server, replication protects a database against the loss of a single server due to hardware failure or any other reason, so the data stays available without interruption.
So, if the main MongoDB server goes down due to hardware failure or any other reason, we can still access the data from a replicated server, since the data is continuously copied to the other servers through the replication process. Replication can also be used for load balancing. If a large number of users access the MongoDB database, then instead of connecting everyone to a single MongoDB server, we can spread read traffic across multiple servers so that the load is distributed.
We can achieve the following advantages if we use replication in MongoDB in a production environment:
- Replication keeps the data safe by storing copies on more than one server.
- Replication provides high availability of data.
- It supports disaster recovery.
- No downtime is required for maintenance (like backups, index rebuilds, etc.).
- A replica set is transparent to the application.
What is a Replica Set in MongoDB?
In MongoDB, replication is set up by creating a replica set. A replica set is a group of MongoDB servers in which one server is known as the primary and the others are known as secondaries. Every secondary keeps a copy of the primary’s data. So, if the primary server goes down at any time, a new primary is selected from the existing secondaries and processing goes on. The replication process works as follows with the help of a replica set:
- A replica set is a group of MongoDB servers (normally at least three servers are recommended).
- In a replica set, one server is marked as the primary server and the rest are marked as secondary servers.
- The application writes data to the primary server first.
- The data then replicates from the primary server to the secondary servers.
- When the primary server is unavailable due to hardware failure or maintenance work, an election starts and a new primary server is selected from among the secondary servers.
- When the failed server recovers, it rejoins the replica set as a secondary server.
The image below shows a diagram of MongoDB replication. In this image, the client application communicates with the primary node, and the primary node replicates the data to multiple secondary nodes.
How to Configure Replication in MongoDB
In this section, we will discuss how to set up a test replica set on a single machine. This setup is not suitable for a production environment, because in production a replica set should run on three separate MongoDB instances. It is, however, a good way to learn the idea of replication and how it is configured.
Step 1
Start a mongo shell with the --nodb option from the command prompt. This starts a shell that is not connected to any existing mongod instance.
Step 2
Now, create a replica set with the command below:
- replicaSet = new ReplSetTest({name:'rsTest', nodes : 3})
This command instructs the shell to create a replica set with three nodes: one primary and two secondaries.
Step 3
Now run the commands below one by one to start the mongod server instances:
- replicaSet.startSet() -- this command starts the three mongod processes.
- replicaSet.initiate() -- this command configures the replication.
Now, we have three mongod processes running locally on ports 20000, 20001, and 20002.
Now open another command prompt and connect to the mongod running on port 20000.
- conn1 = new Mongo("localhost:20000")
- connection to localhost:20000
- rsTest: PRIMARY>
Note that when we connect to a replica set member, the prompt changes to rsTest: PRIMARY. Here, PRIMARY is the state of the member and rsTest is the identifier of the replica set.
To check whether this mongod instance is actually the primary node, we can inspect the status of the replica set with a shell helper such as rs.status().
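The status check can also be done programmatically. The sketch below is only an illustration: `findPrimary` and `sampleStatus` are hypothetical names, and the sample document is a trimmed-down assumption of what a status document looks like (a `members` array whose entries carry `name` and `stateStr`). The real output contains many more fields.

```javascript
// Hypothetical helper: pick the primary out of a status document that has
// a `members` array with `name` and `stateStr` fields.
function findPrimary(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return primary ? primary.name : null;
}

// Illustrative sample only (an assumption, not real server output).
const sampleStatus = {
  set: "rsTest",
  members: [
    { _id: 0, name: "localhost:20000", stateStr: "PRIMARY" },
    { _id: 1, name: "localhost:20001", stateStr: "SECONDARY" },
    { _id: 2, name: "localhost:20002", stateStr: "SECONDARY" },
  ],
};

console.log(findPrimary(sampleStatus)); // prints localhost:20000
```

In a shell connected to a replica set member, the same helper could be called as findPrimary(rs.status()).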
Change Replication Configuration
After defining a replica set, we can change it at any time by adding new members or removing existing ones. The mongo shell provides helper methods for both tasks. To add a new member to the replica set, we pass the new member's host and port to the rs.add() helper.
Similarly, we can remove a member from the existing replica set using the command below:
- rs.remove("server-2:20002")
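Adding and removing members both come down to editing the replica set configuration document. The sketch below models that idea in plain JavaScript. It is a simplified assumption of what the shell helpers do, not their actual implementation: `addMember`, `removeMember`, and the host "server-3:20003" are hypothetical, and a real configuration document has more fields than `_id`, `version`, and `members`.

```javascript
// Hypothetical model: adding a member appends it with the next free _id
// and bumps the configuration version.
function addMember(config, host) {
  const nextId = Math.max(-1, ...config.members.map((m) => m._id)) + 1;
  return {
    ...config,
    version: config.version + 1,
    members: [...config.members, { _id: nextId, host }],
  };
}

// Hypothetical model: removing a member filters it out by host
// and bumps the configuration version.
function removeMember(config, host) {
  return {
    ...config,
    version: config.version + 1,
    members: config.members.filter((m) => m.host !== host),
  };
}

// Illustrative configuration (host names are assumptions).
const initialConfig = {
  _id: "rsTest",
  version: 1,
  members: [
    { _id: 0, host: "server-0:20000" },
    { _id: 1, host: "server-1:20001" },
    { _id: 2, host: "server-2:20002" },
  ],
};

const grownConfig = addMember(initialConfig, "server-3:20003");
const shrunkConfig = removeMember(grownConfig, "server-2:20002");
console.log(shrunkConfig.members.map((m) => m.host));
```

Both functions return a new document instead of mutating the old one, which makes it easy to compare the configuration before and after a change.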
To check the existing replica set configuration, we can run the rs.conf() helper in the shell.
Syncing
The main objective of the replication process is to keep an identical set of data on multiple servers. To do this, MongoDB maintains a log of operations, or oplog, which records every write performed on the primary server. This log is a capped collection that lives in the local database on the primary server. The secondary servers query this collection to obtain the operations they need to replicate.
Every secondary server also maintains its own oplog, in which it records each operation it replicates from the primary. This allows any replica set member to be used as a sync source for other members. A secondary server first fetches the pending operations from the member it is syncing from, then applies each operation to its own data set, and finally writes the operation to its own oplog.
If a secondary server goes down and comes back up after some time, it resumes syncing from the last operation recorded in its own oplog. Because an operation is first applied to the data and only then written to the oplog, the secondary may replay an operation that it has already applied to its data.
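Replaying an already-applied operation is harmless because oplog operations are idempotent: applying the same one twice gives the same result as applying it once. The sketch below illustrates this with a deliberately simplified model. The entry shape (`ts`, `key`, `value`) and the names `applyEntry` and `syncFrom` are assumptions made for illustration and do not match the real oplog format.

```javascript
// Hypothetical, simplified oplog entry: { ts, key, value }.
// Setting a key to a fixed value is idempotent.
function applyEntry(data, entry) {
  data[entry.key] = entry.value;
}

function syncFrom(sourceOplog, secondary) {
  // Resume from the last timestamp recorded in the secondary's own oplog.
  const last = secondary.oplog.length
    ? secondary.oplog[secondary.oplog.length - 1].ts
    : 0;
  for (const entry of sourceOplog) {
    if (entry.ts <= last) continue; // already recorded locally
    applyEntry(secondary.data, entry); // 1. apply to the data set
    secondary.oplog.push(entry); // 2. then record it in the local oplog
  }
}

const primaryOplog = [
  { ts: 1, key: "x", value: 1 },
  { ts: 2, key: "y", value: 5 },
  { ts: 3, key: "x", value: 2 },
];

// Simulate a secondary that applied ts 3 to its data but went down
// before writing that entry to its own oplog.
const restartedSecondary = {
  data: { x: 2, y: 5 },
  oplog: primaryOplog.slice(0, 2),
};

syncFrom(primaryOplog, restartedSecondary);
console.log(restartedSecondary.data); // x is still 2, y is still 5
```

After the sync, the entry with ts 3 has been applied a second time, yet the data is unchanged and the local oplog is complete again.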
What is Heartbeat in Replication?
In replication, a heartbeat is the mechanism used to track the current status of the nodes within the replica set. Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the unresponsive member as inaccessible. Heartbeats let each member know the state of the others: which member is primary, which member it can sync from, and which members are down. A heartbeat request is a short message that checks a member's current state.
One of the most important functions of heartbeats is to let the primary know whether it can still reach a majority of the set. If the primary can no longer reach a majority of the members, it automatically demotes itself to a secondary.
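The timing rules can be sketched as a small bookkeeping structure that one member keeps about its peers. This is only a model under stated assumptions: `HeartbeatTracker` and its methods are hypothetical names, time is passed in as plain millisecond numbers, and only the two-second interval and ten-second timeout come from the text above.

```javascript
const HEARTBEAT_INTERVAL_MS = 2000; // heartbeats are sent every two seconds
const HEARTBEAT_TIMEOUT_MS = 10000; // a peer is inaccessible after 10 seconds

// Hypothetical tracker kept by one member about its peers.
class HeartbeatTracker {
  constructor(peers, now) {
    this.lastSeen = new Map(peers.map((p) => [p, now]));
  }

  // Called whenever a heartbeat response arrives from a peer.
  recordResponse(peer, now) {
    this.lastSeen.set(peer, now);
  }

  // Peers whose last response is at least the timeout ago.
  unreachablePeers(now) {
    return [...this.lastSeen]
      .filter(([, seen]) => now - seen >= HEARTBEAT_TIMEOUT_MS)
      .map(([peer]) => peer);
  }
}

const tracker = new HeartbeatTracker(["localhost:20001", "localhost:20002"], 0);

// localhost:20001 answers every two seconds; localhost:20002 never answers.
for (let t = HEARTBEAT_INTERVAL_MS; t <= 12000; t += HEARTBEAT_INTERVAL_MS) {
  tracker.recordResponse("localhost:20001", t);
}

console.log(tracker.unreachablePeers(12000)); // only localhost:20002 is listed
```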
Elections
In the MongoDB replication process, a member that can’t reach the primary node calls for an election. The member seeking election sends a notice to all of the members it can reach, so that they do not call the same election themselves. If none of them raises an objection to the request, the other members vote for the member seeking election. If that member receives votes from a majority of the replica set, the election succeeds and it is promoted to primary. If it does not receive a majority of the votes, it remains a secondary and may try to become primary again later.
If the network is healthy and most of the servers are up, the entire election process should be very fast. In this scenario, it can take a member up to two seconds to notice that the primary has gone down (because of the heartbeat interval), after which the election starts immediately.
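The majority rule behind an election can be sketched in a few lines. The function names below are hypothetical, and the sketch assumes that the majority is counted against all voting members of the set, not only those that are currently reachable.

```javascript
// Votes needed for a strict majority of all voting members.
function majority(totalMembers) {
  return Math.floor(totalMembers / 2) + 1;
}

// Hypothetical tally: the candidate wins only if the votes it received
// (including its own) reach a majority of the whole set.
function winsElection(totalMembers, votesReceived) {
  return votesReceived >= majority(totalMembers);
}

console.log(majority(3)); // 2
console.log(winsElection(3, 2)); // true: candidate plus one other member
console.log(winsElection(5, 2)); // false: 3 votes are needed in a 5-member set
```

This also shows why a two-member set cannot elect a new primary after losing one member: majority(2) is 2, and a single surviving member can only collect one vote.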
Conclusion
In this article, we discussed the replication process in MongoDB: what replication is, what a replica set is, how to configure it, and how the replication mechanism works. In the next article, we will discuss sharding in MongoDB.
I hope this article helps you. Any feedback or questions related to this article are most welcome.