Data replication means keeping copies of a system's data on multiple, often geographically distributed, servers, preferably through a non-interactive, reliable process. In a traditional RDBMS, setting up replication can be a struggle because these systems were not designed with horizontal scaling in mind. Most NoSQL databases support automatic replication, and MongoDB is one of them.
Introduction to Replication
Replication is a process for synchronizing data across multiple servers. In MongoDB, replication is done with a replica set: a group of MongoDB processes that maintain the same data set. Replica sets provide redundancy and high availability by keeping multiple copies of the data on different database servers. Because the data no longer depends on a single server, replication protects a database from the loss of that server and provides a way to recover from hardware failures and service interruptions. Replication can also increase read capacity, since clients can send read operations to different servers than write operations. Finally, copies can be kept in different data centers to improve the locality and availability of data for distributed applications.
Important terms in Replication
Now we consider some terms used in replication.
Primary and Secondary Instance
MongoDB does replication using replica sets. A replica set is a group of mongod instances that host the same data set.
A replica set contains two types of MongoDB instances.
Primary Instance: The primary instance receives all write operations.
Secondary Instance: A secondary instance applies operations from the primary so that it has the same data set.
A replica set has only one primary; all other data-bearing instances are secondaries. The primary accepts all write operations from clients. A replica set is a group of two or more nodes, and a minimum of three nodes is generally recommended.
When the primary receives a write operation from a client, MongoDB first applies the operation on the primary and then records it in the primary's oplog (operation log). The oplog is a special capped collection that stores all the operations that modify the data in the database. The secondary instances then copy these operations and apply them asynchronously. Every secondary member keeps a copy of the primary's oplog.
Figure 1: Primary Instance
A secondary instance copies the primary's oplog and applies the operations to its own data set, so that its data set reflects the primary's. A three-member replica set, for example, has one primary and two secondary members.
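The oplog flow described above can be sketched as a small in-memory model. This is only an illustration under simplifying assumptions: the `Member` class and its methods are hypothetical names rather than MongoDB APIs, the oplog here is an unbounded array instead of a capped collection, and a plain counter stands in for real timestamps.

```javascript
// Simplified in-memory model of oplog-based replication.
// Member, write and syncFrom are hypothetical names, not MongoDB APIs.
class Member {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // the member's data set: id -> document
    this.oplog = [];       // ordered record of applied operations
  }

  // Apply one operation to the data set, then record it in the oplog.
  apply(op) {
    if (op.type === "insert" || op.type === "update") {
      this.data.set(op.id, op.doc);
    } else if (op.type === "delete") {
      this.data.delete(op.id);
    }
    this.oplog.push(op);
  }

  // Primary side: accept a client write and stamp it with a sequence number.
  write(op) {
    this.apply({ ...op, ts: this.oplog.length + 1 });
  }

  // Secondary side: copy and apply every entry newer than the last one applied locally.
  syncFrom(source) {
    const lastTs = this.oplog.length ? this.oplog[this.oplog.length - 1].ts : 0;
    for (const op of source.oplog) {
      if (op.ts > lastTs) this.apply(op);
    }
  }
}

const primary = new Member("rs0-0");
const secondary = new Member("rs0-1");

primary.write({ type: "insert", id: 1, doc: { name: "a" } });
primary.write({ type: "update", id: 1, doc: { name: "b" } });
console.log(secondary.data.size); // 0: the secondary has not synced yet

secondary.syncFrom(primary);
console.log(secondary.data.get(1).name); // "b"
console.log(secondary.oplog.length);     // 2
```

Because the secondary only pulls entries it has not yet applied, calling `syncFrom` repeatedly is harmless, which loosely mirrors the asynchronous catch-up behaviour described above.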
Arbiter Instance: The primary and secondary instances each hold a full copy of the data set. A replica set can also include another kind of member, known as an arbiter. An arbiter does not hold a copy of the data set; it helps maintain a quorum in the replica set by responding to heartbeat and election requests from the other members.
Figure 2: Arbiter Instance
The arbiter is mainly used in the election of a primary. When the primary becomes unavailable, because of a failure or maintenance, an election is held and a new primary is elected from among the secondary nodes. If the replica set has an even number of members, an arbiter can be added so that a candidate can obtain a majority of the votes.
Figure 3: Arbiter
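The arithmetic behind the "even number of members" remark can be shown with a short sketch. The function names are hypothetical; the only assumption is that a primary needs a strict majority of the voting members.

```javascript
// Votes needed for a strict majority among the voting members.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be unreachable while a majority is still possible.
function faultTolerance(votingMembers) {
  return votingMembers - majorityNeeded(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(`${n} voters: majority ${majorityNeeded(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
// 2 voters: majority 2, tolerates 0 failure(s)
// 3 voters: majority 2, tolerates 1 failure(s)
// 4 voters: majority 3, tolerates 1 failure(s)
// 5 voters: majority 3, tolerates 2 failure(s)
```

Going from two to three voters, or from four to five, is what raises the number of tolerated failures, which is why an arbiter is a cheap way to turn an even-sized set into an odd-sized one.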
Automatic Failover: When the primary fails or is taken down for maintenance, it stops communicating with the other members. If the primary does not communicate with the secondary or arbiter instances for more than 10 seconds, the replica set attempts to select a secondary member to become the new primary.
Figure 4: Automatic Failover
The first secondary instance that receives a majority of the votes becomes the new primary.
Figure 5: New Primary
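The failover rule above (a silent primary for more than 10 seconds, then a majority vote) can be modelled roughly as follows. This is a simplified sketch, not MongoDB's actual election protocol; `primaryUnreachable` and `electPrimary` are hypothetical helpers, and the member names are placeholders.

```javascript
// Hypothetical helpers: a simplified model, not MongoDB's election protocol.
const ELECTION_TIMEOUT_MS = 10000; // the 10-second window mentioned above

// True when the primary has been silent for longer than the timeout.
function primaryUnreachable(lastHeartbeatMs, nowMs, timeoutMs = ELECTION_TIMEOUT_MS) {
  return nowMs - lastHeartbeatMs > timeoutMs;
}

// votes: one candidate name per voting member that cast a vote.
// Returns the candidate holding a strict majority of ALL voters, or null.
function electPrimary(votes, totalVoters) {
  const needed = Math.floor(totalVoters / 2) + 1;
  const tally = new Map();
  for (const candidate of votes) {
    tally.set(candidate, (tally.get(candidate) || 0) + 1);
  }
  for (const [candidate, count] of tally) {
    if (count >= needed) return candidate;
  }
  return null;
}

console.log(primaryUnreachable(0, 12000));        // true
console.log(electPrimary(["rs0-1", "rs0-1"], 3)); // "rs0-1"
console.log(electPrimary(["rs0-1"], 3));          // null
```

Note that the majority is counted against all voting members, including the unreachable primary, so a single surviving member of a three-member set cannot elect itself.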
Create Replica Set
Now I will explain how to create a simple replica set. We create a three-member replica set from an existing mongod instance. This three-member replica set contains enough redundancy to survive network partitioning and other system failures.
The following is the procedure to deploy a replica set.
Step 1
Since we are creating a three-member replica set, we need a separate data directory for each member. Close any running mongod server instances, then run the following command in a command prompt.
md \srv\mongodb\rs0-0 \srv\mongodb\rs0-1 \srv\mongodb\rs0-2
This command creates three directories named “rs0-0”, “rs0-1”, and “rs0-2”, as in the following.
Figure 6: MongoDB Folder
Step 2
Now close this command prompt, open another command prompt, and run the following command.
First Member
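mongod --port 27017 --dbpath /srv/mongodb/rs0-0 --replSet Rpset0 --smallfiles --oplogSize 128
This command starts the first mongod instance. Here, “Rpset0” is the name (id) of the replica set, and --oplogSize 128 limits the oplog to 128 MB. Note that --smallfiles applies only to the older MMAPv1 storage engine and was removed in MongoDB 4.2, so omit it on recent versions.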
Second Member
Open another command prompt and run the following command:
mongod --port 27018 --dbpath /srv/mongodb/rs0-1 --replSet Rpset0 --smallfiles --oplogSize 128
Third Member
Open another command prompt and run the following command:
mongod --port 27019 --dbpath /srv/mongodb/rs0-2 --replSet Rpset0 --smallfiles --oplogSize 128
In the above procedure, we start 3 instances. Each instance runs on a separate port.
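Since the three commands differ only in the port and the data directory, they can be generated rather than typed. The sketch below uses a hypothetical helper, `mongodCommands`, and only the flags already shown in this step; it prints the command lines and does not start any process.

```javascript
// Hypothetical helper: builds one mongod command line per member,
// using only the flags shown in the steps above.
function mongodCommands(replSet, basePort, count) {
  const commands = [];
  for (let i = 0; i < count; i++) {
    commands.push(
      `mongod --port ${basePort + i} --dbpath /srv/mongodb/rs0-${i} ` +
      `--replSet ${replSet} --smallfiles --oplogSize 128`
    );
  }
  return commands;
}

// Print the three command lines; each one goes into its own command prompt.
mongodCommands("Rpset0", 27017, 3).forEach((cmd) => console.log(cmd));
```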
Step 3
Now we connect to a mongod instance using the mongo shell. Open another command prompt and execute the command “mongo --port Port_Number”, where Port_Number specifies the instance to connect to. We can choose any of the ports 27017, 27018, or 27019. Here I selected port 27017.
The preceding image shows the shell connecting to port 27017 on localhost.
Step 4
Now execute the “rs.initiate()” command. This command initiates a new replica set with the current instance as its first member.
Command: rs.initiate()
Output
Step 5
Now run the rs.conf() command. This command returns the current replica set configuration object.
Command: rs.conf()
Output
Now we can see that the mongo shell is connected to the primary.
Step 6
Now we add the remaining two mongod instances in the replica set using the “rs.add()” method.
Syntax
rs.add("<hostname>:<PortNumber>")
You can find your hostname using “rs.conf()” method.
Here my hostname is “Pankaj”.
Now we add remaining mongod instances to the replica set.
Command
rs.add("Pankaj:27018")
Output
In the above command, we add a second mongod instance to the replica set. If I check the replica set using the rs.conf() method, then I will find the following data.
{
  "_id": "Rpset0",
  "version": 2,
  "members": [
    {
      "_id": 0,
      "host": "Pankaj:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 1,
      "tags": {},
      "slaveDelay": 0,
      "votes": 1
    },
    {
      "_id": 1,
      "host": "Pankaj:27018",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 1,
      "tags": {},
      "slaveDelay": 0,
      "votes": 1
    }
  ],
  "settings": {
    "chainingAllowed": true,
    "heartbeatTimeoutSecs": 10,
    "getLastErrorModes": {},
    "getLastErrorDefaults": {
      "w": 1,
      "wtimeout": 0
    }
  }
}
The preceding data indicates that the second mongod instance has been added to the replica set.
Now we add the third mongod instance to the replica set in the same way, using rs.add("Pankaj:27019").
If I check the replica set using the rs.conf() method one more time, I find the following configuration:
{
  "_id": "Rpset0",
  "version": 3,
  "members": [
    {
      "_id": 0,
      "host": "Pankaj:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 1,
      "tags": {},
      "slaveDelay": 0,
      "votes": 1
    },
    {
      "_id": 1,
      "host": "Pankaj:27018",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 1,
      "tags": {},
      "slaveDelay": 0,
      "votes": 1
    },
    {
      "_id": 2,
      "host": "Pankaj:27019",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 1,
      "tags": {},
      "slaveDelay": 0,
      "votes": 1
    }
  ],
  "settings": {
    "chainingAllowed": true,
    "heartbeatTimeoutSecs": 10,
    "getLastErrorModes": {},
    "getLastErrorDefaults": {
      "w": 1,
      "wtimeout": 0
    }
  }
}
We can see that all three mongod instances are present in the replica set and a fully-functional replica set has been created.
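As an alternative to adding members one at a time, the member list can be written as a single configuration document. The sketch below builds such a document with a hypothetical helper, `buildReplSetConfig`; it contains only the `_id` and `host` fields and leaves every other field to the server defaults. The host names are the ones used in this walkthrough.

```javascript
// Hypothetical helper: builds a minimal replica set configuration document.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName,
    // One entry per member; _id is just the position in the list.
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

const config = buildReplSetConfig("Rpset0", [
  "Pankaj:27017",
  "Pankaj:27018",
  "Pankaj:27019",
]);
console.log(JSON.stringify(config, null, 2));

// In the mongo shell, connected to one of the members, a document like this
// could be passed as rs.initiate(config) instead of calling rs.add() twice.
```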
The replica set now elects a primary, and all the remaining mongod instances become secondaries. Next, we determine which mongod instance was elected as the “primary”.
Use the rs.status() method to check the status of the replica set. When we execute rs.status(), we get the following details about the replica set.
Command: rs.status()
Output
{
  "set": "Rpset0",
  "date": ISODate("2015-08-28T17:57:04.198Z"),
  "myState": 1,
  "members": [
    {
      "_id": 0,
      "name": "Pankaj:27017",
      "health": 1,
      "state": 1,
      "stateStr": "PRIMARY",
      "uptime": 698,
      "optime": Timestamp(1440784405, 1),
      "optimeDate": ISODate("2015-08-28T17:53:25Z"),
      "electionTime": Timestamp(1440783975, 2),
      "electionDate": ISODate("2015-08-28T17:46:15Z"),
      "configVersion": 3,
      "self": true
    },
    {
      "_id": 1,
      "name": "Pankaj:27018",
      "health": 1,
      "state": 2,
      "stateStr": "SECONDARY",
      "uptime": 451,
      "optime": Timestamp(1440784405, 1),
      "optimeDate": ISODate("2015-08-28T17:53:25Z"),
      "lastHeartbeat": ISODate("2015-08-28T17:57:03.928Z"),
      "lastHeartbeatRecv": ISODate("2015-08-28T17:57:03.359Z"),
      "pingMs": 0,
      "syncingTo": "Pankaj:27017",
      "configVersion": 3
    },
    {
      "_id": 2,
      "name": "Pankaj:27019",
      "health": 1,
      "state": 2,
      "stateStr": "SECONDARY",
      "uptime": 218,
      "optime": Timestamp(1440784405, 1),
      "optimeDate": ISODate("2015-08-28T17:53:25Z"),
      "lastHeartbeat": ISODate("2015-08-28T17:57:03.928Z"),
      "lastHeartbeatRecv": ISODate("2015-08-28T17:57:03.944Z"),
      "pingMs": 0,
      "configVersion": 3
    }
  ],
  "ok": 1
}
We can see that “Pankaj:27017” is elected as the primary and all the remaining mongod instances are secondary.
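Reading the roles out of a long status document by eye is error-prone, so a small helper can summarize it. The sketch below assumes a document shaped like the rs.status() output shown above; `summarizeStatus` is a hypothetical name, and the sample object is a trimmed copy of that output.

```javascript
// Hypothetical helper: summarizes a document shaped like rs.status() output.
function summarizeStatus(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  const secondaries = status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => m.name);
  const unhealthy = status.members
    .filter((m) => m.health !== 1)
    .map((m) => m.name);
  return { primary: primary ? primary.name : null, secondaries, unhealthy };
}

// Trimmed copy of the status output from this walkthrough.
const sampleStatus = {
  set: "Rpset0",
  members: [
    { _id: 0, name: "Pankaj:27017", health: 1, stateStr: "PRIMARY" },
    { _id: 1, name: "Pankaj:27018", health: 1, stateStr: "SECONDARY" },
    { _id: 2, name: "Pankaj:27019", health: 1, stateStr: "SECONDARY" },
  ],
};

console.log(summarizeStatus(sampleStatus));
// In the mongo shell, the same function could be called as summarizeStatus(rs.status()).
```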
The following are advantages of replication:
- Provides support for disaster recovery.
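- Keeps redundant copies of data, so the loss of a single server does not mean the loss of data.
- Removes dependency from a single server.
- Provides round-the-clock (24*7) data availability.
- Increases read scaling due to extra copies of data.
- Keeps services available while an individual member is down, for example during maintenance.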
The following are points to remember:
- A replica set works like a master-slave configuration but adds automatic failover.
- A replica set needs at least two mongod instances (three are recommended) and can have up to 50 members, of which at most 7 can vote (the limit was 12 members before MongoDB 3.0).
- A replica set contains one primary node; all the remaining data-bearing nodes are secondaries.
- During automatic failover or maintenance of the primary node, an election is held and the first secondary instance receiving a majority of votes becomes the new primary.
- After the failed primary node recovers, it rejoins the replica set and works as a secondary node.
- By default, clients read from the primary node, but a client can specify a read preference to send read operations to a secondary.
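One common place to state a read preference is the connection string. The sketch below builds a standard `mongodb://` URI that lists the three members from this walkthrough and sets the `replicaSet` and `readPreference` options; `replicaSetUri` is a hypothetical helper, and the choice of "secondaryPreferred" is only an example.

```javascript
// Hypothetical helper: builds a MongoDB connection string for a replica set.
function replicaSetUri(hosts, replSet, readPreference = "primary") {
  const query =
    `replicaSet=${encodeURIComponent(replSet)}` +
    `&readPreference=${encodeURIComponent(readPreference)}`;
  return `mongodb://${hosts.join(",")}/?${query}`;
}

const uri = replicaSetUri(
  ["Pankaj:27017", "Pankaj:27018", "Pankaj:27019"],
  "Rpset0",
  "secondaryPreferred"
);
console.log(uri);
// mongodb://Pankaj:27017,Pankaj:27018,Pankaj:27019/?replicaSet=Rpset0&readPreference=secondaryPreferred

// Inside the mongo shell, db.getMongo().setReadPref("secondary") is another
// way to route reads from the current connection to a secondary.
```

Because secondaries apply operations asynchronously, reads sent to them may return slightly stale data, so this option suits workloads that can tolerate that.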
Today, we read about replication and how to create a simple replica set in MongoDB. NoSQL databases such as MongoDB support automatic replication, whereas replication is often harder to set up in a traditional RDBMS because those systems were not designed with horizontal scaling in mind. Replication is an important factor that makes NoSQL preferable for storing huge amounts of data.
Thanks for reading this article!