History of MongoDB
It was in autumn 2007 that Kevin Ryan, Dwight Merriman and Eliot Horowitz, successful entrepreneurs, decided to found the company 10gen, with the aim of offering a Platform as a Service (PaaS) product similar to Heroku, AWS Elastic Beanstalk or Google App Engine, but based on open-source components.
Their experience on web projects such as DoubleClick and ShopWiki had taught them that an application that becomes popular runs into scalability issues at the database level. While searching for a database to integrate into their PaaS product, they found no open-source solution that met their needs for scalability and compatibility with a cloud architecture.
This is why the 10gen team developed a new document-oriented NoSQL database in-house. They named it MongoDB, after the word "humongous" (gigantic), like the data it is meant to host.
Why MongoDB?
MongoDB was built for speed.
Speed
The data is stored as BSON (Binary JSON) documents. Because BSON is a binary, length-prefixed encoding, MongoDB can traverse documents and locate fields in them faster than it could by parsing text JSON. To make queries even more efficient, MongoDB encourages denormalizing the data in its documents. Whereas good practice in SQL is to split data into dedicated tables linked by foreign keys and combine them with joins, MongoDB encourages duplicating data in the documents where it is read. MongoDB does offer mechanisms for referencing other documents, but they should be used sparingly to keep the performance benefits of a MongoDB database.
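To make the trade-off concrete, here is a minimal plain-JavaScript sketch (runnable with Node.js, no database needed). The field names (`title`, `authors`, `authorIds`), the author names and the two in-memory "tables" are hypothetical and only illustrate the shape of the data: the normalized form needs a join at read time, while the denormalized document already holds the author names.

```javascript
// Normalized (SQL-style): authors live in their own "table"
// and publications refer to them by id (a foreign key).
const authorsTable = [
  { id: 1, name: "Jane Doe" },
  { id: 2, name: "John Smith" },
];
const publicationsTable = [
  { id: 10, title: "Example paper", authorIds: [1, 2] },
];

// Reading a publication with its author names requires a join.
function joinAuthors(publication) {
  const names = publication.authorIds.map(
    (id) => authorsTable.find((a) => a.id === id).name
  );
  return { title: publication.title, authors: names };
}

// Denormalized (MongoDB-style): the author names are duplicated
// inside the publication document, so one read returns everything.
const publicationDocument = {
  title: "Example paper",
  authors: ["Jane Doe", "John Smith"],
};

console.log(joinAuthors(publicationsTable[0]));
console.log(publicationDocument);
```

The price of the duplication is that renaming an author means updating every document that embeds the name.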
Flexibility
Unlike SQL databases, the documents in a MongoDB collection can be completely heterogeneous; this is what "schemaless" means. Without a strict data structure, the structure can be changed quickly. This flexibility is especially valuable in projects at the prototype stage that are still discovering how their data should be structured. However, being schemaless has drawbacks: data analysis becomes more difficult when documents do not all follow the same structure. This is why MongoDB also lets you impose a schema on a collection through schema validation.
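As a sketch of what imposing a schema can look like, MongoDB's schema validation (MongoDB 3.6 and later) accepts a `$jsonSchema` document. The rules below (which fields are required and their types) are hypothetical choices for a collection shaped like the `publis` collection used later in this tutorial; the object itself is plain JavaScript, and the comments show where it would be passed in the mongo shell.

```javascript
// Hypothetical validation rules for a "publis"-like collection.
// In the mongo shell this object would be passed when creating the collection:
//   db.createCollection("publis", { validator: publisValidator })
// For an existing collection, the collMod command accepts the same validator:
//   db.runCommand({ collMod: "publis", validator: publisValidator })
const publisValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["title", "type", "year"],
    properties: {
      title: { bsonType: "string" },
      type: { bsonType: "string" },
      // Years may be stored as 32-bit ints, 64-bit longs or doubles
      year: { bsonType: ["int", "long", "double"] },
      authors: { bsonType: "array", items: { bsonType: "string" } },
    },
  },
};

console.log(JSON.stringify(publisValidator, null, 2));
```

Documents that break these rules are then rejected on insert or update, while the rest of the collection keeps its flexibility for fields not listed.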
Cloud and distributed infrastructure
To ensure stability, a key concept of MongoDB is to always keep more than one copy of the data (a replica set), so that the database remains available even if the host machine fails. These copies can easily be spread across multiple machines in multiple locations. Horizontal scalability itself comes from sharding, which partitions a collection's data across several machines.
Install MongoDB on Windows 10
Step 1 - Download the MongoDB MSI Installer Package
Step 2 - Install MongoDB with the Installation Wizard
Log in with administrator privileges and double-click the .msi package you just downloaded. This launches the installation wizard.
Step 3
Accept the licence agreement then click Next,
Step 4
Select the Complete setup,
Step 5
Select “Run service as Network Service user” and make a note of the data directory; you will need it later.
- Run service as Network Service user (default): the service runs under Network Service, a user account built into Windows.
- Run service as a local or domain user:
  - For an existing local user account, specify a period (.) for the Account Domain, and specify the Account Name and the Account Password for the user.
  - For an existing domain user, specify the Account Domain, the Account Name and the Account Password for that user.
When mongod.exe is started without a configuration file, its default data directory is C:\data\db. In that case you need to create this folder, either manually or from the Command Prompt:
- C:\>md data
- C:\>md data\db
Then start mongod.exe with the --dbpath option pointing to the directory you created:
- C:\Users\hp>cd C:\Program Files\MongoDB\Server\4.2\bin
- C:\Program Files\MongoDB\Server\4.2\bin>mongod.exe --dbpath "C:\data\db"
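When MongoDB runs as a Windows service, it reads its settings from the configuration file <install directory>\bin\mongod.cfg instead. Open mongod.cfg and check the dbPath option (under storage) to see which data directory the service uses.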
Step 6
MongoDB Compass, the graphical client, is not needed here, so deselect it and click Next.
Step 7
Click Install to launch the installation,
Step 8
Click Finish to complete the installation,
Step 9 - Verify that Setup was Successful
To check the MongoDB version, run mongod with the --version option. If the MongoDB bin directory is not on your PATH, you have to use the full path to mongod.exe and mongo.exe; if it is on your PATH, you can simply use the mongod and mongo commands.
- C:\>"C:\Program Files\MongoDB\Server\4.2\bin\mongod.exe" --version
Working with MongoDB
Step 1
To start MongoDB manually, open a Command Prompt, navigate to your MongoDB bin folder and run the mongod command. This starts the main MongoDB process, and the console displays the message "waiting for connections".
mongod is the "Mongo daemon", the host process of the database server. Starting mongod launches the MongoDB process, which keeps running and listens for client connections (on port 27017 by default).
mongo is the command-line shell that connects to a specific instance of mongod.
Step 2
Step 3
Import data from the .json file,
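The mongoimport command imports documents from a JSON, CSV or TSV file (for example, one produced by mongoexport) into a collection. It is run from the Command Prompt, not from the mongo shell. Its main options are:
- --db DB_NAME: name of the database to import into
- --collection COLLECTION_NAME: name of the collection in DB_NAME that receives the documents
- --type json: format of the input file; optional, since JSON is the default
- --jsonArray: required when the file contains a single JSON array of documents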
- mongoimport --host localhost:27017 --db DBLP --collection publis --jsonArray --file C:\Users\hp\Desktop\dblp.json\dblp.json
In the mongo shell, switch to the DBLP database and check that the data has been inserted,
- use DBLP
- db.publis.findOne()
Step 4
Find the list of all publications published in 2007,
- db.publis.find({ year: 2007 })
The find() method returns a cursor to the results. In the mongo shell, if the returned cursor is not assigned to a variable using the var keyword, the cursor is automatically iterated to access up to the first 20 documents that match the query.
Step 5
List of all articles (“Article” type),
Using SQL,
- SELECT * FROM publis
- WHERE [type] = 'Article'
Using MongoDB,
- db.publis.find({"type" : "Article"})
Step 6
Find the list of all distinct publishers (the "publisher" field),
Using SQL,
- SELECT distinct publisher FROM publis
Using MongoDB,
- db.publis.distinct("publisher")
Step 7
Find the list of publications by author "David Gelbart",
Using SQL,
- SELECT * FROM publis
- WHERE authors LIKE '%David Gelbart%'
Using MongoDB,
- db.publis.find({ authors: "David Gelbart" })
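Because `authors` is an array in these documents, a filter such as `{ authors: "David Gelbart" }` matches a document when any element of the array equals the value. The plain-JavaScript sketch below mimics that rule for a single top-level field; it is a simplification (it ignores dotted paths, query operators and exact whole-array matches), and `matchesEquality` and the sample documents are hypothetical.

```javascript
// Simplified emulation of MongoDB's equality match on one top-level field:
// a scalar value matches either an equal scalar or any equal array element.
function matchesEquality(doc, field, value) {
  const fieldValue = doc[field];
  if (Array.isArray(fieldValue)) {
    return fieldValue.includes(value);
  }
  return fieldValue === value;
}

// Hypothetical sample documents
const docs = [
  { title: "Paper A", authors: ["David Gelbart", "Jane Doe"] },
  { title: "Paper B", authors: ["John Smith"] },
  { title: "Paper C", authors: "David Gelbart" },
];

const found = docs.filter((d) => matchesEquality(d, "authors", "David Gelbart"));
console.log(found.map((d) => d.title)); // [ 'Paper A', 'Paper C' ]
```

This is why no `$in` or `LIKE`-style operator is needed to find an author inside the array.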
Step 8
Sort "David Gelbart" publications by title and year,
Using SQL,
- SELECT * FROM publis
- WHERE authors LIKE '%David Gelbart%'
- ORDER BY title, [Year]
To sort documents in MongoDB, use the sort() method. It accepts a document listing the fields to sort on, each with its sort order: 1 for ascending order and -1 for descending order. With several fields, documents are sorted by the first field, and the following fields only break ties.
Using MongoDB,
- db.publis.find({
- authors: "David Gelbart"
- }).sort({
- title: 1,
- year: 1
- })
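The compound sort order can be sketched in plain JavaScript. The `compareBySpec` helper below is hypothetical: it turns a sort document such as `{ title: 1, year: 1 }` into a comparator, comparing fields in order so that later fields only break ties. It assumes every document has the sorted fields and that each field holds a single type; real MongoDB also orders missing fields and mixed types, which the sketch ignores.

```javascript
// Simplified emulation of a MongoDB sort specification such as
// { title: 1, year: 1 }: fields are compared in order, and a later
// field only breaks ties left by the earlier ones.
function compareBySpec(spec) {
  const fields = Object.entries(spec); // [[field, 1 | -1], ...] in key order
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every sorted field
  };
}

// Hypothetical sample documents
const publications = [
  { title: "B", year: 2005 },
  { title: "A", year: 2009 },
  { title: "A", year: 2003 },
];

const sorted = [...publications].sort(compareBySpec({ title: 1, year: 1 }));
console.log(sorted.map((p) => `${p.title} ${p.year}`)); // [ 'A 2003', 'A 2009', 'B 2005' ]
```

Because the key order of the sort document decides priority, `{ year: 1, title: 1 }` would give a different result.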
Step 9
Sort "David Gelbart" publications by end page,
Using SQL,
- SELECT * FROM publis
- WHERE authors LIKE '%David Gelbart%'
- ORDER BY endpage
Using MongoDB,
- db.publis.aggregate([{
- $match: {
- authors: "David Gelbart"
- }
- }, {
- $sort: {
- "pages.end": 1
- }
- }])
Or,
- db.publis.find({
- authors: "David Gelbart"
- }).sort({
- "pages.end": 1
- })
Step 10
Project the result on the title of the publication, and its type,
- db.publis.aggregate([{
- $match: {
- authors: "David Gelbart"
- }
- }, {
- $sort: {
- "pages.end": 1
- }
- }, {
- $project: {
- title: 1,
- type: 1
- }
- }]);
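Each stage of an aggregation pipeline receives the documents produced by the previous stage. The plain-JavaScript sketch below mirrors a `$match`, `$sort`, `$project` pipeline with `filter`, `sort` and `map` on hypothetical in-memory documents. It is only an analogy, not how MongoDB executes the pipeline; it does imitate the fact that `$project` keeps `_id` unless it is excluded with `_id: 0`.

```javascript
// Hypothetical sample documents shaped like the publis collection
const publis = [
  { _id: 1, title: "Paper A", type: "Article", authors: ["David Gelbart"], pages: { start: 10, end: 25 } },
  { _id: 2, title: "Paper B", type: "Book", authors: ["Jane Doe"], pages: { start: 1, end: 300 } },
  { _id: 3, title: "Paper C", type: "Article", authors: ["David Gelbart", "Jane Doe"], pages: { start: 5, end: 12 } },
];

const result = publis
  // $match: { authors: "David Gelbart" } (any array element may match)
  .filter((doc) => doc.authors.includes("David Gelbart"))
  // $sort: { "pages.end": 1 }
  .sort((a, b) => a.pages.end - b.pages.end)
  // $project: { title: 1, type: 1 } keeps _id by default
  .map((doc) => ({ _id: doc._id, title: doc.title, type: doc.type }));

console.log(result);
```

Filtering first, as `$match` does at the start of the pipeline, means the later stages only handle the matching documents.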
Step 11
Count the number of David Gelbart's publications,
- db.publis.aggregate([{
- $match: {
- authors: "David Gelbart"
- }
- }, {
- $group: {
- _id: null,
- total: {
- $sum: 1
- }
- }
- }]);
Step 12
Count the number of publications since 2007,
- db.publis.aggregate([{
- $match: {
- year: {
- $gte: 2007
- }
- }
- }, {
- $group: {
- _id: null,
- total: {
- $sum: 1
- }
- }
- }]);
Step 13
Count the number of publications since 2007 and by type,
- db.publis.aggregate([{
- $match: {
- year: {
- $gte: 2007
- }
- }
- }, {
- $group: {
- _id: "$type",
- total: {
- $sum: 1
- }
- }
- }]);
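A `$group` stage keeps one accumulator per distinct value of its `_id` expression; with `_id: null` every document falls into the same group, which is how the two previous steps produce a single total. The plain-JavaScript sketch below imitates `$match` followed by `$group` on `"$type"` with `$sum: 1`, using hypothetical documents and a `Map` as the table of accumulators. Note that MongoDB does not guarantee the order of `$group` output; the sketch's order is just insertion order.

```javascript
// Hypothetical sample documents
const publis = [
  { type: "Article", year: 2006 },
  { type: "Article", year: 2007 },
  { type: "Book", year: 2008 },
  { type: "Article", year: 2010 },
];

// $match: { year: { $gte: 2007 } }
const matched = publis.filter((doc) => doc.year >= 2007);

// $group: { _id: "$type", total: { $sum: 1 } }
// One accumulator per distinct value of the grouping key.
const totals = new Map();
for (const doc of matched) {
  totals.set(doc.type, (totals.get(doc.type) || 0) + 1);
}

const groups = [...totals].map(([type, total]) => ({ _id: type, total }));
console.log(groups);
```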
Step 14
Count the number of publications by author and sort the result in decreasing order,
- db.publis.aggregate([{
- $unwind: "$authors"
- }, {
- $group: {
- _id: "$authors",
- number: {
- $sum: 1
- }
- }
- }, {
- $sort: {
- number: -1
- }
- }]);
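`$unwind` is what makes per-author counting possible: it outputs one copy of each document per element of the array, with the array field replaced by that element. The plain-JavaScript sketch below reproduces the three stages on hypothetical documents. It assumes every document has a non-empty `authors` array; by default MongoDB's `$unwind` simply drops documents whose array is missing or empty.

```javascript
// Hypothetical sample documents
const publis = [
  { title: "P1", authors: ["Jane Doe", "John Smith"] },
  { title: "P2", authors: ["Jane Doe"] },
  { title: "P3", authors: ["Jane Doe", "Ann Lee"] },
];

// $unwind: "$authors" -> one document per (publication, author) pair
const unwound = publis.flatMap((doc) =>
  doc.authors.map((author) => ({ ...doc, authors: author }))
);

// $group: { _id: "$authors", number: { $sum: 1 } }
const counts = new Map();
for (const doc of unwound) {
  counts.set(doc.authors, (counts.get(doc.authors) || 0) + 1);
}

// $sort: { number: -1 }
const ranking = [...counts]
  .map(([author, number]) => ({ _id: author, number }))
  .sort((a, b) => b.number - a.number);

console.log(ranking);
```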
Map Reduce with Mongo
Step 1
For each document of type "Book", return the document keyed by its title.
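- var mapFunction = function() {
- if (this.type == "Book") emit(this.title, { articles: [this] });
- };
- var reduceFunction = function(key, values) {
- // Return the same shape as the emitted values, because MongoDB may call reduce again on its output
- var result = { articles: [] };
- values.forEach(function(v) { result.articles = result.articles.concat(v.articles); });
- return result;
- };
- db.publis.mapReduce(mapFunction, reduceFunction, {
- out: "result"
- });
- db.result.find();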
or,
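- var mapFunction = function() {
- emit(this.title, { articles: [this] });
- };
- var reduceFunction = function(key, values) {
- // Same shape in and out, so re-reducing is safe
- var result = { articles: [] };
- values.forEach(function(v) { result.articles = result.articles.concat(v.articles); });
- return result;
- };
- var queryParam = {
- query: {
- type: "Book"
- },
- out: "result_set"
- };
- db.publis.mapReduce(mapFunction, reduceFunction, queryParam);
- db.result_set.find();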
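MongoDB's documentation requires the reduce function to return a value of the same type as the values emitted by map, because reduce may be called again on its own output, and it is not called at all for a key with a single value. (Map-reduce is also deprecated starting with MongoDB 5.0 in favour of the aggregation pipeline.) The sketch below is a hypothetical in-memory emulation, not MongoDB's API: `mapReduce` here passes `emit` as an argument instead of a global and always reduces in two passes, to make the re-reduce visible.

```javascript
// Minimal in-memory emulation of map-reduce, for illustration only.
function mapReduce(docs, map, reduce) {
  // Collect the emitted values per key
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit); // `this` is the document

  const reduceOnce = (key, values) =>
    values.length === 1 ? values[0] : reduce(key, values);

  const out = [];
  for (const [key, values] of emitted) {
    // Reduce each half separately, then reduce the partial results again
    const half = Math.ceil(values.length / 2);
    const value =
      values.length === 1
        ? values[0] // reduce is not called for a key with a single value
        : reduce(key, [reduceOnce(key, values.slice(0, half)), reduceOnce(key, values.slice(half))]);
    out.push({ _id: key, value });
  }
  return out;
}

// Hypothetical documents: three books sharing one title
const publis = [
  { type: "Book", title: "T1" },
  { type: "Book", title: "T1" },
  { type: "Book", title: "T1" },
  { type: "Article", title: "T2" },
];

// Correct: map emits the same shape that reduce returns
const mapBooks = function (emit) {
  if (this.type === "Book") emit(this.title, { articles: [this.title] });
};
const reduceBooks = (key, values) => ({ articles: values.flatMap((v) => v.articles) });
const result = mapReduce(publis, mapBooks, reduceBooks);
console.log(JSON.stringify(result));

// Mismatched shapes: map emits raw titles, reduce returns a wrapper object
const mapRaw = function (emit) {
  if (this.type === "Book") emit(this.title, this.title);
};
const wrapReduce = (key, values) => ({ articles: values });
const nested = mapReduce(publis, mapRaw, wrapReduce);
console.log(JSON.stringify(nested)); // the second pass nests the first one
```

With matching shapes the three titles end up in one flat list; with mismatched shapes the second pass wraps the first pass's object, leaving only two entries at the top level.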
Step 2
For each book, give the number of its authors.
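- var mapFunction = function() {
- if (this.type == "Book" && this.authors) emit(this.title, this.authors.length);
- };
- var reduceFunction = function(key, values) {
- // Only called when several books share a title; returns a number, like the emitted values
- return Array.sum(values);
- };
- var queryParam = {
- query: {},
- out: "result_set"
- };
- db.publis.mapReduce(mapFunction, reduceFunction, queryParam);
- db.result_set.find();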
Step 3
For each document having "booktitle" (chapter) published by Springer, return the number of its chapters.
- var mapFunction = function() {
- if (this.publisher == "Springer" && this.booktitle) emit(this.booktitle, 1);
- };
- var reduceFunction = function(key, values) {
- return Array.sum(values);
- };
- var queryParam = {
- query: {},
- out: "result_set"
- };
- db.publis.mapReduce(mapFunction, reduceFunction, queryParam);
- db.result_set.find({
- value: {
- $gte: 2
- }
- });
Step 4
For the publisher "Springer", return the number of publications per year.
- var mapFunction = function() {
- if (this.publisher == "Springer") emit(this.year, 1);
- };
- var reduceFunction = function(key, values) {
- return Array.sum(values);
- };
Step 5
For each “publisher & year” pair (publisher must be present), return the number of publications.
- var mapFunction = function() {
- if (this.publisher) emit({
- publisher: this.publisher,
- year: this.year
- }, 1);
- };
- var reduceFunction = function(key, values) {
- return Array.sum(values);
- };
Step 6
For the author "Toru Ishida", return the number of publications per year
- var mapFunction = function() {
- if (this.authors && Array.contains(this.authors, "Toru Ishida")) emit(this.year, 1);
- };
- var reduceFunction = function(key, values) {
- return Array.sum(values);
- };
- var queryParam = {
- query: {},
- out: "result_set"
- };
Or,
- var mapFunction = function() {
- emit(this.year, 1);
- };
- var reduceFunction = function(key, values) {
- return Array.sum(values);
- };
- var queryParam = {
- query: {
- authors: "Toru Ishida"
- },
- out: "result_set"
- };
Step 7
For the author "Toru Ishida", return the average number of pages for his articles (Article type)
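- var mapFunction = function() {
- if (this.pages && this.pages.start && this.pages.end) emit(null, { total: this.pages.end - this.pages.start, count: 1 });
- };
- var reduceFunction = function(key, values) {
- // Add up totals and counts; averaging here would be wrong if reduce is applied again to its own output
- var result = { total: 0, count: 0 };
- values.forEach(function(v) { result.total += v.total; result.count += v.count; });
- return result;
- };
- var queryParam = {
- query: {
- authors: "Toru Ishida",
- type: "Article"
- },
- out: "result_set",
- // Divide once, after all values for the key have been reduced
- finalize: function(key, value) { return value.total / value.count; }
- };
- db.publis.mapReduce(mapFunction, reduceFunction, queryParam);
- db.result_set.find();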
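A reduce function that returns `Array.avg(values)` can give a wrong result when MongoDB reduces a key's values in several passes, because it then averages averages. This plain-JavaScript sketch (hypothetical page counts, no database) shows the difference and the usual fix: carry a running total and count through reduce and divide only once, for example in a `finalize` function.

```javascript
// Hypothetical page counts of five articles, reduced in two batches
// (MongoDB may split the values of one key in the same way).
const batch1 = [10, 20];
const batch2 = [30, 40, 50];

const avg = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;

// Wrong: re-reducing with an average weights each batch equally
const averageOfAverages = avg([avg(batch1), avg(batch2)]); // avg(15, 40) = 27.5

// Right: carry { total, count } through reduce and divide once at the end
const toPartial = (xs) => ({ total: xs.reduce((s, x) => s + x, 0), count: xs.length });
const reducePartials = (partials) =>
  partials.reduce(
    (acc, p) => ({ total: acc.total + p.total, count: acc.count + p.count }),
    { total: 0, count: 0 }
  );
const combined = reducePartials([toPartial(batch1), toPartial(batch2)]);
const trueAverage = combined.total / combined.count; // 150 / 5 = 30

console.log(averageOfAverages, trueAverage);
```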
Step 8
For each author, list the titles of their publications,
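- var mapFunction = function() {
- if (!this.authors) return;
- for (var i = 0; i < this.authors.length; i++) emit(this.authors[i], { titles: [this.title] });
- };
- var reduceFunction = function(key, values) {
- // Concatenate the title lists; the result has the same shape as the emitted values
- var titles = [];
- values.forEach(function(v) { titles = titles.concat(v.titles); });
- return { titles: titles };
- };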
Step 9
For each author, list the number of publications associated with each year,
- var mapFunction = function() {
- for (var i = 0; i < this.authors.length; i++) emit({
- author: this.authors[i],
- year: this.year
- }, 1);
- };
- var reduceFunction = function(key, values) {
- return Array.sum(values);
- };
Step 10
For the publisher "Springer", give the number of authors per year,
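- var mapFunction = function() {
- if (!this.authors) return;
- for (var i = 0; i < this.authors.length; i++) emit(this.year, { authors: [this.authors[i]] });
- };
- var reduceFunction = function(key, values) {
- // Merge the author lists, keeping each author once; same shape as the emitted values
- var authors = [];
- for (var i = 0; i < values.length; i++)
- for (var j = 0; j < values[i].authors.length; j++)
- if (authors.indexOf(values[i].authors[j]) === -1) authors.push(values[i].authors[j]);
- return { authors: authors };
- };
- var queryParam = {
- query: { publisher: "Springer" },
- out: "result_set",
- // Turn each year's list of distinct authors into a count
- finalize: function(key, value) { return value.authors.length; }
- };
- db.publis.mapReduce(mapFunction, reduceFunction, queryParam);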
Step 11
Count the publications that have more than 3 authors.
- var mapFunction = function() {
- if (this.authors && this.authors.length > 3) emit(null, 1);
- };
- var reduceFunction = function(key, values) {
- return Array.sum(values);
- };
- var queryParam = {
- query: {},
- out: "result_set"
- };
- db.publis.mapReduce(mapFunction, reduceFunction, queryParam);
- db.result_set.find();
Step 12
For each publisher, give the average number of pages per publication,
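- var mapFunction = function() {
- if (this.pages && this.pages.start && this.pages.end) emit(this.publisher, { total: this.pages.end - this.pages.start, count: 1 });
- };
- var reduceFunction = function(key, values) {
- // Add up totals and counts; averaging here would be wrong if reduce is applied again to its own output
- var result = { total: 0, count: 0 };
- values.forEach(function(v) { result.total += v.total; result.count += v.count; });
- return result;
- };
- var queryParam = {
- query: {},
- out: "result_set",
- finalize: function(key, value) { return value.total / value.count; }
- };
- db.publis.mapReduce(mapFunction, reduceFunction, queryParam);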
Step 13
For each author, give the minimum and maximum of years with publications, as well as the total number of publications,
- var mapFunction = function() {
- for (var i = 0; i < this.authors.length; i++) emit(this.authors[i], {
- min: this.year,
- max: this.year,
- number: 1
- });
- };
- var reduceFunction = function(key, values) {
- var v_min = 1000000;
- var v_max = 0;
- var v_number = 0;
- for (var i = 0; i < values.length; i++) {
- if (values[i].min < v_min) v_min = values[i].min;
- if (values[i].max > v_max) v_max = values[i].max;
- v_number += values[i].number;
- }
- return {
- min: v_min,
- max: v_max,
- number: v_number
- };
- };
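A reduce function for this step must add up the `number` fields of its input values instead of counting the values, since MongoDB may call reduce again on its own output. The plain-JavaScript sketch below, with hypothetical years for one author, applies a reduce of this shape in two passes to show that summing the partial counts keeps the total right while min and max still combine correctly.

```javascript
// Reduce for { min, max, number } values; it returns the same shape
// it receives, so it can safely be applied to its own output.
function reduceYears(key, values) {
  let min = Infinity;
  let max = -Infinity;
  let number = 0;
  for (const v of values) {
    min = Math.min(min, v.min);
    max = Math.max(max, v.max);
    number += v.number; // add partial counts instead of counting the values
  }
  return { min, max, number };
}

// Hypothetical emitted values for one author: publications in 2001, 2004, 2004, 2009
const emitted = [2001, 2004, 2004, 2009].map((year) => ({ min: year, max: year, number: 1 }));

// Reduce in two passes, as MongoDB may do, then re-reduce the partial results
const firstPass = reduceYears("Jane Doe", emitted.slice(0, 3));
const finalValue = reduceYears("Jane Doe", [firstPass, emitted[3]]);
console.log(finalValue); // { min: 2001, max: 2009, number: 4 }
```

Counting the values in the second pass would have returned 2 instead of 4.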
Summary
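In this tutorial, we covered the basics of MongoDB: its history and main features, its installation on Windows 10, and how to query data with find(), the aggregation pipeline and MapReduce.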