MongoDB Aggregations (Day 14)

Before reading this article, I highly recommend reading the following previous parts of the series:

Introduction

Aggregation functions perform operations on groups of documents and return the computed result. Aggregation works mainly to group the data of multiple documents and perform various operations on the grouped data and returns a single or multiple results.

Types of aggregate functions

MongoDB performs aggregate operations in one of the following three ways.

  • Single-purpose aggregate methods and commands.
  • Pipeline.
  • Map Reduce

First, we will create a “Demo” collection and insert the following data into that collection. 

  1. {  
  2.     "_id": ObjectId("55dc6c94b32228b6ef8753c0"),  
  3.     "Name""Pankaj Choudhary",  
  4.     "Age": 21,  
  5.     "Salary": 25000  
  6. }   
  7. {  
  8.     "_id": ObjectId("55dc6ca8b32228b6ef8753c1"),  
  9.     "Name""Sandeep Jangid",  
  10.     "Age": 22,  
  11.     "Salary": 27000  
  12. }   
  13. {  
  14.     "_id": ObjectId("55dc6cb9b32228b6ef8753c2"),  
  15.     "Name""Rahul Prajapat",  
  16.     "Age": 23,  
  17.     "Salary": 37000  
  18. }   
  19. {  
  20.     "_id": ObjectId("55dc6ccab32228b6ef8753c3"),  
  21.     "Name""Sanjeev Baldia",  
  22.     "Age": 22,  
  23.     "Salary": 28000  
  24. }   
  25. {  
  26.     "_id": ObjectId("55dc6cdcb32228b6ef8753c4"),  
  27.     "Name""Narendra Sharma",  
  28.     "Age": 25,  
  29.     "Salary": 25000  
  30. }   
  31. {  
  32.     "_id": ObjectId("55dc6cf5b32228b6ef8753c5"),  
  33.     "Name""Nitin Yadav",  
  34.     "Age": 28,  
  35.     "Salary": 35000  
  36. }   
  37. {  
  38.     "_id": ObjectId("55dc6d09b32228b6ef8753c6"),  
  39.     "Name""Omveer Choudhary",  
  40.     "Age": 32,  
  41.     "Salary": 37000  
  42. }
First, we will read about single-purpose aggregates.

Single-Purpose aggregate methods and commands

As implied by its name, single-purpose aggregate methods are used for specific aggregation operations on sets of data. Single-purpose aggregate methods are less complex but with limited scope compared to pipeline and map reduce operations. Single-purpose aggregate methods and commands provide straightforward semantics for common data processing options.

The following are some important single-purpose operations.

Count

The Count operation takes a number of documents and depending on the match query returns the count of the documents. In MongoDB, the count command and cursor.count() method are used to do the count operations.

Example 1
  1. db.Demo.count()
Output

7

In this example Demo.count() is a cursor method that returns the count of all the documents present in the Demo collection.

Example 2
  1. db.runCommand({count:'Demo',query:{Salary:{$gt:30000}}})  
Output

{ "n" : 3, "ok" : 1 }

In this example, the count command takes all the documents in which the value of the Salary field is greater than 30000 and returns the count.

Distinct

The Distinct operation takes a document and depending on the match query returns the unique values for a field. In MongoDB the cursor.distinct() method and the distinct command performs the distinct operations.

Example 1
  1. db.Demo.distinct("Salary")
Output

[ 25000, 27000, 37000, 28000, 35000 ]

In this example, we use the distinct cursor method that returns the distinct value for the “Salary” field.

Example 2
  1. db.runCommand({distinct:'Demo',key:"Salary",query:{Salary:{$gt:30000}}})
Output
  1. {  
  2.     "values":   
  3.     [  
  4.         37000,  
  5.         35000  
  6.     ],  
  7.     "stats":   
  8.     {  
  9.         "n": 3,  
  10.         "nscanned": 0,  
  11.         "nscannedObjects": 7,  
  12.         "timems": 598,  
  13.         "planSummary""COLLSCAN"  
  14.     },  
  15.     "ok": 1  
  16. }
In this example the distinct command returned the distinct values for the “Salary” field where the value of the “Salary” field is greater than 30000.

Group

The Group operation takes a number of documents and depending on the match query creates a group of fields grouped by their value and finally returns an array of documents with the computed result for each group. In MongoDB, the group command and the cursor.group() methods do the group operations.

Example 1 
  1. db.Demo.group({key:{Age:1},reduce:function(cur,result){result.Salary+=cur.Salary},initial:{Salary:0}})  
Output
  1. [  
  2.   {  
  3.     "Age": 21,  
  4.     "Salary": 25000  
  5. },  
  6.   {  
  7.     "Age": 22,  
  8.     "Salary": 55000  
  9. },  
  10.   {  
  11.     "Age": 23,  
  12.     "Salary": 37000  
  13. },  
  14.   {  
  15.     "Age": 25,  
  16.     "Salary": 25000  
  17. },  
  18.   {  
  19.     "Age": 28,  
  20.     "Salary": 35000  
  21. },  
  22.   {  
  23.     "Age": 32,  
  24.     "Salary": 37000  
  25. }]
In this example, we used the group cursor method. This method groups the documents on the basis of the value of the “Age” field and computes the sum of Salary for each group.

Example 2
  1. db.runCommand({group:{ns:'Demo',key:{Salary:1},cond:{Salary:{$lt:30000}},$reduce:function(cur,resu  
  2. lt){result.Age+=cur.Age},initial:{Age:10}}})  
Output
  1. {  
  2.     "retval":   
  3.     [  
  4.       {  
  5.         "Salary": 25000,  
  6.         "Age": 56  
  7.     }, {  
  8.         "Salary": 27000,  
  9.         "Age": 32  
  10.     }, {  
  11.         "Salary": 28000,  
  12.         "Age": 32  
  13.        }  
  14.     ],  
  15.     "count": NumberLong(4),  
  16.     "keys": NumberLong(3),  
  17.     "ok": 1  
  18. }
In this example, we use the group command. This command groups the documents on the basis of the value of the Salary field and computes the sum of the Age field for each group.

Aggregate Pipeline

Pipeline means the possibility to execute an operation on some input and use the output as the input for the next command and so on. An aggregate pipeline is a framework modeled on the concept of data processing pipelines. In aggregate pipelines, documents enter a multi-stage pipeline that transforms the documents into an aggregated result. The pipeline operation uses the match query to fetch the exact documents and grouping to generate the group of documents. Pipeline operations provide tools for sorting documents by specified fields. A pipeline operation contains many stages, like filters that operate like queries and the transformation of documents that modify the form of output. Mainly, aggregate commands operate on a single collection and pass the entire collection into the aggregation pipeline.

Aggregate pipelines are an alternative to map-reduce. It may be an appropriate choice for aggregate operations because map-reduce are unnecessarily complex. But pipelines have some limited behavior on values types and result size.

Syntax

db.Collection_Name.aggregate({Pipeline expression})

Parameter

Pipeline expression: Pipeline expressions specify the transformation to apply on the input documents. A pipeline expression only operates on the current documents of the pipeline stage.

In an aggregate pipeline, documents pass through many stages and each stage transforms the documents into another form. It is not recommended that each pipeline stage produce a document with respect to each document. Some stages might reduce some documents and might generate new documents.

A pipeline aggregate contains a number of stages, each stage takes some documents as input and does operations on these documents and generates output. The following are the possible stages in an aggregate pipeline:

Stage

Description

$project Select specific fields from a collection
$match Specifies the selection criteria, to reduce the amount of documents
$group Used to divide the documents into various groups
$sort Sort the documents
$skip Used to skip a number of documents
$limit Defines the number of documents in an output result
$unwind Unwinds the documents using arrays
$redact Reshapes each document by restricting the content for each document
$out It is the last stage of a pipeline that writes the resulting documents of the aggregate pipeline to a collection.

Let's see an example to understand the concept of aggregate pipelines.

Example 1

  1. db.Demo.aggregate([{$group:{_id:"$Age",Salary_Is:{$sum:"$Salary"}}}])
Output
  1. {  
  2.     "_id": 32,  
  3.     "Salary_Is": 37000  
  4. }   
  5. {  
  6.     "_id": 25,  
  7.     "Salary_Is": 25000  
  8. }   
  9. {  
  10.     "_id": 23,  
  11.     "Salary_Is": 37000  
  12. }   
  13. {  
  14.     "_id": 22,  
  15.     "Salary_Is": 55000  
  16. }   
  17. {  
  18.     "_id": 28,  
  19.     "Salary_Is": 35000  
  20. }   
  21. {  
  22.     "_id": 21,  
  23.     "Salary_Is": 25000  
  24. }
In this example, we group the documents by the field “Age” and for each group, we calculate the sum of Salary. The following is the equivalent SQL query for this example.

Select Age, sum(Salary) from Demo group by Age.

Example 2 
  1. db.Demo.aggregate
  2. ([{  
  3.     $group: {  
  4.         _id: "$Age",  
  5.         Sum_Salary: {  
  6.             $sum: "$Salary"  
  7.         },  
  8.         Avg_Salary: {  
  9.             $avg: "$Salary"  
  10.         },  
  11.         Min  
  12.         _Salary: {  
  13.             $min: "$Salary"  
  14.         },  
  15.         Max_Salary: {  
  16.             $max: "$Salary"  
  17.         }  
  18.     }  
  19. }])
Output 
  1. {    
  2.     "_id": 32,    
  3.     "Sum_Salary": 37000,    
  4.     "Avg_Salary": 37000,    
  5.     "Min_Salary": 37000,    
  6.     "Max_Salary": 37000    
  7. }     
  8. {    
  9.     "_id": 25,    
  10.     "Sum_Salary": 25000,    
  11.     "Avg_Salary": 25000,    
  12.     "Min_Salary": 25000,    
  13.     "Max_Salary": 25000    
  14. }     
  15. {    
  16.     "_id": 23,    
  17.     "Sum_Salary": 37000,    
  18.     "Avg_Salary": 37000,    
  19.     "Min_Salary": 37000,    
  20.     "Max_Salary": 37000    
  21. }     
  22. {    
  23.     "_id": 22,    
  24.     "Sum_Salary": 55000,    
  25.     "Avg_Salary": 27500,    
  26.     "Min_Salary": 27000,    
  27.     "Max_Salary": 28000    
  28. }     
  29. {    
  30.     "_id": 28,    
  31.     "Sum_Salary": 35000,    
  32.     "Avg_Salary": 35000,    
  33.     "Min_Salary": 35000,    
  34.     "Max_Salary": 35000    
  35. }     
  36. {    
  37.     "_id": 21,    
  38.     "Sum_Salary": 25000,    
  39.     "Avg_Salary": 25000,    
  40.     "Min_Salary": 25000,    
  41.     "Max_Salary": 25000    
  42. }  
In this example we group documents by “Age” field and calculate the sum, average, minimum and maximum salary. The following is the equivalent query in SQL.

Select Age, sum(Salary), avg(Salary), max(Salary), min(Salary) from Demo group by Age.

Example 3
  1. db.Demo.aggregate  
  2. ([{  
  3.     $group:   
  4.   {  
  5.         _id: "$Age",  
  6.         Sum_Salary:   
  7.         {  
  8.             $sum: "$Salary"  
  9.         },  
  10.         Avg_Salary:   
  11.         {  
  12.             $avg: "$Salary"  
  13.         },  
  14.         Min  
  15.         _Salary:   
  16.         {  
  17.             $min: "$Salary"  
  18.         },  
  19.         Max_Salary:   
  20.         {  
  21.             $max: "$Salary"  
  22.         }  
  23.     }  
  24. },   
  25.    {  
  26.     $match:   
  27.     {  
  28.         _id:   
  29.          {  
  30.             $gt: 21,  
  31.             $lt: 27  
  32.          }  
  33.     }  
  34. }])
Output 
  1. {  
  2.     "_id": 25,  
  3.     "Sum_Salary": 25000,  
  4.     "Avg_Salary": 25000,  
  5.     "Min_Salary": 25000,  
  6.     "Max_Salary": 25000  
  7. }   
  8. {  
  9.     "_id": 23,  
  10.     "Sum_Salary": 37000,  
  11.     "Avg_Salary": 37000,  
  12.     "Min_Salary": 37000,  
  13.     "Max_Salary": 37000  
  14. }   
  15. {  
  16.     "_id": 22,  
  17.     "Sum_Salary": 55000,  
  18.     "Avg_Salary": 27500,  
  19.     "Min_Salary": 27000,  
  20.     "Max_Salary": 28000  
  21. }
In this example, documents are passed through two stages. The first stage is “group”. In that stage, we group documents by the “Age” field and calculate the sum, average, minimum and maximum salary. The second stage is the “match”, that stage filters all the documents depending on the value of the Age field.

The following query also generate the same output
  1. db.Demo.aggregate
  2. ([{  
  3.     $match:   
  4.     {  
  5.         Age:   
  6.         {  
  7.             $gt: 21,  
  8.             $lt: 27  
  9.         }  
  10.     }  
  11. }, {  
  12.     $group:   
  13.     {  
  14.         _id: "$Age",  
  15.         Sum_Salary:   
  16.         {  
  17.             $sum: "$Salary"  
  18.         },  
  19.         Avg_Salary:   
  20.         {  
  21.             $avg: "$Salary"  
  22.         },  
  23.         Min_Salary:   
  24.         {  
  25.             $min: "$Salary"  
  26.         },  
  27.         Max_Salary:   
  28.         {  
  29.             $max: "$Salary"  
  30.         }  
  31.     }  
  32. }]) 
The following is the equivalent query in SQL.

Select Age, sum(Salary), avg(Salary), max(Salary), min(Salary) from Demo Where Age>21 And Age<27 group by Age

Example 4
  1. db.Demo.aggregate
  2. ([{  
  3.       $limit: 3  
  4.   }, 
  5.   {  
  6.       $group:   
  7.       {  
  8.           _id: "$Age",  
  9.               Sum_Salary:   
  10.               {  
  11.                   $sum: "$Salary"  
  12.               },  
  13.               Avg_Salary:   
  14.               {  
  15.                   $avg: "  
  16.                   $Salary "  
  17.               },  
  18.               Min_Salary:{$min:"  
  19.               $Salary "},Max_Salary:{$max:"  
  20.               $Salary "}
  21.         }  
  22.   }])
Output 
  1. {  
  2.     "_id": 23,  
  3.     "Sum_Salary": 37000,  
  4.     "Avg_Salary": 37000,  
  5.     "Min_Salary": 37000,  
  6.     "Max_Salary": 37000  
  7. }   
  8. {  
  9.     "_id": 22,  
  10.     "Sum_Salary": 27000,  
  11.     "Avg_Salary": 27000,  
  12.     "Min_Salary": 27000,  
  13.     "Max_Salary": 27000  
  14. }   
  15. {  
  16.     "_id": 21,  
  17.     "Sum_Salary": 25000,  
  18.     "Avg_Salary": 25000,  
  19.     "Min_Salary": 25000,  
  20.     "Max_Salary": 25000  
  21. }
In this example, documents are passed through two stages. The first stage is “limit” that selects the top 3 documents from the collection and pass these documents to the second stage. The second stage groups the documents by the “Age” field and calculates the sum, average, minimum and maximum salary.

Important Point

The order of the stages in the aggregate pipeline is very important. Like db.Demo.aggregate($limit,$group) and db.Demo.aggregate($group,$limit) the methods don’t provide the same result.

Now we will execute the previous method but interchange the order of the “limit” and “group” stages.
  1. db.Demo.aggregate
  2. ([{  
  3.     $group: 
  4. {  
  5.         _id: "$Age",  
  6.         Sum_Salary: 
  7.         {  
  8.             $sum: "$Salary"  
  9.         },  
  10.         Avg_Salary: 
  11.         {  
  12.             $avg: "$Salary"  
  13.         },  
  14.         M  
  15.         in_Salary: 
  16.         {  
  17.             $min: "$Salary"  
  18.         },  
  19.         Max_Salary: 
  20.         {  
  21.             $max: "$Salary"  
  22.         }  
  23.     }  
  24. }, {  
  25.     $limit: 3  
  26. }])
Output 
  1. {  
  2.     "_id": 32,  
  3.     "Sum_Salary": 37000,  
  4.     "Avg_Salary": 37000,  
  5.     "Min_Salary": 37000,  
  6.     "Max_Salary": 37000  
  7. }   
  8. {  
  9.     "_id": 25,  
  10.     "Sum_Salary": 25000,  
  11.     "Avg_Salary": 25000,  
  12.     "Min_Salary": 25000,  
  13.     "Max_Salary": 25000  
  14. }   
  15. {  
  16.     "_id": 23,  
  17.     "Sum_Salary": 37000,  
  18.     "Avg_Salary": 37000,  
  19.     "Min_Salary": 37000,  
  20.     "Max_Salary": 37000  
  21. }
We can see that the result of this method and previous method is not the same, so the selection of the order of the stages is very important. This selection may change the expected result.

Example 5
  1. db.Demo.aggregate
  2. ([{  
  3.     $group: 
  4.     {  
  5.         _id: "$Age",  
  6.         Sum_Salary: {  
  7.             $sum: "$Salary"  
  8.         },  
  9.         Avg_Salary: {  
  10.             $avg: "$Salary"  
  11.         },  
  12.         M  
  13.         in_Salary: {  
  14.             $min: "$Salary"  
  15.         },  
  16.         Max_Salary: {  
  17.             $max: "$Salary"  
  18.         }  
  19.     }  
  20. }, {  
  21.     $project: {  
  22.         Sum_Salary: 1,  
  23.         Max_Salary: 1,  
  24.         _id: 0  
  25.     }  
  26. }, {  
  27.     $limit: 4  
  28. }])
Output
  1. {  
  2.     "Sum_Salary": 37000,  
  3.     "Max_Salary": 37000  
  4. }   
  5. {  
  6.     "Sum_Salary": 25000,  
  7.     "Max_Salary": 25000  
  8. }   
  9. {  
  10.     "Sum_Salary": 37000,  
  11.     "Max_Salary": 37000  
  12. }   
  13. {  
  14.     "Sum_Salary": 55000,  
  15.     "Max_Salary": 28000  
  16. }
In this example documents passed through the following three stages. The first stage is “group” as described in the previous examples. The second stage is “project”. In that stage, we only select the Sum_Salary and Max_Salary fields to display. The third stage is “limit”, in that stage we select the top 4 documents from the result of the previous stage (project).

Example 6
  1. db.Demo.aggregate([{  
  2.     $group: {  
  3.         _id: "$Age",  
  4.         Sum_Salary: {  
  5.             $sum: "$Salary"  
  6.         },  
  7.         Avg_Salary: {  
  8.             $avg: "$Salary"  
  9.         },  
  10.         M  
  11.         in_Salary: {  
  12.             $min: "$Salary"  
  13.         },  
  14.         Max_Salary: {  
  15.             $max: "$Salary"  
  16.         }  
  17.     }  
  18. }, {  
  19.     $project: {  
  20.         Sum_Salary: 1,  
  21.         Max_Salary: 1,  
  22.         _id: 0  
  23.     }  
  24. }, {  
  25.     $limit: 4  
  26. }, {  
  27.     $skip: 2  
  28. }]) 
Output
  1. {  
  2.     "Sum_Salary": 37000,  
  3.     "Max_Salary": 37000  
  4. }   
  5. {  
  6.     "Sum_Salary": 55000,  
  7.     "Max_Salary": 28000  
  8. }
This example is the same as the previous example but we add an extra stage, “skip”. So the result of the “limit” stage is passed to the “skip” stage and this stage removes the top 2 documents from the collection and prints the remaining documents.

Example 7
  1. db.Demo.aggregate([{  
  2.     $group: {  
  3.         _id: "$Age",  
  4.         Sum_Salary: {  
  5.             $sum: "$Salary"  
  6.         },  
  7.         Avg_Salary: {  
  8.             $avg: "$Salary"  
  9.         },  
  10.         M  
  11.         in_Salary: {  
  12.             $min: "$Salary"  
  13.         },  
  14.         Max_Salary: {  
  15.             $max: "$Salary"  
  16.         }  
  17.     }  
  18. }, {  
  19.     $project: {  
  20.         Sum_Salary: 1,  
  21.         Max_Salary: 1,  
  22.         _id: 0  
  23.     }  
  24. }, {  
  25.     $limit: 4  
  26. }, {  
  27.     $sort: {  
  28.         Sum_Salary: 1  
  29.     }  
  30. }])
Output
  1. {  
  2.     "Sum_Salary": 25000,  
  3.     "Max_Salary": 25000  
  4. }   
  5. {  
  6.     "Sum_Salary": 37000,  
  7.     "Max_Salary": 37000  
  8. }  
  9. {  
  10.     "Sum_Salary": 37000,  
  11.     "Max_Salary": 37000  
  12. }  
  13. {  
  14.     "Sum_Salary": 55000,  
  15.     "Max_Salary": 28000  
  16. }
This example is similar to Example 6, but in this example, we use an extra stage, “sort". That stage retrieves the documents from the output of the “limit” stage and sorts the documents depending on the value of the Sum_Salary field in ascending order.

Example 8
  1. db.Demo.aggregate([{  
  2.     $group: {  
  3.         _id: "$Age",  
  4.         Sum_Salary: {  
  5.             $sum: "$Salary"  
  6.         },  
  7.         Avg_Salary: {  
  8.             $avg: "$Salary"  
  9.         },  
  10.         M  
  11.         in_Salary: {  
  12.             $min: "$Salary"  
  13.         },  
  14.         Max_Salary: {  
  15.             $max: "$Salary"  
  16.         }  
  17.     }  
  18. }, {  
  19.     $project: {  
  20.         Sum_Salary: 1,  
  21.         Max_Salary: 1,  
  22.         _id: 0  
  23.     }  
  24. }, {  
  25.     $redact: {  
  26.         $cond: {  
  27.             if: {  
  28.                 $gt: ["$Sum_Salary", 30000]  
  29.             },  
  30.             then: "$$PRUNE",  
  31.             else"$$DESCEND"  
  32.         }  
  33.     }  
  34. }])
Output 
  1. {  
  2.     "Sum_Salary": 25000,  
  3.     "Max_Salary": 25000  
  4. }   
  5. {  
  6.     "Sum_Salary": 25000,  
  7.     "Max_Salary": 25000  
  8. }
In this example, we use a redact stage. This stage restricts the contents of the documents based on information stored in the documents themselves. In the preceding $redact stage we use two system variables. The first variable is ”$$PRUNE”, this variable excludes all the fields present in the current document or embedded at the document level. The second variable is “$$DESCEND” that returns all the fields of the current documents level, excluding embedded level documents. So the preceding query excludes all the documents where the value of “Sum_Salary” is greater than 30000.

Map-Reduce

Map-Reduce is another way to do aggregation. Map-Reduce is a combination of two operations. The first part of Map-Reduce is “Map”, that processes each document and retrieves one or more objects for each input document. The second part of Map-Reduce is “Reduce” that combines the result of the Map operations. Map-Reduce handles large amounts of data into aggregate results.

How Map-Reduce operation Works

For Map-Reduce operations, MongoDB provides the mapReduce database command. In a Map-Reduce operation, MongoDB first applies the map operation on all the documents that match the query condition. The result of the map operation generates the key-value pairs. For those keys that have more than one value, MongoDB applies reduce operations that collect and condense the aggregate result. Then MongoDB stores the results in a collection. Map-Reduce also contains an optional function, finalize. We can pass the output of the reduce operation to the finalize function to further condense the results of the aggregate operation.
 

MongoDB uses a JavaScript function in map-reduce aggregation. Map-Reduce aggregates may return the documents as output or may write the result to a collection. Map-Reduce uses the JavaScript functions that provide high flexibility.

Let us look at some examples for Map-Reduce aggregations.

Example 1

To perform the the mapReduce function for any collection, we must perform the following three steps. Now we will understand these procedures using the “Demo” collection.

Step 1

Define the map function.

  1. function()
  2. {  
  3.    emit(this.Age,this.Salary);  
  4. } ;  

In the function “this” defines the current collection name. This function maps the Salary to Age for each document and emits the Age and Salary pairs.

Step 2

Define the reduce function.

  1. function(key,values)
  2. {  
  3.    return Array.sum(values);  

Define the reduce function with the two arguments, key, and values. The values parameter is an array type that is emitted by the map function and contains salary values and grouped by Age. This function reduces the values array to a sum of its elements.

Step 3

Output Collection.

  1. { out:"My_Coll"}
This operation stores the result in the “My_Coll” collection. If the collection already exists then the contents of the collection will replace the contents of the mapReduce function.

The combined form of these three steps is:
  1. db.Demo.mapReduce(function() 
  2. {  
  3.     emit(this.Age, this.Salary);  
  4. }, 
  5. function(key, values) 
  6. {  
  7.     return Array.sum(v alues);  
  8. }, 
  9. {  
  10.     query: 
  11.     {  
  12.         Age:   
  13.         {  
  14.             $gt: 20  
  15.         }  
  16.     },  
  17.     out: "My_Coll"  
  18. })
So when we execute the preceding query MongoDB returns the following outputs as acknowledgement.
  1. {  
  2.    "result" : "My_Coll",  
  3.    "timeMillis" : 4,  
  4.    "counts" :    
  5.    {  
  6.       "input" : 7,  
  7.       "emit" : 7,  
  8.       "reduce" : 1,  
  9.       "output" : 6  
  10.    },  
  11.       "ok" : 1  
  12. }
The preceding result defines a number of input documents, number of emit documents, number of reduce documents, number of outputs and time to execute the mapReduce command.

The result of the preceding query will be stored in the “My_Coll” collection so we can use the find() method to retrieve the result from the “My_Coll” collection.
  1. db.My_Coll.find().pretty()
Output
  1. {  
  2.     "_id": 21,  
  3.     "value": 25000  
  4. }   
  5. {  
  6.     "_id": 22,  
  7.     "value": 55000  
  8. }   
  9. {  
  10.     "_id": 23,  
  11.     "value": 37000  
  12. }   
  13. {  
  14.     "_id": 25,  
  15.     "value": 25000  
  16. }   
  17. {  
  18.     "_id": 28,  
  19.     "value": 35000  
  20. }   
  21. {  
  22.     "_id": 32,  
  23.     "value": 37000  
  24. }
Example 2

Step 1
  1. var map_func = function()    
  2. {    
  3.     emit(this.Age, this.Salary);    
  4. };  
Step 2
  1. var reduce_func = function(key, values)     
  2. {    
  3.     return Array.sum(values);    
  4. };  
Step 3
  1. db.Demo.mapReduce(map_func, reduce_func,    
  2. {    
  3.     query:     
  4.     {    
  5.         Age:     
  6.         {    
  7.             $gt: 23    
  8.         }    
  9.     },    
  10.     out"My_Coll"    
  11. })  
Output
  1. {  
  2.     "result""My_Coll",  
  3.     "timeMillis": 4,  
  4.     "counts":   
  5.     {  
  6.         "input": 3,  
  7.         "emit": 3,  
  8.         "reduce": 0,  
  9.         "output": 3  
  10.     },  
  11.     "ok": 1  
  12. }
The following is the data of My_Coll collection:
  1. db.My_Coll.find().pretty()
Output
  1. {  
  2.     "_id": 25,  
  3.     "value": 25000  
  4. }   
  5. {  
  6.     "_id": 28,  
  7.     "value": 35000  
  8. }  
  9. {  
  10.     "_id": 32,  
  11.     "value": 37000  
  12. }
In this example, we will define the map and reduce functions in two variables and use both variables in the mapReduce function. We also define the query for collections that define the selection criteria.

Example  
  1. db.Demo.mapReduce(  
  2. function()   
  3. {  
  4.     emit(this.Age, this.Salary);  
  5. },  
  6. function(key, values)  
  7. {  
  8.     return Array.avg(v alues);  
  9. }, {  
  10.     query:   
  11.     {  
  12.         Age:   
  13.         {  
  14.             $gt: 20  
  15.         },  
  16.         Salary:   
  17.         {  
  18.             $gte: 24000  
  19.         }  
  20.     },  
  21.     out: "My_Coll",  
  22.     limit: 4,  
  23.     sort:   
  24.     {  
  25.         Salary: 1  
  26.     }  
  27. })
Output
  1. {  
  2.     "result""My_Coll",  
  3.     "timeMillis": 5,  
  4.     "counts"
  5. {  
  6.         "input": 4,  
  7.         "emit": 4,  
  8.         "reduce": 1,  
  9.         "output": 3  
  10. },  
  11.     "ok": 1  
  12. }  
  13. }
The following is the data of My_Coll collection:
  1. db.My_Coll.find().pretty()  
Output
  1. {  
  2.     "_id": 21,  
  3.     "value": 25000  
  4. }   
  5. {  
  6.     "_id": 22,  
  7.     "value": 27500  
  8. }   
  9. {  
  10.     "_id": 25,  
  11.     "value": 25000  
  12. }
Today, we read the aggregation operation in MongoDB. Aggregation is an important part of MongoDB that computes the records and filters out the results.

Thanks for reading this article!
 
Author
Pankaj Kumar Choudhary
71 26.6k 13.4m
Next » MongoDB Replication (Day 15)