MongoDB Indexing Strategies with Real-time Scenarios

Vishal Joshi
Jun 03

Let’s start with a brief look at MongoDB. It is a highly scalable, distributed, and flexible NoSQL database designed to store large volumes of data. Instead of the rows and tables used by relational databases, MongoDB stores data as documents grouped into collections.

In this article, we will review indexing in MongoDB in depth, using real-world scenarios with syntax and output, and see how indexing improves the performance of complex queries on bulk data.

What is indexing in MongoDB?

In MongoDB, indexing refers to creating special data structures that speed up data retrieval operations on a collection. Indexes store a small portion of the data set in an easy-to-traverse form: MongoDB indexes are B-tree structures that hold the values of a specific field or set of fields, ordered by value. Indexes are persisted on disk, and their frequently used parts are cached in RAM.

Without an index, MongoDB must scan every document in a collection to select those that match the query. Such a collection scan processes a large volume of data and is inefficient. On the other hand, having too many indexes slows down write operations such as inserts, updates, and deletes, because every index uses additional storage and must be updated on each write.
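To make the difference concrete, here is a toy JavaScript model (not MongoDB's actual implementation) that counts how many documents a collection scan examines compared with a lookup through a sorted index. The orders data and its CountryOfOrder values are made up for illustration.

```javascript
// Toy model (not MongoDB internals): compares a collection scan with an
// index lookup by counting how many documents each approach examines.

// Build a hypothetical "orders" collection with 10,000 documents.
const countries = ["Japan", "India", "Germany", "Brazil"];
const orders = [];
for (let i = 0; i < 10000; i++) {
  orders.push({ _id: i, CountryOfOrder: countries[i % countries.length] });
}

// Collection scan: read every document and test the predicate.
function collectionScan(docs, field, value) {
  const results = [];
  let docsExamined = 0;
  for (const doc of docs) {
    docsExamined++;
    if (doc[field] === value) results.push(doc);
  }
  return { results, docsExamined };
}

// Index: (key, position) pairs sorted by key, like the ordered keys of a B-tree.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : a.pos - b.pos));
}

// Binary search for the first index entry whose key is >= value.
function lowerBound(index, value) {
  let lo = 0, hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Index lookup: jump to the first matching key, then fetch only matching docs.
function indexLookup(docs, index, value) {
  const results = [];
  let docsExamined = 0;
  for (let i = lowerBound(index, value); i < index.length && index[i].key === value; i++) {
    docsExamined++;
    results.push(docs[index[i].pos]);
  }
  return { results, docsExamined };
}

const scan = collectionScan(orders, "CountryOfOrder", "Japan");
const idx = buildIndex(orders, "CountryOfOrder");
const lookup = indexLookup(orders, idx, "Japan");
console.log(scan.docsExamined, lookup.docsExamined); // 10000 2500
```

The scan reads all 10,000 documents, while the index lookup touches only the 2,500 that match. In MongoDB, explain("executionStats") reports the same idea as totalDocsExamined.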

Create MongoDB Index

The createIndex() method is used to create an index on a collection in a MongoDB database.

Syntax

db.COLLECTION_NAME.createIndex({KEY:1})

In the syntax above, KEY is the name of the field to index. Use 1 for an ascending index and -1 for a descending index.
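A minimal sketch, assuming plain JavaScript objects stand in for documents, of how the key pattern passed to createIndex() defines the order of index entries. The comparatorFor helper is hypothetical and not part of any MongoDB API.

```javascript
// Sketch: how a key pattern such as { ActualPrice: 1 } or { ActualPrice: -1 }
// orders index entries. 1 sorts ascending, -1 sorts descending, and in a
// compound pattern each later field only breaks ties left by earlier fields.
// comparatorFor is a hypothetical helper, not a MongoDB API.
function comparatorFor(keyPattern) {
  const fields = Object.entries(keyPattern);
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every indexed field
  };
}

// Hypothetical orders documents
const orders = [
  { _id: 1, ActualPrice: 4500 },
  { _id: 2, ActualPrice: 1200 },
  { _id: 3, ActualPrice: 3000 },
];

const ascIds = [...orders].sort(comparatorFor({ ActualPrice: 1 })).map((o) => o._id);
const descIds = [...orders].sort(comparatorFor({ ActualPrice: -1 })).map((o) => o._id);
console.log(ascIds);  // [ 2, 3, 1 ]
console.log(descIds); // [ 1, 3, 2 ]
```

The descending order is simply the ascending order read backwards, which is why a single field index can serve sorts in either direction.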

Options in the createIndex() method

sparse
- Type: Boolean
- Description: If true, the index only references documents that contain the indexed field. A sparse index uses less space, but it can behave differently in sorts because documents without the field are left out of the index.
- Default value: False
name
- Type: String
- Description: The name of the index. If unspecified, MongoDB generates a name by concatenating the names of the indexed fields and their sort order, for example CountryOfOrigin_1.
unique
- Type: Boolean
- Description: Creates a unique index, so the collection rejects inserted or updated documents whose index key (or combination of keys) matches an existing value in the index. Specify true to create a unique index.
- Default value: False
expireAfterSeconds
- Type: Integer
- Description: Specifies a time to live (TTL), in seconds, that controls how long MongoDB retains documents in the collection. It applies to TTL indexes on fields that hold date values.
background
- Type: Boolean
- Description: Builds the index in the background so that it does not block other database activities. Specify true to build in the background. Deprecated since MongoDB 4.2, which ignores this option and uses an optimized build process for all index builds.
- Default value: False
weights
- Type: Document
- Description: For a text index, denotes the significance of each field relative to the other indexed fields in terms of score. Each weight is a number from 1 to 99,999; the default is 1.
default_language
- Type: String
- Description: For a text index, the language that determines the list of stop words and the rules for the stemmer and tokenizer.
- Default value: english
language_override
- Type: String
- Description: For a text index, the name of the field in the documents that contains the language to use instead of the default language.
- Default value: language
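As the name option notes, MongoDB generates a default name from the key pattern. The sketch below reproduces that naming pattern (each field joined with its sort order or index type by an underscore), which is where names such as CountryOfOrigin_1, used with dropIndex() later in this article, come from. defaultIndexName is a hypothetical helper written for illustration, not a MongoDB function.

```javascript
// Sketch of MongoDB's default index naming pattern: each field name is joined
// with its sort order or index type, and the pairs are joined with underscores.
// defaultIndexName is a hypothetical helper, not a MongoDB API.
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, type]) => `${field}_${type}`)
    .join("_");
}

console.log(defaultIndexName({ CountryOfOrigin: 1 }));
// CountryOfOrigin_1
console.log(defaultIndexName({ CountryOfOrder: 1, Discount: 1, ActualPrice: 1 }));
// CountryOfOrder_1_Discount_1_ActualPrice_1
console.log(defaultIndexName({ CreatedDate: -1 }));
// CreatedDate_-1
console.log(defaultIndexName({ Description: "text" }));
// Description_text
```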

Index types

MongoDB supports several types of indexes, as described below.

Single field index

A single field index is created on one field of a document. Because MongoDB can traverse an index in either direction, a single field index supports both ascending and descending sorts on that field, whichever sort order it was created with.

Example: let’s consider an e-commerce database with lakhs of records that we need to filter. In such cases, indexing is a real lifesaver. Check out the query below, which prints the number of documents in each collection (the screen print showed the counts).

print("products: " + db.products.countDocuments({}) + "\ncustomers: " + db.customers.countDocuments({}) + "\norders: " + db.orders.countDocuments({}) + "\nStockItems: " + db.StockItems.countDocuments({}))

Let’s fetch the data with the query below, which joins all the collections using $lookup and then applies the filters.

db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "CustomerId",
      foreignField: "_id",
      as: "Customers"
    }
  },
  { $unwind: "$Customers" },
  {
    $lookup: {
      from: "products",
      localField: "ProductId",
      foreignField: "_id",
      as: "product"
    }
  },
  { $unwind: "$product" },
  {
    $lookup: {
      from: "StockItems",
      localField: "StockItemId",
      foreignField: "_id",
      as: "StockItems"
    }
  },
  { $unwind: "$StockItems" },
  {
    $match: {
      $and: [
        { "CountryOfOrder": "Japan" },
        { "Discount" : { $gt:10 } },
        { "ActualPrice" : { $gt: 1000 } },
        { "ActualPrice" : { $lt: 5000 } },
        {
          $or: [
            { "Customers.AverageOrderTime": "1 Week" },
            { "Customers.Type": "Regular" }
          ]
        },
        { "product.ProductValidityPeriod" : "5 Year"},
        { "product.CountryOfOrigin": "Germany" },
        { "StockItems.CountryOfImport": "India" },
        { "StockItems.Type": "Weight"},
        { "StockItems.AverageOrderTime" : "1 month"}
      ]
    }
  },
  {
    $project: {
      _id: 0,
      OrderProductType: "$ProductType",
      OrderDate: "$CreatedDate",
      OrderCountry: "$CountryOfOrder",
      OrderActualPrice: "$ActualPrice",
      OrderDiscount: "$Discount",
      OrderDiscountedPrice: "$DiscountedPrice",
      OrderCreatedBy: "$CreatedBy",
      ProductName: "$product.Name",
      ProductCountry: "$product.CountryOfOrigin",
      ProductValidity: "$product.ProductValidityPeriod",
      ProductDescription: "$product.Description",
      CustomerName: "$Customers.Name",
      CustomerCountry: "$Customers.Country",
      CustomerType: "$Customers.Type",
      CustomerAverageOrderTime: "$Customers.AverageOrderTime",
      CustomerCreatedDate: "$Customers.CreatedDate",
      StockName: "$StockItems.Name",
      StockType: "$StockItems.Type",
      StockCountry: "$StockItems.CountryOfImport",
      StockAverageOrderTime: "$StockItems.AverageOrderTime"
    }
  }
])

Result

As per the screen print, the query returns the result in 3.08 seconds.

Now let’s create single field indexes on the fields used in the filters.

// Create Single Field Index
db.products.createIndex({CountryOfOrigin:1})
db.orders.createIndex({CountryOfOrder:1})
db.orders.createIndex({Discount:1})
db.orders.createIndex({ActualPrice:1})
db.StockItems.createIndex({CountryOfImport:1})

Now let’s execute the same query again after creating the indexes.

The query now returns data in 2.09 seconds instead of 3.08 seconds, so the indexes improve performance. With large numbers of records, such as lakhs or crores, the difference can be significant.
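With several single field indexes, MongoDB's query planner typically chooses one index to narrow the candidates and applies the remaining conditions as a filter on the fetched documents. The toy model below sketches that behavior with made-up data, using a Map as a stand-in for the index on CountryOfOrder; it is an illustration, not output from MongoDB.

```javascript
// Toy model: with only single field indexes, a query typically narrows the
// candidates through ONE index and filters the other predicates after fetching.
const countries = ["Japan", "India", "Germany", "Brazil"];
const orders = [];
for (let i = 0; i < 10000; i++) {
  orders.push({
    _id: i,
    CountryOfOrder: countries[i % 4],
    Discount: i % 30,              // hypothetical discount values 0..29
    ActualPrice: (i * 37) % 8000,  // hypothetical prices 0..7999
  });
}

// Single field index on CountryOfOrder: key -> document positions.
// (A Map stands in for the B-tree here; it only supports equality lookups.)
const countryIndex = new Map();
orders.forEach((doc, pos) => {
  if (!countryIndex.has(doc.CountryOfOrder)) countryIndex.set(doc.CountryOfOrder, []);
  countryIndex.get(doc.CountryOfOrder).push(pos);
});

// The remaining order conditions from the article's $match, applied after the fetch.
const residual = (d) => d.Discount > 10 && d.ActualPrice > 1000 && d.ActualPrice < 5000;

const candidates = countryIndex.get("Japan") || [];
const results = candidates.map((pos) => orders[pos]).filter(residual);
console.log(`docsExamined=${candidates.length}, returned=${results.length}`);
```

Every Japan order is fetched, even though many are then discarded by the Discount and ActualPrice conditions.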

Compound Index

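As the name suggests, a compound index combines multiple fields into a single index, so one index can support queries that search and filter on several fields. The order of the fields matters: entries are sorted by the first field, then by the second field within each value of the first, and so on.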

Let’s run the same query with compound indexes instead. Before starting, we need to drop the single field indexes created earlier, so let’s use the query below to remove them.

// Drop Single Field Index
db.products.dropIndex("CountryOfOrigin_1")
db.orders.dropIndex("CountryOfOrder_1")
db.orders.dropIndex("Discount_1")
db.orders.dropIndex("ActualPrice_1")
db.StockItems.dropIndex("CountryOfImport_1")

Let’s create compound indexes on the fields used in the filters.

//Create Compound Index
db.orders.createIndex({CountryOfOrder:1,Discount:1,ActualPrice:1})
db.products.createIndex({CountryOfOrigin:1,ProductValidityPeriod:1})
db.StockItems.createIndex({CountryOfImport:1,Type:1,AverageOrderTime:1})

Let’s check the result after creating the compound index.

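It returns the data in 1.98 seconds, compared with 2.09 seconds using the single field indexes. Here the compound indexes are more efficient than the single field indexes, because one index can narrow the results on several filter fields at once.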
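The toy model below sketches why a compound index can do better: its keys are ordered by CountryOfOrder, then Discount, then ActualPrice, so a scan can jump straight to the Japan entries with Discount > 10 and fetch only documents whose keys satisfy every condition. The data is made up, and the scan logic is a simplification of MongoDB's index bounds, not its real implementation.

```javascript
// Toy model of a compound index on { CountryOfOrder: 1, Discount: 1, ActualPrice: 1 }.
const countries = ["Japan", "India", "Germany", "Brazil"];
const orders = [];
for (let i = 0; i < 10000; i++) {
  orders.push({
    _id: i,
    CountryOfOrder: countries[i % 4],
    Discount: i % 30,              // hypothetical values
    ActualPrice: (i * 37) % 8000,  // hypothetical values
  });
}

// Compare compound keys field by field, as a compound index orders them.
function compareKeys(a, b) {
  for (let k = 0; k < a.length; k++) {
    if (a[k] < b[k]) return -1;
    if (a[k] > b[k]) return 1;
  }
  return 0;
}

const index = orders
  .map((d, pos) => ({ key: [d.CountryOfOrder, d.Discount, d.ActualPrice], pos }))
  .sort((x, y) => compareKeys(x.key, y.key));

// Binary search: position of the first key strictly greater than probe.
function firstGreater(entries, probe) {
  let lo = 0, hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(entries[mid].key, probe) <= 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Seek past ["Japan", 10, <any price>] so every scanned key has Discount > 10,
// then stop when CountryOfOrder leaves "Japan".
let keysExamined = 0;
const results = [];
for (let i = firstGreater(index, ["Japan", 10, Infinity]); i < index.length; i++) {
  const [country, , price] = index[i].key;
  if (country !== "Japan") break;
  keysExamined++;
  // Fetch a document only when its key satisfies the price bounds as well.
  if (price > 1000 && price < 5000) results.push(orders[index[i].pos]);
}
console.log(`keysExamined=${keysExamined}, docsExamined=${results.length}`);
```

Here docsExamined equals the number of results, because no document is fetched only to be filtered out.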

Multi-key Index

A multikey index is an index on a field that holds an array, and MongoDB handles it automatically. We don't need to declare a multikey index explicitly: when any document stores an array in an indexed field, MongoDB makes the index multikey and creates a separate index key for each element of the array. Queries can then match documents by any of the array's elements.
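A toy model of a multikey index, assuming a hypothetical Tags array field on products: each array element becomes its own index entry that points back to the document. In MongoDB, an ordinary index such as { Tags: 1 } would become multikey automatically once a document stores an array in Tags.

```javascript
// Toy model of a multikey index: one index entry per array element.
// The products and their "Tags" array field are hypothetical.
const products = [
  { _id: 1, Name: "Kettle", Tags: ["kitchen", "electric"] },
  { _id: 2, Name: "Drill", Tags: ["tools", "electric"] },
  { _id: 3, Name: "Whisk", Tags: ["kitchen"] },
];

function buildMultikeyIndex(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    // An array yields one entry per distinct element; a scalar yields one entry.
    // (This sketch de-duplicates repeated elements within a document.)
    const keys = Array.isArray(value) ? [...new Set(value)] : [value];
    for (const key of keys) entries.push({ key, _id: doc._id });
  }
  return entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : a._id - b._id));
}

const tagsIndex = buildMultikeyIndex(products, "Tags");
console.log(tagsIndex.length); // 5 entries for 3 documents

// An equality match on an array field matches any element,
// like find({ Tags: "electric" }).
const electricIds = tagsIndex.filter((e) => e.key === "electric").map((e) => e._id);
console.log(electricIds); // [ 1, 2 ]
```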

Geospatial Indexes

Geospatial indexes in MongoDB are special types of indexes designed to efficiently query and analyze spatial data, such as points, lines, and polygons on the Earth's surface. MongoDB provides support for geospatial indexing through two types of indexes: 2d indexes and 2dsphere indexes.

2d Indexes: These indexes support legacy coordinate pairs (longitude, latitude) and calculate distances on a flat, two-dimensional plane. They are created using the legacy "2d" index type and can be applied to documents with coordinate pairs.
Example of a 2d index
```
db.places.createIndex({ location: "2d" })
```
2dsphere Indexes: These indexes support queries that calculate geometries on an Earth-like sphere. They are created using the "2dsphere" index type and can be applied to documents containing GeoJSON objects representing points, lines, or polygons.
Example of creating a 2dsphere index in MongoDB
```
db.places.createIndex({ location: "2dsphere" })
```
After creating a geospatial index, you can perform various geospatial queries to find documents based on their spatial proximity, containment, or intersection with a specified geometry. Some common geospatial query operators supported by MongoDB include:

$near: Find documents near a specified point.
$geoWithin: Find documents within a specified geometry (polygon).
$geoIntersects: Find documents that intersect with a specified geometry.

Example of a geospatial query using a 2dsphere index.

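// Find places near a specified point (requires the 2dsphere index above)
// Placeholder values (hypothetical): replace them with your own
const longitude = 139.6917;   // GeoJSON order is [longitude, latitude]
const latitude = 35.6895;
const distanceInMeters = 5000;
db.places.find({
    location: {
        $near: {
            $geometry: {
                type: "Point",
                coordinates: [longitude, latitude]
            },
            $maxDistance: distanceInMeters
        }
    }
})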
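The sketch below approximates what a $near query on a 2dsphere index returns: documents within $maxDistance meters of the point, sorted nearest first. It is a toy in plain JavaScript using the haversine formula with an assumed Earth radius of 6,378,100 meters; MongoDB's actual geometry engine is more involved, and the places data is hypothetical.

```javascript
// Toy model of $near on a 2dsphere index: keep points within maxDistance
// meters of the target and return them nearest first.
// Assumption: spherical Earth with a radius of 6,378,100 m.
const EARTH_RADIUS_M = 6378100;

// Great-circle distance between two [longitude, latitude] pairs (haversine).
function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// GeoJSON coordinates are [longitude, latitude], in that order.
function near(places, target, maxDistance) {
  return places
    .map((p) => ({ ...p, dist: haversineMeters(p.location.coordinates, target) }))
    .filter((p) => p.dist <= maxDistance)
    .sort((a, b) => a.dist - b.dist);
}

// Hypothetical places on the equator, 0.05 and 0.01 degrees of longitude east.
const places = [
  { name: "B", location: { type: "Point", coordinates: [0.05, 0] } },
  { name: "A", location: { type: "Point", coordinates: [0.01, 0] } },
];
console.log(near(places, [0, 0], 2000).map((p) => p.name)); // [ 'A' ]
```

Place A is about 1.1 km away and B about 5.6 km away, so only A falls within 2,000 meters; with a larger radius both are returned, A first.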

Text Index

In MongoDB, a text index is a special type of index designed to support full-text search queries on string fields within documents. Text indexes allow you to perform text-based queries, such as searching for words or phrases contained within text fields, with efficient performance.

Example of creating a text index:

db.products.createIndex({ Description: "text" })
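Conceptually, a text index works like an inverted index that maps each word to the documents containing it. The toy model below sketches that idea with a simplified tokenizer and a small, assumed stop-word list; real text indexes also apply language-specific stemming, which this sketch omits, and the product descriptions are made up.

```javascript
// Toy model of a text index: an inverted index from lowercase words to _id values.
// The stop-word list and tokenizer are simplified assumptions (no stemming).
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "for", "with", "is"]);

function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w && !STOP_WORDS.has(w));
}

function buildTextIndex(docs, field) {
  const index = new Map(); // word -> Set of _id values
  for (const doc of docs) {
    for (const word of tokenize(doc[field] || "")) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc._id);
    }
  }
  return index;
}

// Like $text with several terms, a document matches if it contains ANY term.
function textSearch(index, query) {
  const ids = new Set();
  for (const word of tokenize(query)) {
    for (const id of index.get(word) || []) ids.add(id);
  }
  return [...ids].sort((a, b) => a - b);
}

// Hypothetical products with a Description field.
const products = [
  { _id: 1, Description: "Stainless steel kettle with auto shut-off" },
  { _id: 2, Description: "Cordless drill and a set of steel bits" },
  { _id: 3, Description: "Silicone whisk for the kitchen" },
];
const textIndex = buildTextIndex(products, "Description");
console.log(textSearch(textIndex, "steel kitchen")); // [ 1, 2, 3 ]
console.log(textSearch(textIndex, "the drill"));     // [ 2 ]
```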

Real-time Example

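Let’s say we have a large-volume database, such as newspaper and magazine articles or a catalog of books, and we want to find the articles or content containing specific words or a phrase. Here we can use the text index. The $text operator searches the fields covered by the collection's text index (a collection can have at most one text index), so the query does not name the Description field. Let’s check out a brief example below.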

db.products.find({ $text: { $search: "keyword" } })

Now, let’s say we want a case-sensitive, language-specific search. Can we do that?

Yes, we can search for that too. Let’s check out how to write the query for that.

db.products.find({
    $text: {
        $search: "keyword",
        $language: "english",
        $caseSensitive: true
    }
})

Text search options: the $text operator accepts the following fields to customize a text search query.

$search: Specifies the search query string.
$language: Specifies the language for text parsing and stemming.
$caseSensitive: Specifies whether the search is case-sensitive.
$diacriticSensitive: Specifies whether the search is diacritic-sensitive.
Hashed Index

A hashed index stores a hash of a field's value instead of the value itself. Hashing is a technique used in computer science and databases for quickly locating a data record given its search key. In MongoDB, hashed indexes support equality matches but not range queries, and they are mainly used as shard keys for hashed sharding, which spreads documents more evenly across shards. A hashed index cannot be unique, and the indexed field cannot hold arrays. Note that the default index on the _id field is a regular unique index, not a hashed one.
```
db.collection.createIndex({ fieldName: "hashed" })
```
Always consider your specific use case and query patterns when choosing the appropriate index type in MongoDB. A hashed index suits fields that are only queried by equality or that serve as hashed shard keys.
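A toy model of a hashed index follows. MongoDB uses its own 64-bit hash function; this sketch swaps in the 32-bit FNV-1a hash purely for illustration, and the Email field is hypothetical. It shows why equality lookups work while range queries cannot use the index.

```javascript
// Toy model of a hashed index, using 32-bit FNV-1a as a stand-in hash
// (MongoDB itself uses a different, 64-bit hash).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hypothetical customers; the index maps hash(Email) -> _id values.
const customers = [
  { _id: 1, Email: "asha@example.com" },
  { _id: 2, Email: "bob@example.com" },
  { _id: 3, Email: "chen@example.com" },
];
const hashedIndex = new Map();
for (const c of customers) {
  const h = fnv1a(c.Email);
  if (!hashedIndex.has(h)) hashedIndex.set(h, []);
  hashedIndex.get(h).push(c._id);
}

// Equality match: hash the query value and look it up. Different values can
// share a hash, so the fetched documents are re-checked against the real value.
function findByEmail(email) {
  const ids = hashedIndex.get(fnv1a(email)) || [];
  return customers.filter((c) => ids.includes(c._id) && c.Email === email);
}
console.log(findByEmail("bob@example.com").map((c) => c._id)); // [ 2 ]

// Hash order is unrelated to value order, so a range such as
// Email >= "b" cannot be answered by walking the hashed keys.
const byHash = [...customers].sort((a, b) => fnv1a(a.Email) - fnv1a(b.Email));
console.log(byHash.map((c) => c.Email));
```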
Wildcard Index

A wildcard index (available since MongoDB 4.2) indexes every field of a document, or every field under a given path, for example db.collection.createIndex({ "$**": 1 }) or db.collection.createIndex({ "fieldName.$**": 1 }). It is useful when documents contain field names that are not known in advance. A wildcard index is not the same as a regular expression query, which performs wildcard-like matching on string values:
Example
```
db.collection.find({ fieldName: { $regex: /regex pattern/ } })
```
A regular expression query can use an ordinary index efficiently only when the pattern is a case-sensitive prefix expression, such as /^abc/. For word-based searches, a text index with the $text operator is usually more efficient than an unanchored regular expression. Keep in mind that $text matches whole words (after stemming), not arbitrary substrings.
Example: building on the text index examples we have already reviewed, let’s consider a large database of articles in which we want to search or filter documents by the words they contain. Please check the example below.
```
db.articles.createIndex({ Content: "text" })
db.articles.find({ $text: { $search: "pattern" } })
```
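To make the wildcard index idea concrete, here is a toy model of what an index such as { "$**": 1 } records: one entry per leaf field path in each document. The product document and its Attributes field are hypothetical, and the sketch treats arrays as single values, whereas MongoDB indexes each array element.

```javascript
// Toy model of a wildcard index such as { "$**": 1 }: one entry for every
// leaf field path, so queries on field names not known in advance can use it.
// Simplification: arrays are treated as leaf values here.
function wildcardEntries(doc, prefix = "") {
  const entries = [];
  for (const [field, value] of Object.entries(doc)) {
    if (field === "_id" && prefix === "") continue; // wildcard indexes omit _id by default
    const path = prefix ? `${prefix}.${field}` : field;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      entries.push(...wildcardEntries(value, path)); // descend into embedded documents
    } else {
      entries.push({ path, value });
    }
  }
  return entries;
}

// Hypothetical product whose Attributes differ from product to product.
const product = {
  _id: 1,
  Name: "Kettle",
  Attributes: { Capacity: "1.5L", Power: { Watts: 2000 } },
};
console.log(wildcardEntries(product).map((e) => e.path));
```

For this document the sketch produces the paths Name, Attributes.Capacity, and Attributes.Power.Watts, so a query on Attributes.Power.Watts could be served without a dedicated index on that field.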

Conclusion

MongoDB indexing plays a vital role in the performance and efficiency of your database and caters to diverse data scenarios. Which indexes help depends on your data structure and query patterns: indexing is a tool, and the performance gain depends on how you use it. Keep the following key points in mind.

Key Points on MongoDB Indexing

Monitor and assess index usage regularly, identifying opportunities to eliminate unused indexes and optimize storage.
Single field indexes are effective for simple queries on one field, while compound indexes cater to more complex queries that filter on several fields.
Use profiling tools, such as the database profiler and explain(), to identify and optimize slow queries, ensuring a responsive and efficient database.