Introduction
In this article, we are going to learn about one of the most advanced search services built into Microsoft's Azure cloud. This search service is part of the Azure Cognitive Services offering. It uses AI capabilities and is highly customizable. If you want to bring advanced search capabilities to your application, or are planning to replace in-house search solutions such as Elasticsearch (ELK) or Apache Solr, I recommend going through this article at least once. I will cover the architecture and features, which should be a good starting point for you.
Why do we need advanced search capabilities?
Let us take an e-commerce example. My application has search integrated: I take the search term as input and query my product table like this (a simple, unoptimized query):
SELECT productName, categoryName FROM PRODUCTS WHERE productName LIKE '%[@searchterm]%' OR categoryName LIKE '%[@searchterm]%'
I then take the result, make it presentable, return it as suggestions or auto-complete entries on every keystroke, and redirect the user to the category or product page. What if the table holds millions of records and each query takes a while to return? The experience would be better if the search were both efficient and user-friendly. That is just one use case. What if the data is spread across multiple sources, some structured, some semi-structured, and some unstructured, such as PDFs, Word files, and images? We need advanced search capabilities for the following reasons:
- Spelling auto-correction makes search forgiving of typos.
- Not querying the database on every search term makes search faster for users and saves database CPU.
- Multilingual support makes a global e-commerce site accessible to a global audience.
- Synonym-aware search (for example, notebook, register, and copy generally indicate the same intent) gives a better user experience.
- Querying multiple data sources provides a centralized search experience.
- We may want a search solution similar to popular e-commerce platforms like Amazon, eBay, or Flipkart, or even 1% of the capabilities of Google.com or Bing.com.
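To make the limitation concrete, here is a minimal sketch of the LIKE-based search described above, run against an in-memory SQLite database. The sample rows and the helper names (build_demo_db, search_products) are made up for illustration, and SQLite stands in for whatever database the application really uses.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory PRODUCTS table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PRODUCTS (productName TEXT, categoryName TEXT)")
    conn.executemany(
        "INSERT INTO PRODUCTS VALUES (?, ?)",
        [
            ("Spiral Notebook", "Stationery"),
            ("Gel Pen", "Stationery"),
            ("Laptop Sleeve", "Accessories"),
        ],
    )
    return conn


def search_products(conn, search_term):
    """Naive substring search; a leading-wildcard LIKE scans every row."""
    pattern = f"%{search_term}%"
    return conn.execute(
        "SELECT productName, categoryName FROM PRODUCTS "
        "WHERE productName LIKE ? OR categoryName LIKE ?",
        (pattern, pattern),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(search_products(conn, "notebook"))  # exact substring matches
    print(search_products(conn, "notebok"))   # a typo finds nothing
    print(search_products(conn, "register"))  # a synonym finds nothing
```

The typo and the synonym both come back empty, which is exactly the gap the list above describes.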
What is Azure Cognitive Search?
Azure Cognitive Search is a cloud search service that is part of the Applied AI Services in the Azure Cognitive Services offering. It is a powerful solution with advanced capabilities, and it lets you get a custom search up and running very quickly. It uses AI capabilities to extract rich meaning from structured, semi-structured, and unstructured data sources and makes that data searchable through various API endpoints. A Free tier is also available for those who want to explore.
Architecture
There are two workloads, Indexing and Searching. Below are the details:
Indexing
Indexing is the process of converting data from various sources into a searchable dataset known as an index. Indexing is done by creating an enrichment pipeline. This pipeline is customizable, so we can add our own logic to it. We tell the search engine what, where, and how to search.
The indexing is done by the workers named “Indexers”.
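As a rough illustration of what an index looks like, the sketch below builds an index definition of the kind the REST API accepts when creating an index. The index name and field names (products-index, productName, categoryName, keyPhrases) are hypothetical and chosen to fit the e-commerce example. The attribute names (key, searchable, filterable, facetable) reflect my understanding of the index schema, so verify them against the current documentation before relying on them.

```python
import json


def build_index_definition(index_name):
    """Return a dict describing a small search index (hypothetical schema)."""
    return {
        "name": index_name,
        "fields": [
            # Every index needs exactly one key field that identifies a document.
            {"name": "id", "type": "Edm.String", "key": True},
            {"name": "productName", "type": "Edm.String", "searchable": True},
            {
                "name": "categoryName",
                "type": "Edm.String",
                "searchable": True,
                "filterable": True,
                "facetable": True,
            },
            # A collection field that an enrichment step could fill in.
            {"name": "keyPhrases", "type": "Collection(Edm.String)", "searchable": True},
        ],
    }


def searchable_fields(index_definition):
    """List the names of the fields that full-text search will look at."""
    return [f["name"] for f in index_definition["fields"] if f.get("searchable")]


if __name__ == "__main__":
    definition = build_index_definition("products-index")
    print(json.dumps(definition, indent=2))
    print(searchable_fields(definition))
```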
Indexing happens in several stages, which together are known as the indexing pipeline. The stages are:
Document Cracking
If the source contains documents (files), each one is opened and its content is extracted. Images are converted to text, or an ML model maps the image to a description, and the output is received as text.
Field Mapping
We specify which fields in the data source map to which destination fields in the index. This step is optional.
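The effect of a field mapping can be sketched locally. The function below is a toy simulation, not the service itself: it assumes mappings are expressed as sourceFieldName/targetFieldName pairs (the shape used in indexer definitions), and that fields without an explicit mapping are copied through when the index has a field of the same name. The row contents and field names are hypothetical.

```python
def apply_field_mappings(source_row, field_mappings, index_fields):
    """Build an index document from a source row.

    Explicitly mapped fields are renamed; unmapped fields are kept only
    when the index has a field with the same name.
    """
    renames = {m["sourceFieldName"]: m["targetFieldName"] for m in field_mappings}
    document = {}
    for source_name, value in source_row.items():
        target_name = renames.get(source_name, source_name)
        if target_name in index_fields:
            document[target_name] = value
    return document


if __name__ == "__main__":
    row = {"ProductID": "42", "productName": "Spiral Notebook", "internalNote": "x"}
    mappings = [{"sourceFieldName": "ProductID", "targetFieldName": "id"}]
    print(apply_field_mappings(row, mappings, {"id", "productName"}))
```

Here ProductID is renamed to id, productName passes through unchanged, and internalNote is dropped because the index has no such field.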
Skillset Execution
In this step, we specify which actions to apply to the fields, such as custom logic, language translation, sentiment detection, and key phrase extraction, all using the AI capabilities of Azure Cognitive Services.
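A skillset is defined as a list of skills, each with inputs and outputs. The sketch below builds such a definition as a plain Python dict with two built-in skills, key phrase extraction and translation. The skillset name and the /document/content input path are assumptions for illustration, and the property names follow the REST skillset format as I understand it, so treat this as a starting point and confirm the details in the skill reference.

```python
import json


def build_skillset(name, translate_to="en"):
    """Return a skillset definition with two built-in text skills (a sketch)."""
    return {
        "name": name,
        "description": "Extract key phrases and translate content",
        "skills": [
            {
                "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
                "context": "/document",
                # Assumes the cracked document text is available at /document/content.
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
            },
            {
                "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
                "context": "/document",
                "defaultToLanguageCode": translate_to,
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "translatedText", "targetName": "translatedText"}],
            },
        ],
    }


def enriched_paths(skillset):
    """List the enriched-document paths that the skills would produce."""
    return [
        f"{skill['context']}/{output['targetName']}"
        for skill in skillset["skills"]
        for output in skill["outputs"]
    ]


if __name__ == "__main__":
    skillset = build_skillset("products-skillset")
    print(json.dumps(skillset, indent=2))
    print(enriched_paths(skillset))
```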
Output Field Mappings
This step applies only when a skillset is specified. Field mappings associate the content of a source field with a destination field in the search index. Output field mappings associate the content of the internal enriched document (the skill outputs) with destination fields in the index.
[Image Source: https://docs.microsoft.com/en-in/azure/search/search-indexer-overview]
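Putting the stages together, an indexer definition ties a data source, a target index, and an optional skillset to the two kinds of mappings. The following is a minimal sketch of such a definition as a Python dict. All resource names are hypothetical, metadata_storage_name is used on the assumption that the data source is Azure Blob Storage, and the /document/keyPhrases path assumes a skill has written its output there.

```python
import json


def build_indexer(name, data_source, index, skillset=None):
    """Return an indexer definition linking source, index, and skillset (a sketch)."""
    indexer = {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        # Source field -> index field.
        "fieldMappings": [
            {"sourceFieldName": "metadata_storage_name", "targetFieldName": "fileName"},
        ],
    }
    if skillset is not None:
        indexer["skillsetName"] = skillset
        # Enriched-document node (skill output) -> index field.
        indexer["outputFieldMappings"] = [
            {"sourceFieldName": "/document/keyPhrases", "targetFieldName": "keyPhrases"},
        ]
    return indexer


if __name__ == "__main__":
    definition = build_indexer(
        "products-indexer", "products-datasource", "products-index", "products-skillset"
    )
    print(json.dumps(definition, indent=2))
```

Without a skillset, the sketch leaves out outputFieldMappings entirely, mirroring the point that output field mappings only matter when skills produce enriched content.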
Searching
The search engine can query the indexed data in a variety of scenarios. It uses the OData protocol for performing searches. We can pass the fields to target and specify what kind of search to perform.
It supports two types of Apache Lucene-based query languages, simple and full.
Simple
It takes plain search text as input and returns the matching results.
Full
It takes a search expression as input, for example (category:stationery AND ProductName:book*).
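The difference between the two modes shows up in the request body sent to the search endpoint. Below is a small sketch that builds such bodies. The parameter names (search, queryType, searchFields, top, count) are the ones I understand the REST search API to accept in a POST body, and the field names are hypothetical. Nothing is sent over the network here.

```python
import json


def build_search_body(search_text, query_type="simple", search_fields=None, top=10):
    """Build the JSON body for a search request (sketch, not sent anywhere)."""
    if query_type not in ("simple", "full"):
        raise ValueError("query_type must be 'simple' or 'full'")
    body = {
        "search": search_text,
        "queryType": query_type,
        "top": top,
        "count": True,  # ask for the total number of matches
    }
    if search_fields:
        # Restrict matching to specific fields, given as a comma-separated string.
        body["searchFields"] = ",".join(search_fields)
    return body


if __name__ == "__main__":
    simple = build_search_body("notebook", search_fields=["productName", "categoryName"])
    full = build_search_body("category:stationery AND ProductName:book*", query_type="full")
    print(json.dumps(simple, indent=2))
    print(json.dumps(full, indent=2))
```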
Stages of Searching
Query parsing
The query parser restructures the subqueries into a query tree.
Lexical analysis
Reduces words to their most basic forms, removes articles and other common words such as “the” and “and”, applies consistent casing, and so on.
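A toy analyzer helps show what this stage does. The sketch below lowercases the text, splits it into tokens, drops a handful of stop words, and strips a few common suffixes. The stop-word list and the suffix rules are deliberately crude stand-ins of my own; the real language analyzers are far more sophisticated.

```python
import re

# A tiny, illustrative stop-word list (real analyzers use much larger ones).
STOP_WORDS = {"a", "an", "and", "the", "of", "or", "to", "in"}


def stem(token):
    """Very naive stemming: strip a common suffix if enough of the word remains."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, tokenize, remove stop words, and stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(analyze("The Notebooks and the Pens"))
    print(analyze("Searching boxes"))
```

Because the same analysis is applied to both the indexed text and the query, a search for “notebooks” can match a document that only contains “Notebook”.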
Document retrieval
It returns the matching documents, based on the fields that were marked as searchable when the index was defined.
Scoring
Scoring is based on Lucene's Practical Scoring Formula. The length of the text and other factors influence the score. The score is returned as part of the response.
[Image Source: https://docs.microsoft.com/en-in/azure/search/search-lucene-query-architecture]
[Image Source: https://docs.microsoft.com/en-in/azure/search/semantic-search-overview]
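To give a feel for how text length affects the scoring stage, here is a simplified sketch in the spirit of Lucene's classic TF-IDF scoring. It keeps only the term frequency, inverse document frequency, and length normalization factors, and it leaves out the coordination factor, query normalization, and boosts. The sample documents are made up, and the service's actual ranking may differ from this.

```python
import math


def score(query_terms, doc_terms, corpus):
    """Simplified TF-IDF score of one document for a list of query terms."""
    if not doc_terms:
        return 0.0
    length_norm = 1.0 / math.sqrt(len(doc_terms))  # shorter fields score higher
    total = 0.0
    for term in query_terms:
        freq = doc_terms.count(term)
        if freq == 0:
            continue
        doc_freq = sum(1 for d in corpus if term in d)
        tf = math.sqrt(freq)
        idf = 1.0 + math.log(len(corpus) / (doc_freq + 1))
        total += tf * idf ** 2 * length_norm
    return total


def rank(query_terms, corpus):
    """Return (document index, score) pairs for matching documents, best first."""
    scored = [(i, score(query_terms, d, corpus)) for i, d in enumerate(corpus)]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    corpus = [
        ["spiral", "notebook"],
        ["notebook", "cover", "for", "large", "tablet", "devices"],
        ["gel", "pen"],
    ]
    for doc_index, value in rank(["notebook"], corpus):
        print(doc_index, round(value, 3))
```

Both matching documents contain the term once, but the shorter one ranks first because of length normalization.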
Demo
- Login to portal.azure.com
- Search for Cognitive Search
- Select Free Tier.
- Fill in all the required details and review them.
- Click Create
- Navigate to Resource
- Click on “Create an Indexer” and then click on the “Submit” button.
- Open the Azure Search Explorer and click on the “Search” button.
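Once the portal steps work, the same kind of query can be prepared from code. The sketch below only constructs the HTTP request and does not send it. The service name, index name, and key are placeholders, and the URL shape, the api-key header, and the 2020-06-30 API version are assumptions based on my understanding of the REST API, so check them against your own service before use.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(service_name, index_name, api_key,
                         search_text="*", api_version="2020-06-30"):
    """Build (but do not send) a GET request that searches an index."""
    query = urlencode({"api-version": api_version, "search": search_text})
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs?{query}"
    )
    headers = {"api-key": api_key, "Accept": "application/json"}
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # Placeholder values; replace them with your own service, index, and query key.
    request = build_search_request("my-demo-service", "products-index", "<query-key>", "notebook")
    print(request.get_method(), request.full_url)
    # To actually run the query, pass the request to urllib.request.urlopen.
```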
That's it!! Thanks for reading. Hope you found this insightful! Please feel free to share your thoughts or suggestions.