Introduction
Natural Language Processing (NLP) is the automatic manipulation of natural language by software. Natural language is simply the way humans communicate with each other, whether as text (for example, emails) or as speech. The main objective of NLP is to identify the actionable items in information that is present in structured or unstructured form. It has become a major branch of AI that bridges computer language and human language: it reads, deciphers, and understands human language, and then makes sense of that information.
There are mainly two components of NLP:
- Natural Language Understanding (NLU)
This is the component where the given information is understood.
- Natural Language Generation (NLG)
This is the component where understood information is used to generate meaningful sentences/phrases.
In general, NLU is the more complex and time-consuming of the two, because if the given information is not understood correctly, whatever is generated from it will be wrong as well.
The process of generating new information from given information is divided into six steps:
- Tokenization
This is the process of dividing sentences or other text into words. Each word of the sentence is called a token.
- Stemming
This is the process of normalizing each token (word) into its root form. It is done by stripping prefixes and suffixes from the word according to rules built into the stemming algorithm.
- Lemmatization
This is the process of logical analysis of each token (word). It is done using a dictionary built into the algorithm, rather than by stripping affixes alone. The difference between stemming and lemmatization is this logical analysis. For example, in stemming, ‘going’ and ‘went’ are treated as two different words, but in lemmatization both have the same root, ‘go’, which is called the lemma.
- Tagging
This is the process of tagging each word/lemma with its part of speech (POS), such as noun, pronoun, or verb. This is a critical step because it identifies the context in which the word is used in a sentence.
- NER (Named Entity Recognition)
What if a word can stand for more than one thing? In that case, the word goes through NER, which recognizes whether it names a specific entity, such as a person, an organization, a place, or a time, and labels it accordingly. Say the lemma is ‘Clock’, indicating either time or a real clock. This differentiation is made during NER.
- Chunking
This is the process of grouping individual tagged tokens into larger, meaningful units, such as noun phrases.
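To make the steps above more concrete, here is a minimal sketch in plain Python of tokenization, stemming, lemmatization, tagging, and chunking. It is only an illustration: the suffix list, the lemma dictionary, the POS lexicon, and all function names are hypothetical and invented for this example, and real NLP libraries use far larger rule sets and trained models. NER is left out because it usually depends on a trained model rather than a small lookup table.

```python
import re

# Hypothetical toy resources, invented for this sketch only.
SUFFIXES = ("ing", "ed", "s")  # suffixes the toy stemmer strips
LEMMAS = {"went": "go", "going": "go", "goes": "go", "gone": "go"}
POS_LEXICON = {
    "the": "DET", "a": "DET", "she": "PRON", "went": "VERB",
    "to": "PREP", "old": "ADJ", "clock": "NOUN", "shop": "NOUN",
}


def tokenize(text):
    """Tokenization: split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Stemming: strip the first matching suffix, keeping at least two letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Lemmatization: look the token up in a dictionary, else return it unchanged."""
    return LEMMAS.get(token, token)


def tag(tokens):
    """Tagging: attach a part-of-speech tag; unknown words get 'UNK'."""
    return [(t, POS_LEXICON.get(t, "UNK")) for t in tokens]


def chunk_noun_phrases(tagged):
    """Chunking: group runs of DET/ADJ/NOUN tokens that contain a noun."""
    chunks, current = [], []
    for word, pos in tagged:
        if pos in ("DET", "ADJ", "NOUN"):
            current.append((word, pos))
            continue
        if any(p == "NOUN" for _, p in current):
            chunks.append(" ".join(w for w, _ in current))
        current = []
    if any(p == "NOUN" for _, p in current):
        chunks.append(" ".join(w for w, _ in current))
    return chunks


tokens = tokenize("She went to the old clock shop.")
tagged = tag(tokens)
phrases = chunk_noun_phrases(tagged)

print(tokens)
print(stem("going"), stem("went"))            # go went
print(lemmatize("going"), lemmatize("went"))  # go go
print(phrases)                                # ['the old clock shop']
```

Note how the toy stemmer leaves ‘went’ untouched because no suffix rule applies, while the dictionary lookup maps both ‘going’ and ‘went’ to ‘go’. That is the stemming versus lemmatization difference described above.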
Today, NLP is becoming very popular in various industries. Some areas where it is being used are:
- Chatbots
- Spell Check
- Speech Recognition
- Machine Translation
- Advertisement
- Sentiment Analysis
I hope this helps you to learn the basics of NLP!!