Introduction
In this article, I will explain what we mean by "Human-Computer Interaction" and "Natural Language Processing".
What is the meaning of 'Human'?
According to britannica.com, a human being is a culture-bearing primate classified in the genus Homo, specifically the species H. sapiens. Human beings are anatomically similar and related to the great apes, but they are distinguished by a more highly developed brain and the resulting capacity for articulate speech and abstract reasoning.
Some of the features of 'Human' are:
- the ability to communicate systematically using words, symbols, body gestures/posture, and facial expressions.
- making our own decisions and bearing their consequences.
- making and wearing clothing, accessories, and other necessities of human life.
- becoming individuals as each of us shapes our own life around who we want to be in the future.
- thinking about thinking, and pondering the past, present, and future.
- fitting into different personality groups, although each of us experiences our personality type in a way that is special and different.
- fitting into different racial, cultural, religious, and political groups.
- having a government or a governing body.
- being unique as individuals in our choices of who we want to be as a person, in our clothing, preferences, talents/gifts, perspectives, and likes/dislikes.
- living in an economy.
- inheriting our genes and behaviors from our parents, grandparents, aunts, uncles, and great-grandparents.
- being unique and special as a species.
What do we mean by 'Computer'?
According to Wikipedia, a computer is a device that can be instructed to carry out sequences of arithmetic or logical operations automatically through computer programming. Modern computers have the ability to follow generalized sets of operations, called programs. These programs enable computers to perform an extremely wide range of tasks.
Some of the features are:
- Speed
In general, no human being can solve complex computations faster than a computer.
- Accuracy
Since the computer is programmed, it produces accurate results for whatever input we give it.
- Storage
The computer can store massive amounts of data in an appropriate format.
- Diligence
The computer can work for hours without any break and without making errors.
- Versatility
We can use a computer to perform completely different types of work at the same time.
- Power Of Remembering
It can remember data for us.
- No IQ
A computer has no intelligence of its own and does not work without instructions.
- No Feeling
A computer does not have emotions, knowledge, experience, or feelings.
What is Human-Computer Interaction?
According to Wikipedia, human-computer interaction (HCI) researches the design and use of computer technology, focusing on the interfaces between people (users) and computers. HCI researchers observe the ways humans interact with computers and design technologies that let humans interact with computers in novel ways. As a field of research, human-computer interaction is situated at the intersection of computer science, behavioral sciences, design, media studies, and several other fields of study. The term was popularized by Stuart K. Card, Allen Newell, and Thomas P. Moran in their 1983 book 'The Psychology of Human-Computer Interaction'.
HCI can be readily understood as the process or language with which we ('human') interact with the device ('computer') to take advantage of it in cases where human capacities fall short of a computer's. For example, computers can perform complex arithmetic calculations very fast and very accurately, and machines never get tired, so they can be commissioned to work 24x7, etc.
Some common examples of HCI are Google Assistant/Voice App, Amazon Alexa, IBM Bluemix, Apple Siri, etc.
Goals of HCI
The goals of HCI are to produce usable and safe systems, as well as functional systems. In order to produce computer systems with good usability, developers must attempt to:
- understand the factors that determine how people use technology
- develop tools and techniques to enable building suitable systems
- achieve efficient, effective, and safe interaction
- put people first
Natural Language
According to Wikipedia, in neuropsychology, linguistics, and the philosophy of language, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition, without conscious planning or premeditation. Natural languages can take different forms, such as speech or signing.
Some examples of natural languages are English, Hindi, sign language, Braille, etc.
Natural Language Processing
According to Wikipedia, Natural Language Processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.
Different types of Natural Language processing include:
- NLP based on Text, Voice and Audio.
- NLP based on computational models.
- NLP based on text analysis that leads to discussion, review, opinion mining, contextual analysis, dictionary/corpus building, and many linguistic, semantic, and ontological fields.
- Pattern Analysis and discovery of new insights such as writing styles.
- Machine Translation of languages.
- Machine Learning: e.g., prediction and classification of positive/negative views.
- First-order logic.
- Automatic Report Generation from Data.
Some of the common terms and tools related to NLP and HCI are:
1. Weka
According to Wikipedia, the Waikato Environment for Knowledge Analysis (Weka) is a suite of machine learning software developed at the University of Waikato, New Zealand. It is free software available under the GNU General Public License, and it is the companion software for the book "Data Mining: Practical Machine Learning Tools and Techniques".
Weka contains a collection of data preprocessing and predictive modeling methods and algorithms, together with graphical user interfaces for easy access to these functions. Weka's original non-Java version was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other programming languages, plus data preprocessing utilities written in C and a Makefile-based system for running machine learning experiments.
Advantages of Weka include:
- Free availability under the GNU General Public License.
- Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform.
- A comprehensive collection of data preprocessing and modeling techniques.
- Ease of use due to its graphical user interfaces.
Weka can be downloaded by following the official Weka link.
The following images show how we load a dataset into the Weka tool; the attribute window shows the list of attributes present in the dataset.
Weka provides various filters that can be applied to the dataset to preprocess it and hence perform feature selection.
Weka can also visualize the various features of the dataset.
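The same load-and-filter workflow can also be scripted from Python. The following is a minimal sketch, assuming the third-party python-weka-wrapper3 package is installed and that a hypothetical dataset file 'dataset.arff' exists (both are assumptions, not part of Weka itself):
- import weka.core.jvm as jvm
- from weka.core.converters import Loader
- from weka.filters import Filter
-
- jvm.start()  # the wrapper drives the Java-based Weka through a JVM
-
- # load a dataset, as done via the Explorer's 'Open file' button
- loader = Loader(classname="weka.core.converters.ArffLoader")
- data = loader.load_file("dataset.arff")
- print(data.num_attributes, "attributes,", data.num_instances, "instances")
-
- # apply a preprocessing filter, e.g. remove the first attribute
- remove = Filter(classname="weka.filters.unsupervised.attribute.Remove",
-                 options=["-R", "1"])
- remove.inputformat(data)
- filtered = remove.filter(data)
- print(filtered.num_attributes, "attributes after filtering")
-
- jvm.stop()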
2. IBM Watson BlueMix
Image Courtesy: IBM
According to Wikipedia, Watson is a question-answering computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named after IBM's founder and first CEO, the industrialist Thomas J. Watson.
The computer system was initially developed to answer questions on the quiz show Jeopardy! In 2011, Watson competed on Jeopardy! and won the $1 million first-place prize against champions Brad Rutter and Ken Jennings.
Watson can be used to build a chatbot; the official documentation can be found at this link.
Image Courtesy: IBM
According to Wikipedia, IBM Bluemix, renamed IBM Cloud in 2017, is a cloud platform as a service (PaaS) developed by IBM. It supports several programming languages and services, as well as integrated DevOps to build, run, deploy, and manage applications on the cloud. Bluemix is a platform based on Cloud Foundry that runs on the SoftLayer infrastructure.
Bluemix supports many programming languages, including Java, Node.js, Go, PHP, Swift, Python, Ruby Sinatra, and Ruby on Rails, and can be extended to support other languages such as Scala through the use of buildpacks.
Image Courtesy: IBM
3. Stemming
According to Wikipedia, in linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base, or root form (generally a written word form). The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Stemming algorithms have been studied in computer science since the 1960s. Many search engines treat words with the same stem as synonyms, as a kind of query expansion, a process called conflation.
A computer program or subroutine that stems words may be called a stemming program, a stemming algorithm, or a stemmer.
For example, given the sentence:
"I will be stemming the given sentence so that I could retrieve the stem(s)."
If I pass it to the PorterStemmer, the result is:
['I', 'will', 'be', 'stem', 'the', 'given', 'sentenc', 'so', 'that', 'I', 'could', 'retriev', 'the', 'stem', '(', 's', ')', '.'].
It can easily be seen that "stemming" is reduced to "stem".
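As a quick sketch of how this result can be reproduced with NLTK (assuming the NLTK data required by word_tokenize has been downloaded, as described in the NLTK section below; note that recent NLTK versions lowercase words while stemming, so 'I' may come out as 'i'):
- from nltk.tokenize import word_tokenize
- from nltk.stem import PorterStemmer
-
- stemmer = PorterStemmer()
- sentence = "I will be stemming the given sentence so that I could retrieve the stem(s)."
- # tokenize first, then stem each token
- print([stemmer.stem(w) for w in word_tokenize(sentence)])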
Various stemming algorithms available are:
- Porter Stemming
- Lovins Stemming
- Dawson Stemming
- Krovetz Stemming
- Xerox Stemming
- N-Gram Stemming
To better understand stemming, you can visit this link.
4. Porter2 Stemming Algorithm
Porter's algorithm consists of 5 phases of word reductions, applied sequentially. Within each phase there are various conventions for selecting rules, such as selecting the rule from each rule group that applies to the longest suffix. In the first phase, this convention is used within the following rule group:
SSES ---> SS caresses ---> caress
IES ---> I ponies ---> poni
SS ---> SS caress ---> caress
S ---> (null) cats ---> cat
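As an illustration only (not a full Porter stemmer), this first rule group and its longest-suffix-first convention could be sketched in Python as:
- def step_1a(word):
-     # rules are tried longest suffix first; the first match wins
-     if word.endswith("sses"):
-         return word[:-4] + "ss"  # caresses ---> caress
-     if word.endswith("ies"):
-         return word[:-3] + "i"   # ponies ---> poni
-     if word.endswith("ss"):
-         return word              # caress ---> caress
-     if word.endswith("s"):
-         return word[:-1]         # cats ---> cat
-     return word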
Installation of Porter2 Stemmer
At the command line:
- easy_install porter2stemmer
Or, if you have virtualenvwrapper installed:
- mkvirtualenv porter2stemmer
- pip install porter2stemmer
Or, another way is to download the package manually and install it from source:
- firefox <https://pypi.python.org/pypi/stemming/1.0>
- mv /home/downloads/[package] /home/speech_recognition/
- tar -xvzf [package]
- sudo python setup.py install
Example:
- import nltk
- from nltk.stem.porter import *
-
- porterStemmer = PorterStemmer()
-
- sentence = "Provision Maximum multiply owed caring on go gone going was this"
- wordList = nltk.word_tokenize(sentence)
-
- stemWords = [porterStemmer.stem(word) for word in wordList]
-
- print(' '.join(stemWords))
Output
provis maximum multipli owe care on go gone go wa thi
5. PocketSphinx
PocketSphinx is a lightweight speech recognition engine originally developed for handheld and mobile devices, though it works just as well on the desktop. It is released under the same permissive (BSD-style) license as CMU Sphinx itself.
It has ready-to-use binaries for macOS and Windows separately. PocketSphinx depends on SphinxBase, so it is mandatory to have SphinxBase pre-installed on your system.
SphinxBase is the open-source CMU support library required by CMU Sphinx to run. It can be installed as follows:
- git clone <https://github.com/cmusphinx/sphinxbase.git>
- cd sphinxbase
- ./autogen.sh
- make clean all
- make check
- sudo make install
To configure the library path:
- export LD_LIBRARY_PATH=/usr/local/lib
- sudo nano /etc/ld.so.conf  # then add the following line to the file
- /usr/local/lib
- sudo ldconfig  # apply the configuration
Other dependencies can be downloaded using the following commands:
- sudo apt-get install -y python python-dev python-pip build-essential swig git libpulse-dev
To install PocketSphinx on your PC, you can execute the following commands:
- git clone <https://github.com/cmusphinx/pocketsphinx.git>
- cd pocketsphinx
- ./autogen.sh
- make clean all
- make check
- sudo make install
Or, you can install it using PIP, as given below:
- python -m pip install --upgrade pip setuptools wheel
- pip install --upgrade pocketsphinx
To install the PocketSphinx Python bindings (pocketsphinx-python) on Linux, you can use the following commands:
- sudo apt-get install -y python python-dev python-pip build-essential swig git
- git clone --recursive https://github.com/cmusphinx/pocketsphinx-python/
- cd pocketsphinx-python
- sudo python setup.py install
PocketSphinx is a library that has bindings for Python, Java, and C++; hence it can be used in Android development as well as on embedded systems that use Embedded C.
To generate the required output from PocketSphinx, you need to execute the following command:
- sudo pocketsphinx_continuous -inmic yes -lm 5371.lm -dict 5371.dic > NLP.txt
In the above command, we provide:
- lm: the language model file
- dict: the pronunciation dictionary file
The recognized text is redirected to the output file 'NLP.txt'.
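Alternatively, the pocketsphinx Python package provides a LiveSpeech helper for the same job. The following is a minimal sketch, assuming the pip-installed bindings are available and that the 5371.lm and 5371.dic files generated by the lmtool (see the next section) are in the working directory:
- from pocketsphinx import LiveSpeech
-
- # decode continuously from the default microphone, using the custom
- # language model and pronunciation dictionary generated by the lmtool
- speech = LiveSpeech(lm='5371.lm', dic='5371.dic')
- with open('NLP.txt', 'w') as out:
-     for phrase in speech:  # one item per recognized utterance
-         print(phrase)
-         out.write(str(phrase) + '\n')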
6. CMUSphinx Knowledge Base
The lmtool builds a consistent set of lexical and language model files for decoders. The target decoders are the Sphinx family, but the files can be used by any system that can read ARPA-format data. lmtool is primarily designed for the English language (and its American variant in particular). If you submit a corpus in another language, the results are generally unpredictable.
Q) What does the CMUSphinx Knowledge Base tool take as input, and what does it give as output?
To use the tool, you need the corresponding sentence corpus file (it can be a .txt file), which, when uploaded, produces a knowledge base as the output.
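For illustration, the corpus is just plain text with one sentence per line; a hypothetical corpus for the door-control example later in this article might look like:
- open the door
- close the door
- door state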
Note: I have attached a demo file along with the generated knowledge base (demo.zip). You can also access my knowledge base from this link.
Q) What does the CMUSphinx Knowledge Base output contain?
The knowledge base hence generated contains the following files:
- 6682.dic (3.3K): pronunciation dictionary
- 6682.lm (18K): language model
- 6682.log_pronounce (3.9K): log file
- 6682.sent (1.9K): corpus (processed)
- 6682.vocab (1.0K): word list
- TAR6682.tgz (3.1K): compressed tarball
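For illustration, the .dic file maps each word of the corpus to its phonemes in CMU Pronouncing Dictionary style; a few hypothetical lines (not taken from the actual 6682.dic) might look like:
- CLOSE K L OW Z
- DOOR D AO R
- OPEN OW P AH N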
7. SphinxTrain
This is an acoustic model trainer used in combination with CMU Sphinx. In automatic speech recognition, an acoustic model is used to represent the relationship between an audio signal and the phonemes or other linguistic units that make up speech. The model is trained from a set of audio recordings and their transcripts: it is created by taking audio recordings of speech and their text transcriptions, and using software to build statistical representations of the sounds that make up each word.
To install SphinxTrain you can use the following commands:
- git clone <https://github.com/cmusphinx/sphinxtrain.git>
- cd sphinxtrain
- ./autogen.sh
- make clean all
- make check
- sudo make install
8. CMUSphinx-Code
It is the CMU Sphinx open-source code repository, which contains, among other things, cmuclmtk (the CMU language modeling toolkit). To install CMUSphinx-Code, the following commands can be used:
- svn checkout svn://svn.code.sf.net/p/cmusphinx/code/trunk cmusphinx-code
- cd cmusphinx-code
- cd cmuclmtk
- make clean all
- make check
- sudo make install
9. NLTK (Natural Language Tool Kit)
According to Wikipedia, the Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania. NLTK includes graphical demonstrations and sample data. It is accompanied by a book that explains the underlying concepts behind the language processing tasks supported by the toolkit, plus a cookbook.
NLTK is a leading platform for building Python programs that work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
NLTK is equally suitable for linguists, engineers, students, educators, researchers, and industry users. NLTK is available for Windows, Mac OS X, and Linux. Best of all, NLTK is a free, open-source, community-driven project.
Commands to install NLTK:
- pip install --user -U nltk  # install for the current user (Python 2)
- sudo pip3 install nltk  # or system-wide for Python 3
- sudo easy_install pip  # installs pip itself, if it is missing
To download the NLTK data, follow these steps:
1. In a Python shell, execute the following commands to open the GUI downloader:
- import nltk
- nltk.download()
2. A new window should open, showing the NLTK Downloader. Click on the File menu and select Change Download Directory.
3. For central installation, set this to C:\nltk_data (Windows), /usr/local/share/nltk_data (Mac), or /usr/share/nltk_data (Unix).
4. Next, select the packages or collections you want to download.
Note: I downloaded all the packages. You can execute the following to download all packages using the CLI:
- python -m nltk.downloader all
- # OR, to do a system-wide installation on Linux, etc.:
- sudo python -m nltk.downloader -d /usr/local/share/nltk_data all
The official NLTK tutorial series can be found in the NLTK Book.
Now, let's have a look at how to use NLTK.
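The demo texts used below come from NLTK's book module; assuming the 'book' collection was downloaded above, load them first:
- from nltk.book import *  # defines the demo texts text1 ... text9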
- sorted([w for w in text5 if w.startswith('b')])
The above command returns a sorted list of all the words in demo text 5 that start with 'b'. Similarly, text1.count('a') counts how many times a token occurs; the output I got is 4569, which means the token 'a' occurs 4569 times in demo text 1.
10. eSpeak
eSpeakNG is a compact, portable speech synthesizer for Linux, Windows, and other platforms. It uses a formant synthesis method, which allows it to provide many languages in a small size. Much of the programming for eSpeakNG's language support is done using rule files with feedback from native speakers.
eSpeak is available as:
- A command-line program (Linux and Windows) to speak text from a file or from stdin.
- A shared library version for use by other programs. (On Windows this is a DLL).
- A SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface.
- eSpeak has been ported to other platforms, including Android, Mac OSX and Solaris.
Features:
- Includes different Voices, whose characteristics can be altered.
- It can produce speech output as a WAV file.
- SSML (Speech Synthesis Markup Language) is supported (though not completely), as is HTML.
- Compact size. The program and its data, including many languages, totals about 2 Mbytes.
- It can be used as a front-end to MBROLA diphone voices, see mbrola.html. eSpeak converts text to phonemes with pitch and length information.
- It can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
- Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
- Development tools are available for producing and tuning phoneme data.
- Written in C.
To install eSpeak, execute the following command:
- sudo apt-get install espeak
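Once installed, eSpeak can be tried directly from the shell; for example, the -w flag writes the speech to a WAV file instead of playing it:
- espeak "Hello, Natural Language Processing"
- espeak -w hello.wav "Hello, Natural Language Processing"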
A GUI front-end for eSpeak (Gespeaker) can be installed with the following command:
- sudo apt-get install gespeaker
Example
For example, the following fragment (taken from the full program shown later, where the required imports appear) speaks the current state of a door, given a boolean door_state:
- if door_state == True:
-     print("Door Opened")
-     espeak.set_voice("ru")
-     espeak.synth("Door Opened")
-     time.sleep(1)
- else:
-     print("Door Closed")
-     espeak.set_voice("ru")
-     espeak.synth("Door Closed")
-     time.sleep(1)
In the above code, if door_state is True, we will hear 'Door Opened'; otherwise we will hear 'Door Closed', with a time gap of 1 second.
Now let me share with you a small piece of code that I created to demonstrate NLP.
changeDoorState.py
- from stemmers import porter2
- from espeak import espeak
- import time
-
- def change_state(door_state, text):
-
-     if isValid(text) == False:
-         return door_state
-
-     open_key = ['open', 'door']
-     close_key = ['close', 'door']
-
-     # stem every word of the spoken command, so variants such as
-     # 'opening' or 'closed' match the base keywords
-     stem_cmd = [porter2.stem(i) for i in text]
-
-     if door_state == False:
-         key = open_key
-     elif door_state == True:
-         key = close_key
-
-     for w in key:
-         if w not in stem_cmd:
-             print("invalid command")
-             espeak.set_voice("ru")
-             espeak.synth("invalid command")
-             time.sleep(1)
-             return door_state
-
-     door_state = not door_state
-
-     if door_state == True:
-         print("Door Opened")
-         espeak.set_voice("ru")
-         espeak.synth("Door Opened")
-         time.sleep(1)
-     else:
-         print("Door Closed")
-         espeak.set_voice("ru")
-         espeak.synth("Door Closed")
-         time.sleep(1)
-
-     return door_state
-
-
- def isValid(text):
-     stemmed = [porter2.stem(i) for i in text]
-
-     # a command is invalid if it is a single word, or if it mentions
-     # both 'open' and 'close'
-     if (len(text) == 1) or ('open' in stemmed and 'close' in stemmed):
-         print("Invalid command.")
-         espeak.set_voice("ru")
-         espeak.synth("Invalid Command")
-         time.sleep(1)
-         return False
-
-     return True
opendoor.py
- from changeDoorState import change_state
- from espeak import espeak
- import time
-
- door_state = False
-
- # NLP.txt is continuously written to by pocketsphinx_continuous (section 5)
- f = open("NLP.txt", "r+")
- while True:
-
-     if door_state == True:
-         print("Door state: Opened")
-         espeak.set_voice("ru")
-         espeak.synth("Door state: Opened")
-         time.sleep(1)
-     else:
-         print("Door state: Closed")
-         espeak.set_voice("ru")
-         espeak.synth("Door state: Closed")
-         time.sleep(1)
-
-     # read the next recognized phrase and try to change the door state
-     cmd = f.readline()
-     command = cmd.split()
-     if len(command) == 0:
-         continue
-     door_state = change_state(door_state, command)
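To try the demo end-to-end, run the PocketSphinx command from section 5 in one terminal, so that it keeps appending recognized phrases to NLP.txt, and run the script in another:
- sudo pocketsphinx_continuous -inmic yes -lm 5371.lm -dict 5371.dic > NLP.txt
- python opendoor.py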
OUTPUT
Conclusion
In the above article, I tried to explain Natural Language Processing and Human-Computer Interaction. I hope it helped you understand everything explained here.