Creating A Voice Assistant Using Python And Its Libraries

Dhanush M

Introduction

We use voice assistants from companies around the world, such as Amazon's Alexa, Google's Google Assistant and Apple's Siri. In this article, we will create a simple voice assistant using the Python programming language and several of its libraries. As I said in my earlier articles, Python has a large number of libraries that serve different purposes, and this project combines a few of them for speech recognition, text-to-speech and web browsing.

Installing Libraries

Before writing the program, we need to install some libraries from the internet. They can be installed with the following terminal commands.

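pip install PyAudio
pip install SpeechRecognition
pip install playsound
pip install gtts
pip install certifi
pip install yfinance

The SpeechRecognition package is imported as speech_recognition. The ssl and webbrowser modules (along with random, time and os) are part of Python's standard library, so they do not need to be installed with pip. PyAudio is required for microphone input through speech_recognition, and yfinance is used by the stock-price feature in Step 4.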
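To confirm that the packages are available before running the assistant, a small check like the following can help. It is only a sketch: missing_modules and THIRD_PARTY_MODULES are hypothetical names, and the list assumes the import names of the packages used in this article (the pip package SpeechRecognition is imported as speech_recognition, and PyAudio as pyaudio).

```python
import importlib.util

# Import names (not pip names) of the third-party packages used in this article.
THIRD_PARTY_MODULES = ["speech_recognition", "pyaudio", "playsound", "gtts", "certifi", "yfinance"]

def missing_modules(names):
    """Return the module names from `names` that cannot be found by this interpreter."""
    # find_spec looks the module up without importing it; None means "not installed".
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(THIRD_PARTY_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages found.")
```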

Step 1 - Importing libraries

At the first stage of programming, we need to import the libraries and functions the program requires: speech_recognition to recognize what we are saying, playsound to play the audio files through which the assistant speaks, gTTS (Google Text-to-Speech) to turn the assistant's replies into speech, random to give the generated audio files random names, ctime from the time module to tell you the time when you ask for it, webbrowser to open your system's browser, and os to delete the audio files the program creates. The time module itself provides time.sleep for a short pause. The ssl and certifi modules are imported as well, although the program does not use them directly.

import speech_recognition as sr # recognise speech
import playsound # to play an audio file
from gtts import gTTS # google text to speech
import random
from time import ctime # get time details
import webbrowser # open browser
import ssl
import certifi
import time
import os # to remove created audio files
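ctime() returns a fixed-format string such as 'Wed Jun  9 04:26:40 1993', and the time response in Step 4 reads the hours and minutes out of it. Because ctime() pads a single-digit day with an extra space, splitting on a single space shifts the fields during the first nine days of each month. The hypothetical helper spoken_time below is a sketch that splits on any whitespace instead.

```python
from time import ctime

def spoken_time(ctime_string=None):
    """Return 'HH MM' from a ctime()-style string, e.g. 'Wed Jun  9 04:26:40 1993' -> '04 26'."""
    if ctime_string is None:
        ctime_string = ctime()  # current local time
    # split() with no argument splits on runs of whitespace, so the extra space
    # that ctime() puts before a single-digit day does not shift the fields.
    clock = ctime_string.split()[3]  # 'HH:MM:SS'
    hours, minutes = clock.split(":")[:2]
    if hours == "00":
        hours = "12"  # say "12" instead of "00" just after midnight
    return f"{hours} {minutes}"

print(spoken_time())
```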

Step 2 - Recognition of Speech

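The next important step in a voice assistant is recognizing speech. To make the system understand what we are saying, we need to convert the voice to text. The record_audio function below listens to the microphone and sends the recording to Google's speech recognition service through recognize_google; if the speech cannot be understood or the service cannot be reached, it says so and returns an empty string. To learn more about this concept, please refer to my friend Naveenkumar Paramasivam's article. Later, the assistant also saves the name of the person who is interacting with it.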

r = sr.Recognizer() # initialise a recogniser

# listen for audio and convert it to text:
def record_audio(ask=False):
    with sr.Microphone() as source: # microphone as source
        if ask:
            speak(ask)
        audio = r.listen(source) # listen for the audio via source
        voice_data = ''
        try:
            voice_data = r.recognize_google(audio) # convert audio to text
        except sr.UnknownValueError: # error: recognizer does not understand
            speak('I did not get that')
        except sr.RequestError:
            speak('Sorry, the service is down') # error: recognizer is not connected
        print(f">> {voice_data.lower()}") # print what user said
        return voice_data.lower()
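Saving the user's name means pulling it out of a sentence such as "my name is Chris". A naive split on "is" returns an empty string here, because "Chris" itself contains "is". The sketch below matches the whole phrase with a regular expression instead; extract_name and NAME_PATTERN are hypothetical names, and the two trigger phrases are the ones the assistant listens for in Step 4.

```python
import re

# "my name is" or "i am" as whole words, followed by the name (hypothetical pattern).
NAME_PATTERN = re.compile(r"\b(?:my name is|i am)\s+(.+)", re.IGNORECASE)

def extract_name(voice_data):
    """Return the words after 'my name is' / 'i am', or '' if neither phrase occurs."""
    match = NAME_PATTERN.search(voice_data)
    return match.group(1).strip() if match else ""

print(extract_name("my name is Chris"))
```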

Step 3 - Replying to the question

To make the system reply during the conversation, we use the gTTS (Google Text-to-Speech) library, which relies on Google Translate's text-to-speech service, to make the computer speak. The speak function takes the reply as text, uses gTTS to save it as an MP3 file, plays the file with playsound, prints the reply and then deletes the file.

def speak(audio_string):
    tts = gTTS(text=audio_string, lang='en') # text to speech(voice)
    file_id = random.randint(1, 20000000)
    audio_file = 'audio' + str(file_id) + '.mp3'
    tts.save(audio_file) # save as mp3
    playsound.playsound(audio_file) # play the audio file
    print(f"May Day: {audio_string}") # print what app said
    os.remove(audio_file) # remove audio file
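The speak function writes each reply into the current working directory and leaves the file behind if saving or playback raises an error. One alternative, sketched below, lets tempfile.mkstemp create a uniquely named file and removes it in a finally block. speak_with_temp_file is a hypothetical name; the speech and playback steps are passed in as functions so the file handling can be tried without audio hardware, and the commented lines show how gTTS and playsound might be plugged in.

```python
import os
import tempfile

def speak_with_temp_file(audio_string, synthesize, play):
    """Save speech for audio_string to a temporary .mp3, play it, then always delete it.

    synthesize(text, path) writes the audio file; play(path) plays it.
    """
    # mkstemp creates a new file with a unique name in the system temp directory.
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)  # close our handle so the synthesizer can write to the path
    try:
        synthesize(audio_string, path)
        play(path)
    finally:
        os.remove(path)  # clean up even if synthesis or playback raises
    print(f"May Day: {audio_string}")

# Possible use with the libraries from this article (assumes gtts and playsound are installed):
# from gtts import gTTS
# import playsound
# speak_with_temp_file("hello", lambda text, path: gTTS(text=text, lang="en").save(path), playsound.playsound)
```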

Step 4 - Responding to the questions

Finally, we need to make the system respond to the questions asked by the user. In this program, I included only a few kinds of responses: greeting the user, remembering the user's name, telling the time, searching Google and YouTube, and getting stock prices, along with a command to exit. You can customize it based on your requirements.
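The search responses in the program build their URLs by placing the spoken term directly after the query parameter. Characters such as "&" or "+" have a special meaning in a URL, so encoding the term first is safer. A minimal sketch using urllib.parse.quote_plus follows; search_url and SEARCH_URLS are hypothetical names, while the two base URLs are the ones the program uses.

```python
from urllib.parse import quote_plus

# Base URLs for the Google and YouTube searches.
SEARCH_URLS = {
    "google": "https://google.com/search?q=",
    "youtube": "https://www.youtube.com/results?search_query=",
}

def search_url(search_term, site="google"):
    """Return a search URL with the term percent-encoded (spaces become '+')."""
    return SEARCH_URLS[site] + quote_plus(search_term.strip())

print(search_url("python voice assistant", "youtube"))
# To open it in the default browser:
# import webbrowser
# webbrowser.open(search_url("python voice assistant", "youtube"))
```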

import speech_recognition as sr # recognise speech
import playsound # to play an audio file
from gtts import gTTS # google text to speech
import random
from time import ctime # get time details
import webbrowser # open browser
import ssl
import certifi
import time
import os # to remove created audio files
import sys # to exit the program
import yfinance as yf # stock prices (pip install yfinance)

class person:
    name = ''
    def setName(self, name):
        self.name = name

# True if any of the phrases occurs in the latest voice input
# (reads the global voice_data set in the main loop at the bottom)
def there_exists(terms):
    for term in terms:
        if term in voice_data:
            return True
    return False

r = sr.Recognizer() # initialise a recogniser

# listen for audio and convert it to text:
def record_audio(ask=False):
    with sr.Microphone() as source: # microphone as source
        if ask:
            speak(ask)
        audio = r.listen(source) # listen for the audio via source
        voice_data = ''
        try:
            voice_data = r.recognize_google(audio) # convert audio to text
        except sr.UnknownValueError: # error: recognizer does not understand
            speak('I did not get that')
        except sr.RequestError:
            speak('Sorry, the service is down') # error: recognizer is not connected
        print(f">> {voice_data.lower()}") # print what user said
        return voice_data.lower()

# get string and make an audio file to be played
def speak(audio_string):
    tts = gTTS(text=audio_string, lang='en') # text to speech(voice)
    file_id = random.randint(1, 20000000)
    audio_file = 'audio' + str(file_id) + '.mp3'
    tts.save(audio_file) # save as mp3
    playsound.playsound(audio_file) # play the audio file
    print(f"May Day: {audio_string}") # print what app said
    os.remove(audio_file) # remove audio file

def respond(voice_data):
    speak('How can I help you?')
    # 1: greeting
    if there_exists(['hey','hi','hello']):
        greetings = [f"hey, how can I help you {person_obj.name}", f"hey, what's up? {person_obj.name}", f"I'm listening {person_obj.name}", f"how can I help you? {person_obj.name}", f"hello {person_obj.name}"]
        greet = random.choice(greetings)
        speak(greet)
    # 2: name
    if there_exists(["what is your name","what's your name","tell me your name"]):
        if person_obj.name:
            speak("my name is May day")
        else:
            speak("my name is May Day. what's your name?")
    if there_exists(["my name is","i am"]):
        # take the words after the whole phrase; splitting on "is" alone breaks names like "Chris"
        person_name = ''
        for phrase in ["my name is", "i am"]:
            if phrase in voice_data:
                person_name = voice_data.split(phrase, 1)[-1].strip()
                break
        speak(f"okay, i will remember that {person_name}")
        person_obj.setName(person_name) # remember name in person object
    # 3: greeting
    if there_exists(["how are you","how are you doing"]):
        speak(f"I'm very well, thanks for asking {person_obj.name}")
    # 4: time
    if there_exists(["what's the time","tell me the time","what time is it"]):
        # split() on any whitespace: ctime() pads single-digit days with an extra space
        clock = ctime().split()[3].split(":")[0:2]
        if clock[0] == "00":
            hours = '12'
        else:
            hours = clock[0]
        minutes = clock[1]
        current_time = f'{hours} {minutes}' # not named "time", which would hide the time module
        speak(current_time)
    # 5: search google
    if there_exists(["search for"]) and 'youtube' not in voice_data:
        search_term = voice_data.split("for")[-1].strip()
        url = f"https://google.com/search?q={search_term}"
        webbrowser.get().open(url)
        speak(f'Here is what I found for {search_term} on google')
    # 6: search youtube
    if there_exists(["youtube"]):
        search_term = voice_data.split("for")[-1].strip()
        url = f"https://www.youtube.com/results?search_query={search_term}"
        webbrowser.get().open(url)
        speak(f'Here is what I found for {search_term} on youtube')
    # 7: get stock price
    if there_exists(["price of"]):
        search_term = voice_data.lower().split(" of ")[-1].strip() # strip removes whitespace after/before a term in string
        stocks = {
            "apple": "AAPL",
            "microsoft": "MSFT",
            "facebook": "META", # Facebook's parent company trades as META (formerly FB)
            "tesla": "TSLA",
            "bitcoin": "BTC-USD"
        }
        try:
            stock = stocks[search_term]
            stock = yf.Ticker(stock)
            price = stock.info["regularMarketPrice"]
            speak(f'price of {search_term} is {price} {stock.info["currency"]} {person_obj.name}')
        except Exception: # unknown company, network error or missing data
            speak('oops, something went wrong')
    if there_exists(["exit", "quit", "goodbye"]):
        speak("going offline")
        sys.exit()

time.sleep(1)
person_obj = person()
while True:
    voice_data = record_audio() # get the voice input
    respond(voice_data) # respond
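As more responses are added, the chain of if statements in respond() grows longer. One way to keep it manageable, sketched below under the assumption that each response can be written as a function returning the text to speak, is a table that maps trigger phrases to handler functions. INTENTS, respond_to, contains_phrase and the handlers are hypothetical names. Unlike respond(), which runs every branch that matches, this version stops at the first match, and it matches whole words so that "hi" does not fire on "this".

```python
import random
import re

GREETINGS = ["hey, how can I help you", "hello", "I'm listening"]

def greet(text):
    return random.choice(GREETINGS)

def how_are_you(text):
    return "I'm very well, thanks for asking"

# Ordered table of (trigger phrases, handler). The first entry with a matching
# phrase wins, so more specific phrases should come before general ones.
INTENTS = [
    (["how are you"], how_are_you),
    (["hey", "hi", "hello"], greet),
]

def contains_phrase(text, phrase):
    """True if phrase occurs as whole words in text."""
    return re.search(rf"\b{re.escape(phrase)}\b", text) is not None

def respond_to(voice_data):
    """Return the reply text for the first intent whose phrase appears in voice_data."""
    text = voice_data.lower()
    for phrases, handler in INTENTS:
        if any(contains_phrase(text, phrase) for phrase in phrases):
            return handler(text)
    return "I did not get that"  # fallback when nothing matches

print(respond_to("How are you doing"))
```

In the main loop, the reply could then be spoken with speak(respond_to(voice_data)).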

Conclusion

This is not a complete personal assistant, but it can serve as a starting point for building one that fits your own needs. You can customize the code for your own use. Thank you.