Introduction
Many of us use voice assistants built by companies around the world, such as Amazon's Alexa, Google Assistant, and Apple's Siri. As I said in my earlier articles, Python has a large collection of libraries that serve different purposes. In this article, we will use several of them to create a simple voice assistant in Python.
Installing Libraries
To build this program, we need to install a few libraries. They can be installed with the following terminal commands. The ssl and webbrowser modules used later ship with Python's standard library, so they do not need to be installed. The yfinance package is needed for the stock price lookup in the complete program.
- pip install PyAudio
- pip install SpeechRecognition
- pip install playsound
- pip install gtts
- pip install certifi
- pip install yfinance
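Before moving on, it can help to confirm that the packages can actually be imported. The following is only a sketch and not part of the original program: missing_modules is a hypothetical helper name, and it relies on nothing but the standard library's importlib.util.find_spec. Note that import names can differ from pip names (for example, SpeechRecognition is imported as speech_recognition and PyAudio as pyaudio). yfinance is listed because the complete program calls yf.Ticker.

```python
from importlib.util import find_spec

# Import names (not pip package names) used later in this article.
REQUIRED_MODULES = [
    "speech_recognition",
    "pyaudio",
    "playsound",
    "gtts",
    "certifi",
    "yfinance",
]


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, rerun the matching pip command above.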
Step 1 - Importing libraries
First, we import the libraries and functions the program needs: speech_recognition to recognize what we say, playsound to play audio files, gtts (Google Text-to-Speech) to turn the assistant's replies into spoken audio, random to give the generated audio files random names, ctime to tell the time when asked, webbrowser to open pages in your system's browser, os to delete the audio files created by the program, and yfinance to look up stock prices.
- import speech_recognition as sr
- import playsound
- from gtts import gTTS
- import random
- from time import ctime
- import webbrowser
- import ssl
- import certifi
- import time
- import os
- import yfinance as yf
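To see what ctime gives us, here is a small sketch of how the clock can be pulled out of its output and turned into text that is easy to speak. The function name spoken_time is my own choice for illustration. One detail worth knowing: ctime pads single-digit days with an extra space, so the sketch uses split() with no argument rather than split(" ").

```python
from time import ctime


def spoken_time(timestamp):
    """Turn a ctime()-style string into 'hours minutes' text for speech."""
    # split() with no argument collapses the double space that ctime()
    # puts before single-digit days, so the clock is always field 3.
    clock = timestamp.split()[3]            # e.g. "00:05:09"
    hours, minutes = clock.split(":")[:2]
    if hours == "00":
        hours = "12"                        # say "12" rather than "00" after midnight
    return f"{hours} {minutes}"


print(spoken_time(ctime()))
```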
Step 2 - Recognition of Speech
The next important step in a voice assistant is recognizing speech. For the system to understand what we say, the voice has to be converted to text. To learn more about this concept, please refer to my friend Naveenkumar Paramasivam's article. The function below listens through the microphone and returns the recognized text in lower case. We also need to save the name of the person who is interacting with the system, which the complete program in Step 4 does with a small person class.
- r = sr.Recognizer()  # one recognizer shared by the whole program
-
- def record_audio(ask=False):
-     # Listen on the default microphone; optionally speak a prompt first.
-     with sr.Microphone() as source:
-         if ask:
-             speak(ask)
-         audio = r.listen(source)
-         voice_data = ''
-         try:
-             # Send the recording to Google's speech recognition service.
-             voice_data = r.recognize_google(audio)
-         except sr.UnknownValueError:
-             speak('I did not get that')
-         except sr.RequestError:
-             speak('Sorry, the service is down')
-         print(f">> {voice_data.lower()}")
-         return voice_data.lower()
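The error handling in record_audio can be tried out without a microphone. The sketch below is an illustration only: UnknownValueError and RequestError here are local stand-ins for sr.UnknownValueError and sr.RequestError, fake_recognizer is a made-up replacement for r.recognize_google, and transcribe is a hypothetical name. The shape of the try/except is the same as in the function above.

```python
class UnknownValueError(Exception):
    """Stand-in for sr.UnknownValueError (the speech was unintelligible)."""


class RequestError(Exception):
    """Stand-in for sr.RequestError (the recognition service was unreachable)."""


def transcribe(recognize, audio, say=print):
    """Call recognize(audio) and always return lower-case text ('' on failure)."""
    text = ""
    try:
        text = recognize(audio)
    except UnknownValueError:
        say("I did not get that")
    except RequestError:
        say("Sorry, the service is down")
    return text.lower()


def fake_recognizer(audio):
    # Pretend recognizer: fails on empty input, otherwise echoes the input back.
    if not audio:
        raise UnknownValueError()
    return audio


print(transcribe(fake_recognizer, "What Time Is It"))
print(repr(transcribe(fake_recognizer, "")))
```

Because the function always returns a string, the caller can test the result with `in` even when recognition failed.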
Step 3 - Replying to the question
To make the system reply during the conversation, we use the gtts (Google Text-to-Speech) library. The program first produces its response as text. gtts then converts that text into an audio file, playsound plays the file, and os.remove deletes it afterwards.
- def speak(audio_string):
-     # Convert the text to speech, play it, then delete the audio file.
-     tts = gTTS(text=audio_string, lang='en')
-     file_id = random.randint(1, 20000000)  # random number so file names rarely clash
-     audio_file = 'audio' + str(file_id) + '.mp3'
-     tts.save(audio_file)
-     playsound.playsound(audio_file)
-     print(f"May Day: {audio_string}")
-     os.remove(audio_file)
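One thing to keep in mind with the function above is that the audio file is left behind if playback fails, because os.remove is never reached. A possible alternative, shown here only as a sketch, is to let the standard tempfile module pick a unique file name and to delete the file in a finally block. The name play_via_temp_file and the save/play parameters are my own; in the assistant, save would be tts.save and play would be playsound.playsound. The demo uses a fake save function so it runs without any audio libraries.

```python
import os
import tempfile


def play_via_temp_file(save, play, suffix=".mp3"):
    """Save audio to a unique temporary file, play it, then delete the file."""
    # mkstemp() creates the file with a name that cannot collide with another one.
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; the callbacks open the file themselves
    try:
        save(path)
        play(path)
    finally:
        os.remove(path)  # runs even if saving or playing raised an exception
    return path


def fake_save(path):
    # Stand-in for tts.save: writes some bytes instead of real audio.
    with open(path, "wb") as f:
        f.write(b"not really audio")


played = []
used = play_via_temp_file(fake_save, played.append)
print(os.path.exists(used))
```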
Step 4 - Responding to the questions
Finally, we need to make the system respond to the questions asked by the user. In this program, I have included only a few kinds of responses, such as stock prices, searches on Google and YouTube, and some basic conversation. You may customize it based on your requirements. The complete program is given below.
- import speech_recognition as sr
- import playsound
- from gtts import gTTS
- import random
- from time import ctime
- import webbrowser
- import ssl
- import certifi
- import time
- import os
- import yfinance as yf  # provides yf.Ticker, used for the stock prices below
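- class person:
-     name = ''
-     def setName(self, name):
-         self.name = name
-
- def there_exists(terms):
-     # True if any of the phrases occurs in the global voice_data set by the main loop.
-     for term in terms:
-         if term in voice_data:
-             return True
-     return False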
-
- r = sr.Recognizer()
-
- def record_audio(ask=False):
-     with sr.Microphone() as source:
-         if ask:
-             speak(ask)
-         audio = r.listen(source)
-         voice_data = ''
-         try:
-             voice_data = r.recognize_google(audio)
-         except sr.UnknownValueError:
-             speak('I did not get that')
-         except sr.RequestError:
-             speak('Sorry, the service is down')
-         print(f">> {voice_data.lower()}")
-         return voice_data.lower()
-
-
- def speak(audio_string):
-     tts = gTTS(text=audio_string, lang='en')
-     file_id = random.randint(1, 20000000)  # renamed from r so it is not confused with the recognizer
-     audio_file = 'audio' + str(file_id) + '.mp3'
-     tts.save(audio_file)
-     playsound.playsound(audio_file)
-     print(f"May Day: {audio_string}")
-     os.remove(audio_file)
-
- def respond(voice_data):
-     speak('How can I help you?')
-
-     # Greetings
-     if there_exists(['hey','hi','hello']):
-         greetings = [f"hey, how can I help you {person_obj.name}", f"hey, what's up? {person_obj.name}", f"I'm listening {person_obj.name}", f"how can I help you? {person_obj.name}", f"hello {person_obj.name}"]
-         greet = random.choice(greetings)
-         speak(greet)
-
-     # Names
-     if there_exists(["what is your name","what's your name","tell me your name"]):
-         if person_obj.name:
-             speak("my name is May Day")
-         else:
-             speak("my name is May Day. what's your name?")
-
-     if there_exists(["my name is","i am"]):
-         # Keep only the words after the trigger phrase, so names containing "is" survive.
-         for prefix in ("my name is", "i am"):
-             if prefix in voice_data:
-                 person_name = voice_data.split(prefix, 1)[-1].strip()
-                 break
-         speak(f"okay, i will remember that {person_name}")
-         person_obj.setName(person_name)
-
-     if there_exists(["how are you","how are you doing"]):
-         speak(f"I'm very well, thanks for asking {person_obj.name}")
-
-     # Time
-     if there_exists(["what's the time","tell me the time","what time is it"]):
-         # ctime() looks like 'Mon Dec 16 14:30:00 2024'; split() makes field 3 the clock
-         # even when the day is padded with an extra space.
-         # The result is stored in clock, not time, so the time module is not shadowed.
-         clock = ctime().split()[3].split(":")[0:2]
-         if clock[0] == "00":
-             hours = '12'
-         else:
-             hours = clock[0]
-         minutes = clock[1]
-         speak(f'{hours} {minutes}')
-
-     # Google search
-     if there_exists(["search for"]) and 'youtube' not in voice_data:
-         # Split on the first " for " only, so words such as "information" stay intact.
-         search_term = voice_data.split(" for ", 1)[-1].strip()
-         url = f"https://google.com/search?q={search_term}"
-         webbrowser.get().open(url)
-         speak(f'Here is what I found for {search_term} on google')
-
-     # YouTube search
-     if there_exists(["youtube"]):
-         search_term = voice_data.split(" for ", 1)[-1].strip()
-         url = f"https://www.youtube.com/results?search_query={search_term}"
-         webbrowser.get().open(url)
-         speak(f'Here is what I found for {search_term} on youtube')
-
-     # Stock prices
-     if there_exists(["price of"]):
-         search_term = voice_data.lower().split(" of ")[-1].strip()
-         stocks = {
-             "apple":"AAPL",
-             "microsoft":"MSFT",
-             "facebook":"META",  # the ticker changed from FB to META in 2022
-             "tesla":"TSLA",
-             "bitcoin":"BTC-USD"
-         }
-         try:
-             stock = stocks[search_term]
-             stock = yf.Ticker(stock)
-             price = stock.info["regularMarketPrice"]
-
-             speak(f'price of {search_term} is {price} {stock.info["currency"]} {person_obj.name}')
-         except Exception:
-             # Unknown company name, network failure, or missing field in the response.
-             speak('oops, something went wrong')
-
-     if there_exists(["exit", "quit", "goodbye"]):
-         speak("going offline")
-         exit()
-
-
- time.sleep(1)
-
- person_obj = person()
- while True:
-     voice_data = record_audio()  # get the voice input
-     respond(voice_data)          # respond to it
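If you plan to add many commands of your own, one possible way to keep respond from growing is to store the trigger phrases and their handlers in a table. The sketch below is only an illustration and is not how the program above is written: COMMANDS, dispatch, search_url and how_are_you are hypothetical names. It returns the reply instead of speaking it so that it runs without a microphone or speakers. The search handler also passes the term through urllib.parse.quote_plus, which escapes spaces and symbols before they go into the URL.

```python
from urllib.parse import quote_plus


def search_url(voice_data):
    """Build a Google search URL from a phrase such as 'search for red pandas'."""
    # partition() splits on the first " for " only, so terms that themselves
    # contain "for" (such as "information") are left intact.
    term = voice_data.partition(" for ")[2].strip()
    # quote_plus() escapes spaces and symbols so the query survives inside a URL.
    return f"https://google.com/search?q={quote_plus(term)}"


def how_are_you(voice_data):
    return "I'm very well, thanks for asking"


# Each entry pairs trigger phrases with the function that handles them.
COMMANDS = [
    (("search for",), search_url),
    (("how are you", "how are you doing"), how_are_you),
]


def dispatch(voice_data, commands=COMMANDS):
    """Return the result of the first handler whose phrase occurs in voice_data."""
    for phrases, handler in commands:
        if any(phrase in voice_data for phrase in phrases):
            return handler(voice_data)
    return None  # nothing matched


print(dispatch("search for c++ & python information"))
```

Adding a new command then means writing one small function and appending one entry to the table.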
Conclusion
Although this is not a perfect personal assistant, it can serve as a starting point that you customize based on your own requirements. Thank you.