In this article, I’ll show you how to create a basic chatbot that answers questions over your own data, using GPT-Index, OpenAI, and Python.
Let’s get started by installing the required Python modules.
Install modules/packages
We need to install two packages, gpt_index and langchain, which can be done with the following commands:
pip install gpt_index
pip install langchain
Importing packages
Next, we need to import those packages so that we can use them:
from gpt_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain import OpenAI
import sys
import os
Please note that we don’t need a GPU here, because nothing runs locally; all the heavy lifting happens on OpenAI’s servers.
Grab OpenAI Key
To grab the OpenAI key, go to https://openai.com/, log in, and create an API key from your account settings.
Once you have the key, set it as an environment variable (I’m using Windows). If you don’t set it as an environment variable, you’ll have to pass the key explicitly in every function call.
os.environ["OPENAI_API_KEY"] = "YOUR_KEY"
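Before going further, it’s worth verifying the key is actually visible to your script. Here is a small sanity-check helper (my own addition, not part of gpt_index) that fails early with a clear message instead of a confusing authentication error later:

```python
import os

def check_api_key():
    # read the key from the environment and fail fast if it is missing
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```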
Collect Data
Once our environment is set, the next thing we need is data. You can either scrape the data from a URL or use data that is already downloaded and available as a flat text file.
Once the text file is downloaded, keep it in a directory. If you have multiple text files, you can keep all of them in the same directory.
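If you don’t have a dataset handy, a quick way to try this out is to write some sample text into a directory yourself (the name data/ is just my choice here; any directory works):

```python
import os

# create a data directory and drop a sample text file into it;
# in a real project this is where your downloaded .txt files would go
os.makedirs("data", exist_ok=True)
with open(os.path.join("data", "sample.txt"), "w", encoding="utf-8") as f:
    f.write("GPT-Index lets you build an index over your own documents "
            "and query it with an LLM.")
```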
Now that we have our data, the next step is to turn it into a queryable knowledge base.
Create Index
We need to create an index over all the text files. For this, we will write a function that takes the directory path where our text files are saved.
def create_index(path):
    max_input = 4096
    tokens = 200
    chunk_size = 600  # for the LLM, we need to define a chunk size
    max_chunk_overlap = 20

    # define the prompt helper
    promptHelper = PromptHelper(max_input, tokens, max_chunk_overlap, chunk_size_limit=chunk_size)

    # define the LLM -- many models are available, but in this example we use an OpenAI model
    llmPredictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-ada-001", max_tokens=tokens))

    # load data -- this picks up all the .txt files in the directory, if there is more than one
    docs = SimpleDirectoryReader(path).load_data()

    # create the vector index and save it to disk
    vectorIndex = GPTSimpleVectorIndex(documents=docs, llm_predictor=llmPredictor, prompt_helper=promptHelper)
    vectorIndex.save_to_disk('vectorIndex.json')
    return vectorIndex
The above process is called embedding, and you only need to repeat it when new data flows in.
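To build some intuition for the chunk_size and max_chunk_overlap parameters used above, here is a minimal sketch of overlapping chunking. This is just an illustration, not the gpt_index implementation (which works on tokens rather than characters): a window of chunk_size slides over the text, stepping forward so that consecutive chunks share some context.

```python
def split_into_chunks(text, chunk_size, overlap):
    # slide a window of chunk_size characters over the text,
    # stepping forward by (chunk_size - overlap) each time so that
    # consecutive chunks share `overlap` characters of context
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_into_chunks("some long document text " * 20, chunk_size=40, overlap=10)
```

The overlap matters because a sentence cut in half at a chunk boundary would otherwise lose its meaning for retrieval.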
Create Answering System
Next, we need to build a system that can respond to the user’s questions. Let’s create a function for that.
def answerMe(vectorIndex):
    vIndex = GPTSimpleVectorIndex.load_from_disk(vectorIndex)
    while True:
        question = input('Please ask: ')
        response = vIndex.query(question, response_mode="compact")
        print(f"Response: {response} \n")
Test The System
Finally, it’s time to test the system. Run your application, ask a few questions, and check the responses.
I hope you enjoyed creating your basic bot.
If you find anything unclear, watch my video demonstrating this entire flow here.