About RAG
A chatbot typically answers questions based only on what it already knows, much like a student who has only studied last year's textbook.
A RAG (Retrieval-Augmented Generation) chatbot is smarter. It does two things.
- Finds information from external sources, such as websites and documents.
- Uses the data it finds to write answers.
The chatbot is like a student who remembers what they have learned in the past and can also quickly look up answers in books or on Google. That makes the answers more likely to be accurate, up-to-date, and useful.
Chatbots that use RAG can:
- Give answers that are better and more accurate.
- Use information or documents from outside sources.
Set up dependencies
pip install transformers datasets faiss-cpu sentence-transformers gradio
- Transformers provide us with intelligent language models (such as BART and FLAN-T5) that can comprehend and generate text.
- Datasets make it easy to load and use sample text data.
- faiss-cpu provides fast similarity search, used here to find the most similar text across multiple documents.
- Sentence-transformers convert text into numbers (embeddings) so that FAISS can perform better searches.
- Gradio is a simple web app that lets you use and test the chatbot.
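To sanity-check the installation, you can test which packages are importable. Note that two of the import names differ from the pip names: faiss-cpu is imported as `faiss`, and sentence-transformers as `sentence_transformers`.

```python
import importlib.util

# faiss-cpu is imported as "faiss"; sentence-transformers as "sentence_transformers"
packages = ["transformers", "datasets", "faiss", "sentence_transformers", "gradio"]

def missing_packages(names):
    """Return the names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Missing:", missing_packages(packages))
```

If the printed list is empty, everything is installed and you are ready to go.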
Prepare your Knowledge Base
This is the information your chatbot will use to find answers.
You can use:
- Your own documents
- Text extracted from websites
- Small sections of any written material
For now, we will use a small, hand-written demo list of facts.
documents = [
"The Eiffel Tower is found in Paris, France.",
"Python is a flexible programming language that is used for web development, data science, and many other roles.",
"The Pacific Ocean is the largest and deepest of the Earth's oceanic divisions."
]
This is similar to giving the chatbot a small notebook to read from. Later, the chatbot will search this notebook to answer your questions.
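If you would rather load your own documents than type them in, here is a minimal sketch that reads every .txt file from a folder into the documents list. The folder and file below are placeholders standing in for your own material:

```python
import tempfile
from pathlib import Path

def load_documents(folder):
    """Read every .txt file in `folder` into a list of document strings."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob("*.txt"))]

# Demo with a temporary folder standing in for your own document directory:
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "facts.txt").write_text("The Eiffel Tower is in Paris.", encoding="utf-8")
    documents = load_documents(folder)

print(documents)  # ['The Eiffel Tower is in Paris.']
```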
Create your Embeddings
Now we will convert our text into numbers (called embeddings) so that the computer can search and comprehend them.
Steps to Implement
Use a pretrained model to convert sentences into a number form.
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(documents, convert_to_numpy=True)
This model converts your text into embeddings, essentially assigning numerical meaning to the text.
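Real sentence-transformer embeddings are dense vectors learned by a neural network, but a toy bag-of-words version (counts of vocabulary words, not what the model actually does) shows the basic idea of turning text into numbers:

```python
def toy_embed(text, vocabulary):
    """Count how often each vocabulary word appears, a crude stand-in for a real embedding."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

vocab = ["paris", "python", "ocean"]
print(toy_embed("Python is popular. Python is flexible.", vocab))  # [0, 2, 0]
```

Texts about similar topics end up with similar vectors, which is what makes searching by meaning possible.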
Build a FAISS index
# Create a FAISS index over the document embeddings
index = faiss.IndexFlatL2(doc_embeddings.shape[1])
index.add(doc_embeddings)
FAISS will use this index to quickly find the most similar documents to a given question.
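Under the hood, IndexFlatL2 computes the squared Euclidean (L2) distance between the query vector and every stored vector, and returns the closest matches. A plain-NumPy equivalent, using made-up 2-D vectors, looks like this:

```python
import numpy as np

def l2_search(index_vectors, query_vector, top_k=1):
    """Brute-force L2 nearest-neighbour search, the same idea as IndexFlatL2."""
    distances = ((index_vectors - query_vector) ** 2).sum(axis=1)
    order = np.argsort(distances)[:top_k]
    return distances[order], order

doc_vecs = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]])
query = np.array([1.0, 0.0])
dists, idxs = l2_search(doc_vecs, query)
print(idxs)  # [1], the exact match is closest
```

FAISS does the same thing, but with optimized code that scales to millions of vectors.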
Define the Retrieval + Generation Pipeline
Next, let's create a function that does two things: retrieve the most relevant document, then generate an answer from it.
First, load a ready-made model (FLAN-T5) that can read a question and write a full answer:
from transformers import pipeline
generator = pipeline("text2text-generation", model="google/flan-t5-base")  # flan-t5-base is a small, fast choice
Function to Handle Questions
def retrieve_and_generate(query, top_k=1):
We define a function that takes a question (query) and searches for the best match (top_k=1 means only one best match).
Step 1. Convert the question into a vector (numbers).
    query_embedding = embedder.encode([query], convert_to_numpy=True)
Step 2. Use that vector to search for a matching document.
    _, indices = index.search(query_embedding, top_k)
    retrieved_docs = [documents[i] for i in indices[0]]
Step 3. Create a prompt with context and a question.
    context = "\n".join(retrieved_docs)
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
We prepare the input in a way that the model can understand: provide background information and a question.
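For example, with one retrieved document, the assembled prompt string looks like this:

```python
retrieved_docs = ["The Eiffel Tower is found in Paris, France."]
query = "Where is the Eiffel Tower?"

context = "\n".join(retrieved_docs)
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```

The model sees the background fact first, then the question, and is asked to complete the answer.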
Step 4. Generate an answer using the model.
    response = generator(prompt, max_length=100, do_sample=False)[0]['generated_text']
    return response.strip()
The model reads the prompt and generates an answer. That is what we return as the final answer.
Whenever you submit a question, this function first determines the most relevant text, then provides it to the chatbot model, and finally, the model generates a proper answer using the related text.
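To see the whole flow in one place, here is the same logic with the embedder, index, and generator passed in as functions. The stubs below are placeholders standing in for the real models, so the flow can be run and inspected without downloading anything:

```python
import numpy as np

def retrieve_and_generate(query, embed, search, documents, generate, top_k=1):
    """Retrieve the top_k most relevant documents, then generate an answer."""
    query_vec = embed(query)                       # 1. question -> vector
    indices = search(query_vec, top_k)             # 2. vector -> nearest documents
    context = "\n".join(documents[i] for i in indices)
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"  # 3. build prompt
    return generate(prompt)                        # 4. prompt -> answer

# Stub components (placeholders for the real embedder, FAISS index, and FLAN-T5):
docs = ["The Eiffel Tower is in Paris.", "Python is a programming language."]
doc_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
embed = lambda q: np.array([1.0, 0.0]) if "Tower" in q else np.array([0.0, 1.0])
search = lambda v, k: np.argsort(((doc_vecs - v) ** 2).sum(axis=1))[:k]
generate = lambda prompt: prompt.splitlines()[0]   # stub: echo the context line

print(retrieve_and_generate("Where is the Eiffel Tower?", embed, search, docs, generate))
```

In the real app, `embed` is the SentenceTransformer, `search` is the FAISS index, and `generate` is the FLAN-T5 pipeline.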
Develop a Gradio Interface
Now we will build a simple app where you can enter a question and receive an answer. This is done using Gradio, which is an easy way to make a small web page that acts as a UI with a text box.
import gradio as gr
Gradio is designed to create an intuitive user interface (UI) for our chatbot.
def chat_with_rag(question):
return retrieve_and_generate(question)
This function takes your question and returns the answer from our RAG system.
demo = gr.Interface(fn=chat_with_rag, inputs="text", outputs="text", title="RAG Chatbot")
demo.launch()
This opens the chatbot in your browser, where you can type questions and see the answers live.
Here are some questions you can ask.
- "Where is the Eiffel Tower?"
- "What is Python used for?"
- "Which ocean is the largest?"
You have created a working chatbot app: you type a question, it retrieves the relevant information, and it generates an answer, essentially a tiny version of Google or ChatGPT.
Optional Enhancements (if you want to make it better)
- Use a larger model, such as flan-t5-xl, for more thoughtful answers.
- Store more information, like document titles and links.
- Break large documents into smaller chunks when adding.
- Share your app online using Hugging Face Spaces.
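The chunking enhancement can be sketched as a simple word-based splitter with overlap; the chunk sizes below are arbitrary defaults you would tune for your own documents:

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into chunks of chunk_size words, overlapping by `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

sample = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(sample)
print(len(chunks))  # 3 chunks: words 0-49, 40-89, 80-119
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.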
Conclusion
RAG (Retrieval-Augmented Generation) is an intelligent approach to developing chatbots that do not simply make educated guesses in their answers; instead, they conduct a search for information and then respond accordingly. This is much more accurate and helpful for your users.
You can use it to:
- Support customers
- Answer research questions
- Teach or guide individuals
Now that your chatbot is functional, you can take it further:
- Use your own documents
- Connect it to PDFs, websites, or even databases
With minimal tooling, you can now build a very powerful chatbot that responds with accurate and helpful information.