About RAG
A chatbot typically answers questions based only on what it already knows, much like a student who has only studied last year's textbook.
A RAG (Retrieval-Augmented Generation) chatbot is smarter. It does two things.
- Finds information from external sources, such as websites and documents.
- Uses the data it finds to write answers.
The chatbot is like a student who remembers what they have learned in the past and can also quickly look up answers in books or on Google. That makes the answers more likely to be accurate, up-to-date, and useful.
Chatbots that use RAG can:
- Give answers that are better and more accurate.
- Use information or documents from outside sources.
Set up dependencies
pip install transformers datasets faiss-cpu sentence-transformers gradio
- Transformers provide us with intelligent language models (such as BART and FLAN-T5) that can comprehend and generate text.
- Datasets make it easy to load and use sample text data.
- faiss-cpu provides fast similarity search, used here to find the most similar text across multiple documents.
- Sentence-transformers convert text into numbers (embeddings) so that FAISS can perform better searches.
- Gradio is a simple web app that lets you use and test the chatbot.
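To sanity-check the installation, you can test which packages are importable. Note that two of the import names differ from the pip names: faiss-cpu is imported as `faiss`, and sentence-transformers as `sentence_transformers`.

```python
import importlib.util

# faiss-cpu is imported as "faiss"; sentence-transformers as "sentence_transformers"
packages = ["transformers", "datasets", "faiss", "sentence_transformers", "gradio"]

def missing_packages(names):
    """Return the names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Missing:", missing_packages(packages))
```

If the printed list is empty, everything is installed and you are ready to go.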
Prepare your Knowledge Base
This is the information your chatbot will use to find answers.
You can use:
- Your own documents
- Text extracted from websites
- Small sections of any written material
For now, we will use a small, hand-written demo list of facts.
documents = [
"The Eiffel Tower is found in Paris, France.",
"Python is a flexible programming language that is used for web development, data science, and many other roles.",
"The Pacific Ocean is the largest and deepest of the Earth's oceanic divisions."
]
This is similar to giving the chatbot a small notebook to read from. Later, the chatbot will search this notebook to answer your questions.
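If you would rather load your own documents than type them in, here is a minimal sketch that reads every .txt file from a folder into the documents list. The folder and file below are placeholders standing in for your own material:

```python
import tempfile
from pathlib import Path

def load_documents(folder):
    """Read every .txt file in `folder` into a list of document strings."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob("*.txt"))]

# Demo with a temporary folder standing in for your own document directory:
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "facts.txt").write_text("The Eiffel Tower is in Paris.", encoding="utf-8")
    documents = load_documents(folder)

print(documents)  # ['The Eiffel Tower is in Paris.']
```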
Create your Embeddings
Now we will convert our text into numbers (called embeddings) so that the computer can search and comprehend them.
Steps to Implement
Use a pretrained model to convert sentences into a number form.
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(documents, convert_to_numpy=True)
This model converts your text into embeddings, essentially assigning numerical meaning to the text.
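Real sentence-transformer embeddings are dense vectors learned by a neural network, but a toy bag-of-words version (counts of vocabulary words, not what the model actually does) shows the basic idea of turning text into numbers:

```python
def toy_embed(text, vocabulary):
    """Count how often each vocabulary word appears, a crude stand-in for a real embedding."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

vocab = ["paris", "python", "ocean"]
print(toy_embed("Python is popular. Python is flexible.", vocab))  # [0, 2, 0]
```

Texts about similar topics end up with similar vectors, which is what makes searching by meaning possible.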
Build a FAISS index
# Create a FAISS index over the document embeddings
index = faiss.IndexFlatL2(doc_embeddings.shape[1])
index.add(doc_embeddings)
FAISS will use this index to quickly find the most similar documents to a given question.
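Under the hood, IndexFlatL2 computes the squared Euclidean (L2) distance between the query vector and every stored vector, and returns the closest matches. A plain-NumPy equivalent, using made-up 2-D vectors, looks like this:

```python
import numpy as np

def l2_search(index_vectors, query_vector, top_k=1):
    """Brute-force L2 nearest-neighbour search, the same idea as IndexFlatL2."""
    distances = ((index_vectors - query_vector) ** 2).sum(axis=1)
    order = np.argsort(distances)[:top_k]
    return distances[order], order

doc_vecs = np.array([[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]])
query = np.array([1.0, 0.0])
dists, idxs = l2_search(doc_vecs, query)
print(idxs)  # [1], the exact match is closest
```

FAISS does the same thing, but with optimized code that scales to millions of vectors.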
Define the Retrieval + Generation Pipeline
Next, let's create a function that does two things: retrieve the most relevant document, then generate an answer from it.
First, load a ready-made model (FLAN-T5) that can read a question and write a full answer:
from transformers import pipeline
generator = pipeline("text2text-generation", model="google/flan-t5-base")  # flan-t5-base is a small, fast choice
Function to Handle Questions
def retrieve_and_generate(query, top_k=1):
We define a function that takes a question (query) and searches for the best match (top_k=1 means only one best match).
Step 1. Convert the question into a vector (numbers).
    query_embedding = embedder.encode([query], convert_to_numpy=True)
Step 2. Use that vector to search for a matching document.
    _, indices = index.search(query_embedding, top_k)
    retrieved_docs = [documents[i] for i in indices[0]]
Step 3. Create a prompt with context and a question.
    context = "\n".join(retrieved_docs)
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
We prepare the input in a way that the model can understand: provide background information and a question.
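For example, with one retrieved document, the assembled prompt string looks like this:

```python
retrieved_docs = ["The Eiffel Tower is found in Paris, France."]
query = "Where is the Eiffel Tower?"

context = "\n".join(retrieved_docs)
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```

The model sees the background fact first, then the question, and is asked to complete the answer.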
Step 4. Generate an answer using the model.
    response = generator(prompt, max_length=100, do_sample=False)[0]['generated_text']
    return response.strip()
The model reads the prompt and generates an answer. That is what we return as the final answer.
Whenever you submit a question, this function first determines the most relevant text, then provides it to the chatbot model, and finally, the model generates a proper answer using the related text.
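To see the whole flow in one place, here is the same logic with the embedder, index, and generator passed in as functions. The stubs below are placeholders standing in for the real models, so the flow can be run and inspected without downloading anything:

```python
import numpy as np

def retrieve_and_generate(query, embed, search, documents, generate, top_k=1):
    """Retrieve the top_k most relevant documents, then generate an answer."""
    query_vec = embed(query)                       # 1. question -> vector
    indices = search(query_vec, top_k)             # 2. vector -> nearest documents
    context = "\n".join(documents[i] for i in indices)
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"  # 3. build prompt
    return generate(prompt)                        # 4. prompt -> answer

# Stub components (placeholders for the real embedder, FAISS index, and FLAN-T5):
docs = ["The Eiffel Tower is in Paris.", "Python is a programming language."]
doc_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
embed = lambda q: np.array([1.0, 0.0]) if "Tower" in q else np.array([0.0, 1.0])
search = lambda v, k: np.argsort(((doc_vecs - v) ** 2).sum(axis=1))[:k]
generate = lambda prompt: prompt.splitlines()[0]   # stub: echo the context line

print(retrieve_and_generate("Where is the Eiffel Tower?", embed, search, docs, generate))
```

In the real app, `embed` is the SentenceTransformer, `search` is the FAISS index, and `generate` is the FLAN-T5 pipeline.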
Develop a Gradio Interface
Now we will build a simple app where you can enter a question and receive an answer. This is done using Gradio, which is an easy way to make a small web page that acts as a UI with a text box.
import gradio as gr
Gradio is designed to create an intuitive user interface (UI) for our chatbot.
def chat_with_rag(question):
return retrieve_and_generate(question)
This function takes your question and returns the answer from our RAG system.
demo = gr.Interface(fn=chat_with_rag, inputs="text", outputs="text", title="RAG Chatbot")
demo.launch()
This opens the chatbot in your browser, where you can type questions and see the answers live.
Here are some questions you can ask.
- "Where is the Eiffel Tower?"
- "What is Python used for?"
- "Which ocean is the largest?"
You have created a working chatbot app: you type a question, it retrieves the relevant information, and it generates an answer, essentially a tiny version of Google or ChatGPT.
Optional Enhancements (if you want to make it better)
- Use a larger model, such as flan-t5-xl, for more thoughtful answers.
- Store more information, like document titles and links.
- Break large documents into smaller chunks when adding.
- Share your app online using Hugging Face Spaces.
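The chunking enhancement can be sketched as a simple word-based splitter with overlap; the chunk sizes below are arbitrary defaults you would tune for your own documents:

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into chunks of chunk_size words, overlapping by `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

sample = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(sample)
print(len(chunks))  # 3 chunks: words 0-49, 40-89, 80-119
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.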
Conclusion
RAG (Retrieval-Augmented Generation) is an intelligent approach to developing chatbots that do not simply make educated guesses in their answers; instead, they conduct a search for information and then respond accordingly. This is much more accurate and helpful for your users.
You can use it to:
- Support customers
- Answer research questions
- Teach or guide individuals
Now that your chatbot is functional, you can take it further:
- Use your own documents
- Connect it to PDFs, websites, or even databases
With minimal tooling, you can now build a very powerful chatbot that responds with accurate and helpful information.