Introduction
After we rolled out the chatbot I introduced in my last article about Langchain and Python to our customers, the first wish was: can't we get real chat functionality like in ChatGPT? This has several advantages: you can refine questions, clear up misunderstandings, and get much better results. In addition, it would, of course, be great to list the sources used so that you can form your own picture from the documentation. These are completely reasonable wishes. And because I love challenges, I spent my weekend trying to fulfill them.
What's Needed?
The first part of the work can be adopted almost unchanged. The so-called embedding serves to feed the documentation to OpenAI in small bites. The vectors returned by the API reflect the semantic meaning of the passed fragments and are stored in a vector database. A similarity search can then return the most appropriate parts of the documentation as context for a question. If you are interested in the details, I recommend the article linked in the introduction.
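As a quick reminder of the mechanics, the heart of it is something like this (a minimal sketch rather than the exact code from the articles; the persist directory and the example question are made up):

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Create (or open) the persisted vector database that holds the embedded documentation
embeddings = OpenAIEmbeddings()
instance = Chroma(persist_directory='C:\\DocBot\\db', embedding_function=embeddings)

# A similarity search returns the fragments closest in meaning to a given question
context_docs = instance.similarity_search("How do I add a chart to a report?", k=4)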
However, I have made one change. Since I want to link the sources, I now only use content that I load from our documentation portal and other sources via their sitemaps. This way, the sources can later be displayed as clickable links.
I also added practical parameters to the routine that loads content from a sitemap. The Langchain class supports post-processing of the crawled HTML content, so annoying or unnecessary elements like the navigation, header, and footer can be removed before the embedding process. For example:
import os
from bs4 import BeautifulSoup
from langchain.document_loaders import SitemapLoader

def sanitize_documentx_page(content: BeautifulSoup) -> str:
    # Find the content div element and drop everything else (navigation, header, footer)
    div_element = content.find('div', {"class": "i-body-content"})
    return str(div_element)
...
def add_sitemap_documents(web_path, filter_urls, parsing_function, instance):
    if os.path.isfile(web_path):
        # If it's a local file path, use the SitemapLoader with is_local=True
        loader = SitemapLoader(web_path=web_path, filter_urls=filter_urls, parsing_function=parsing_function, is_local=True)
    else:
        # If it's a web URL, use the SitemapLoader with web_path
        loader = SitemapLoader(web_path=web_path, filter_urls=filter_urls, parsing_function=parsing_function)
    # Some servers reject the default Python user agent, so identify as a regular browser
    loader.session.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
    add_documents(loader, instance)
...
# add EN .NET help from docu.combit.net
add_sitemap_documents('C:\\DocBot\\input\\sitemap_net_en.xml',
                      [],
                      sanitize_documentx_page,
                      instance)
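The add_documents helper is not shown here. As a rough sketch of what it does (my assumption, in line with the first article: split the loaded pages into chunks and hand them to the vector store), it could look like this:

from langchain.text_splitter import RecursiveCharacterTextSplitter

def add_documents(loader, instance):
    # Load the crawled pages and split them into chunks small enough for the embedding API
    documents = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(documents)
    # Embed the chunks and store them, including their source URLs, in the Chroma instance
    instance.add_documents(chunks)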
Changes to the Server
The more important and larger part of the changes concerns the server. Completely different APIs are called here, and in addition, the session state must be saved so that subsequent calls have access to the chat history. Let's look at this step by step.
Session Handling
Flask does a very good job and takes session handling almost completely off your hands. However, only objects that are JSON serializable can be stored in the session. I circumvented this limitation by keeping a local store for the required objects and putting only generated GUIDs into the session, which I then use for a lookup. Details can be found in the sources linked below. This is no masterstroke; in production, you would solve this more elegantly.
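The pattern is roughly this (a minimal sketch, not the exact code from the repository; the names session_store and get_session_objects are mine):

import uuid
from flask import Flask, session

app = Flask(__name__)
app.secret_key = 'replace-with-a-real-secret'

# Local in-memory store: GUID -> non-serializable objects (memory, chain, ...)
session_store = {}

def get_session_objects():
    # Only the GUID goes into the Flask session, since it is JSON serializable
    if 'session_id' not in session:
        session['session_id'] = str(uuid.uuid4())
    # The GUID is then used to look up the real objects in the local store
    return session_store.setdefault(session['session_id'], {})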
Storing the History
In contrast to the simple "question/answer" game from the first article, the entire history of the chat must now be transferred with each request. Langchain offers various conversational memory classes for this purpose - here's a great introduction to the topic. For our purpose, ConversationBufferMemory does the job just fine. So, each session gets a memory object assigned like this.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, output_key='answer')
For the actual Q&A, a different Chain is used.
qa = ConversationalRetrievalChain.from_llm(
    llm,
    instance.as_retriever(),
    memory=memory,
    get_chat_history=lambda h: h,
    condense_question_prompt=CONDENSE_QUESTION_PROMPT,
    combine_docs_chain_kwargs={"prompt": QA_PROMPT},
    return_source_documents=True)
This looks complicated. However, Langchain does a great job here out of the box. I had to figure out the correct way to call this, as only a few people seem to have experimented with it so far. The parameters we pass are the large language model (ChatOpenAI in our case, using the gpt-3.5-turbo model), the Chroma vector database instance, the conversational memory instance, a function to retrieve the chat history from this memory, and two prompts. Two? Well, yes.
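For reference, the llm object is simply the chat model wrapper (a sketch; the temperature value is my assumption):

from langchain.chat_models import ChatOpenAI

# gpt-3.5-turbo as the underlying chat model
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)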
The QA_PROMPT is the same as in the first article. It sets the tone and purpose for the bot. The CONDENSE_QUESTION_PROMPT is new here. This is the prompt used to generate a new, standalone question from the chat history and the next question asked. It's mandatory to run this condensed question through the same retrieval process again, as the sources needed might change depending on the question asked. The prompt looks like this.
from langchain.prompts import PromptTemplate

# Condense Prompt
condense_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(condense_template)
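For comparison, the QA prompt has the same shape, with {context} and {question} as placeholders. The wording below is only illustrative; the actual prompt is the one from the first article:

# Illustrative only - the real QA prompt is described in the first article
qa_template = """You are a helpful assistant for the combit documentation.
Answer the question based only on the context below. If you don't know the answer, say so.

{context}

Question: {question}
Helpful answer:"""
QA_PROMPT = PromptTemplate.from_template(qa_template)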
With all of this in place, we just need to call the qa object from the main web API and return both the response and the sources to the client.
query = request.args.get('query')
# Process the input string through the Q&A chain
query_response = qa({"question": query})
# Each source document carries its origin URL in the metadata set by the sitemap loader
metadata_list = [obj.metadata for obj in query_response["source_documents"]]
response = {
    'answer': query_response["answer"],
    'sources': metadata_list
}
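Embedded in the web API, this could look roughly as follows (a sketch, assuming a Flask route named /ask and the get_session_objects lookup shown above; the real code is in the linked repository):

from flask import jsonify, request

@app.route('/ask')
def ask():
    # Fetch the per-session objects via the GUID lookup
    objects = get_session_objects()
    qa = objects['qa']  # the ConversationalRetrievalChain created for this session
    query = request.args.get('query')
    # Process the input string through the Q&A chain
    query_response = qa({"question": query})
    return jsonify({
        'answer': query_response["answer"],
        'sources': [obj.metadata for obj in query_response["source_documents"]]
    })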
The client side needs a chat-like UI with a way to add messages to an endless container, call the web API, retrieve answers, and nicely format the sources. As this would be too much for this article, I've put the sources on combit's GitHub.
Wrap Up
After a couple of hours of working on this, we now have a prototype that shows the potential of this technique.
Obviously, I'm not a web designer, but in the short run, this will replace the bot combit is currently using here. The new bot is faster, more versatile, and much more useful to the user.