Meta has released NotebookLlama, an open-source version of the popular podcast generation feature found in Google’s NotebookLM. The project uses Meta’s own Llama models for processing and lets users create podcast-style conversations from uploaded text files.
The process begins with NotebookLlama converting a document, such as a PDF of a news article or blog post, into a transcript. It then adds dramatization and interruptions to that transcript and passes the result to open text-to-speech models to produce the final audio.
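The stages described above can be sketched roughly as follows. This is a minimal illustration, not NotebookLlama's actual code: the names `make_podcast`, `parse_turns`, and `Turn`, the prompt wording, and the assumption that the model returns one `Speaker: text` line per turn are all hypothetical. The `llm` and `tts` arguments are placeholders for whatever Llama model call and text-to-speech engine are used.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins: any function with these shapes can be plugged in.
LLM = Callable[[str], str]          # prompt in, completion out
TTS = Callable[[str, str], bytes]   # (speaker, text) in, audio bytes out


@dataclass
class Turn:
    speaker: str
    text: str


def parse_turns(script: str) -> List[Turn]:
    """Parse 'Speaker: text' lines into turns, skipping malformed lines."""
    turns = []
    for line in script.splitlines():
        # Split on the first colon only, so colons inside the dialogue survive.
        speaker, sep, text = line.partition(":")
        if sep and speaker.strip() and text.strip():
            turns.append(Turn(speaker.strip(), text.strip()))
    return turns


def make_podcast(document_text: str, llm: LLM, tts: TTS) -> List[bytes]:
    """Run the stages: clean text, draft transcript, dramatize, synthesize."""
    cleaned = " ".join(document_text.split())  # collapse stray whitespace
    draft = llm("Write a two-host podcast transcript about:\n" + cleaned)
    dramatized = llm("Rewrite with dramatization and interruptions:\n" + draft)
    # One audio clip per speaker turn; joining the clips is left to the caller.
    return [tts(turn.speaker, turn.text) for turn in parse_turns(dramatized)]
```

Passing the model and the speech engine in as plain functions keeps the sketch independent of any particular library, and makes each stage easy to swap or test in isolation.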
However, initial results indicate that the audio quality does not quite match that of NotebookLM. The samples reviewed exhibit a distinctly robotic tone, with instances of overlapping dialogue that detract from the listening experience.
Meta's researchers acknowledge these limitations, suggesting that improvements in the text-to-speech model could enhance the naturalness of the generated audio. They also propose an alternative approach where two AI agents could engage in a debate on the topic, creating a more dynamic podcast outline. Currently, NotebookLlama relies on a single model for this task.
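The two-agent idea could look something like the sketch below. It is purely illustrative: the `debate` function, the `Agent` signature, and the fixed alternating turn order are assumptions, not anything Meta has published. Each agent is a placeholder for a model call that sees the topic and the exchange so far.

```python
from typing import Callable, List, Tuple

# Hypothetical agent shape: (topic, history so far) -> next argument.
Agent = Callable[[str, List[Tuple[str, str]]], str]


def debate(topic: str, agent_a: Agent, agent_b: Agent,
           rounds: int = 2) -> List[Tuple[str, str]]:
    """Alternate between two agents and collect their arguments as an outline."""
    history: List[Tuple[str, str]] = []
    for _ in range(rounds):
        for name, agent in (("A", agent_a), ("B", agent_b)):
            # Pass a copy so an agent cannot mutate the shared history.
            history.append((name, agent(topic, list(history))))
    return history
```

The returned list of `(speaker, argument)` pairs could then serve as the outline that a transcript-writing step expands into dialogue.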
NotebookLlama is not the first attempt to replicate NotebookLM’s podcast capabilities; other projects have emerged, with varying degrees of success. However, all AI-generated podcasts, including those from NotebookLM, are still prone to hallucinations, meaning they may include fabricated information.
Overall, while NotebookLlama presents an exciting step forward in AI-driven content creation, there remains room for improvement in audio quality and accuracy.