PDF to Podcast Converter
Overview
This project provides a tool that converts any PDF document into a podcast episode! It uses Google Gemini to turn the content of a PDF into a natural dialogue suitable for an audio podcast. It then uses OpenAI's text-to-speech models to voice that dialogue and outputs the result as an MP3 file.
Features
- Convert PDF to Podcast: Upload a PDF and convert its content into a podcast dialogue.
- Engaging Dialogue: The generated dialogue is designed to be informative and entertaining.
- Multiple Voices: Each line of dialogue is spoken by one of several OpenAI voices (alloy, onyx, or fable).
- User-friendly Interface: Simple interface using Gradio for easy interaction.
Installation
To set up the project, follow these steps:
Clone the repository:
git clone https://github.com/knowsuchagency/pdf-to-podcast.git
cd pdf-to-podcast
Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate
Install the required packages:
pip install -r requirements.txt
Usage
Set up API Key(s): Make sure you have a Google Gemini API key. You can get one at https://aistudio.google.com/app/apikey. Set it as the value of the GEMINI_API_KEY environment variable. You'll also need an OpenAI API key. You can either enter it in the interface or set it as the OPENAI_API_KEY environment variable. Gemini Flash is used as the LLM, and OpenAI is used for text-to-speech.
Run the application:
python main.py
This will launch a Gradio interface in your web browser.
Upload a PDF: Upload the PDF document you want to convert into a podcast.
Enter OpenAI API Key: Provide your OpenAI API key in the designated textbox, unless you have already set it as the OPENAI_API_KEY environment variable.
Generate Audio: Click the button to start the conversion process. The output will be an MP3 file containing the podcast dialogue.
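Before you run the app, it can help to check that both keys are reachable. The sketch below is a hypothetical pre-flight check and is not part of main.py. It assumes that a non-blank key typed into the interface takes priority over OPENAI_API_KEY, which mirrors the either/or behaviour described in the first step. The helper names resolve_openai_key and missing_keys are illustrative only.

```python
import os
from typing import List, Optional


def resolve_openai_key(textbox_value: Optional[str]) -> Optional[str]:
    # Assumption: a non-blank key typed into the interface wins over the environment variable
    typed = (textbox_value or "").strip()
    return typed or os.environ.get("OPENAI_API_KEY") or None


def missing_keys(textbox_value: Optional[str] = None) -> List[str]:
    # Names of the keys that still need to be provided before a run can succeed
    missing = []
    if not os.environ.get("GEMINI_API_KEY"):
        missing.append("GEMINI_API_KEY")
    if resolve_openai_key(textbox_value) is None:
        missing.append("OPENAI_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_keys()
    print("Missing: " + ", ".join(problems) if problems else "All keys found")
```

You can run this on its own (for example, as a hypothetical check_keys.py) before `python main.py` to catch a missing key without uploading a PDF first.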
Project Structure
- main.py: Main application script.
- requirements.txt: List of dependencies.
- README.md: Project documentation (this file).
Code Explanation
Dialogue Models
Defines the structure of the dialogue using Pydantic models.
from typing import List, Literal

from pydantic import BaseModel

class DialogueItem(BaseModel):
    text: str
    voice: Literal["alloy", "onyx", "fable"]

class Dialogue(BaseModel):
    scratchpad: str
    dialogue: List[DialogueItem]
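To show what these models enforce, the example below validates a made-up LLM reply. The reply contents are invented for illustration, and the helper is_allowed_voice is hypothetical. The example shows that nested dictionaries become DialogueItem objects and that the Literal type rejects any voice outside the three listed.

```python
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class DialogueItem(BaseModel):
    text: str
    voice: Literal["alloy", "onyx", "fable"]


class Dialogue(BaseModel):
    scratchpad: str
    dialogue: List[DialogueItem]


# Hypothetical LLM reply, already decoded from JSON into plain Python objects
reply = {
    "scratchpad": "Open with the key finding, then explain the method.",
    "dialogue": [
        {"text": "Welcome back to the show!", "voice": "alloy"},
        {"text": "Today we're digging into a paper on sleep.", "voice": "onyx"},
    ],
}

# Nested dicts are validated and converted into DialogueItem instances
episode = Dialogue(**reply)
for item in episode.dialogue:
    print(f"{item.voice}: {item.text}")


def is_allowed_voice(voice: str) -> bool:
    """Return True if Pydantic accepts `voice` for a DialogueItem."""
    try:
        DialogueItem(text="probe", voice=voice)
        return True
    except ValidationError:
        return False
```

The voice field is constrained by a Literal. As a result, a reply that names any other voice (for example "nova", which OpenAI's API does offer) fails validation before it ever reaches the text-to-speech step.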
LLM Function
Generates dialogue based on the input text using the promptic llm decorator.
from promptic import llm

@llm(model="gemini/gemini-1.5-flash")
def generate_dialogue(text: str) -> Dialogue:
    # Function to generate podcast dialogue (body elided)
    ...
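promptic's README describes its llm decorator as follows: it uses the decorated function's docstring as the prompt, fills in the call's arguments, and parses the reply into a Pydantic model when one is the return annotation. The toy decorator below makes that mechanism concrete without an API key. It reimplements only that idea, with a plain callable standing in for the Gemini call. It is not promptic's code, and toy_llm, fake_model, and Summary are hypothetical names.

```python
import inspect
import json
from functools import wraps
from typing import Callable, get_type_hints

from pydantic import BaseModel


def toy_llm(complete: Callable[[str], str]):
    """Hypothetical docstring-as-prompt decorator; `complete` stands in for the model call."""

    def decorator(fn):
        sig = inspect.signature(fn)
        template = inspect.getdoc(fn) or ""
        return_type = get_type_hints(fn).get("return")

        @wraps(fn)
        def wrapper(*args, **kwargs):
            # Bind the call's arguments by name and substitute them into the docstring
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            prompt = template.format(**bound.arguments)
            reply = complete(prompt)
            # If the return annotation is a Pydantic model, parse the JSON reply into it
            if isinstance(return_type, type) and issubclass(return_type, BaseModel):
                return return_type(**json.loads(reply))
            return reply

        return wrapper

    return decorator


class Summary(BaseModel):
    title: str


def fake_model(prompt: str) -> str:
    # Echo the last line of the prompt back as a JSON object
    return json.dumps({"title": prompt.splitlines()[-1]})


@toy_llm(complete=fake_model)
def summarize(text: str) -> Summary:
    """Give this text a title:
    {text}"""


result = summarize("Sleep and memory")
print(result.title)  # prints "Sleep and memory"
```

One limitation of this toy: because it uses str.format, any literal braces in a prompt (such as a JSON example) would need to be doubled.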
TTS Function
Converts text to speech using OpenAI's text-to-speech model.
def get_mp3(text: str, voice: str, api_key: str = None) -> bytes:
    # Function to generate MP3 from text (body elided)
    ...
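Because the body of get_mp3 is elided above, here is a minimal sketch of how such a function could be written with the official openai Python SDK (v1 or later). The following are assumptions, not the project's code: the tts-1 model name, the explicit response_format="mp3", and the extra keyword-only client parameter. The client parameter is there only so the function can be tested with a fake object and no network access.

```python
from typing import Any, Optional


def get_mp3(text: str, voice: str, api_key: Optional[str] = None, *, client: Any = None) -> bytes:
    """Synthesize `text` in `voice` and return the MP3 bytes (sketch; model name assumed)."""
    if client is None:
        # Imported lazily so the function can be exercised with a fake client offline
        from openai import OpenAI

        # With api_key=None, the OpenAI client falls back to the OPENAI_API_KEY environment variable
        client = OpenAI(api_key=api_key)
    response = client.audio.speech.create(
        model="tts-1",  # assumption: the project may use a different TTS model
        voice=voice,
        input=text,
        response_format="mp3",
    )
    # The SDK returns a binary response wrapper; .content holds the raw audio bytes
    return response.content
```

For example, `open("line.mp3", "wb").write(get_mp3("Welcome to the show!", "alloy"))` would save a single spoken line, provided OPENAI_API_KEY is set.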
Main Function
Processes the PDF, generates dialogue, and converts it to audio.
def generate_audio(file: bytes, openai_api_key: str) -> bytes:
    # Main function to process PDF and generate audio (body elided)
    ...
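The main function's body is also elided. The description (PDF in, dialogue generated, audio out) and the per-line voice field in DialogueItem suggest one plausible shape for it, sketched below. Several things here are assumptions, not confirmed details of main.py: pypdf as the PDF library, the helper names pdf_to_text, dialogue_to_audio, and generate_audio_sketch, and the plain concatenation of MP3 bytes. The LLM and TTS steps are passed in as callables so the ordering logic can be checked offline.

```python
import io
from typing import Any, Callable, Iterable, Protocol


class Line(Protocol):
    # Anything with .text and .voice, e.g. the DialogueItem model above
    text: str
    voice: str


def pdf_to_text(file: bytes) -> str:
    """Extract text from every page of a PDF given as raw bytes (assumes the pypdf package)."""
    from pypdf import PdfReader  # lazy import: only needed when a real PDF is processed

    reader = PdfReader(io.BytesIO(file))
    # Guard against pages with no extractable text
    return "\n\n".join(page.extract_text() or "" for page in reader.pages)


def dialogue_to_audio(lines: Iterable[Line], tts: Callable[[str, str], bytes]) -> bytes:
    """Synthesize each line in its assigned voice, in order, and join the MP3 chunks."""
    # Naive byte concatenation: most players accept back-to-back MP3 frames,
    # but this is not a true audio merge (no re-encoding, no crossfades).
    return b"".join(tts(line.text, line.voice) for line in lines)


def generate_audio_sketch(
    file: bytes,
    make_dialogue: Callable[[str], Any],
    tts: Callable[[str, str], bytes],
) -> bytes:
    """PDF bytes -> text -> Dialogue -> MP3 bytes, with the LLM and TTS steps injected."""
    dialogue = make_dialogue(pdf_to_text(file))
    return dialogue_to_audio(dialogue.dialogue, tts)
```

In the app, the wiring would hypothetically look like `generate_audio_sketch(pdf_bytes, generate_dialogue, lambda text, voice: get_mp3(text, voice, openai_api_key))`.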
Gradio Interface
Creates a user-friendly interface for uploading PDFs and generating podcasts.
import gradio as gr

# generate_audio is the main function defined above in main.py
demo = gr.Interface(
    title="PDF to Podcast",
    description="Convert any PDF document into an engaging podcast episode!",
    fn=generate_audio,
    inputs=[
        gr.File(label="Input PDF", type="binary"),
        gr.Textbox(label="OpenAI API Key", placeholder="Enter your OpenAI API key here"),
    ],
    outputs=[
        gr.Audio(format="mp3"),
    ],
)

demo.launch(show_api=False)
License
This project is licensed under the Apache 2.0 License. See the LICENSE file for more information.