PDF to Speech

A full-stack application that converts PDF documents into natural-sounding speech using AI-powered text-to-speech (TTS) technology. Upload a PDF and listen to it read aloud with voice synthesis powered by Supertonic.

🌟 Features

PDF Upload & Parsing: Extract text from text-based PDF documents (scanned, image-only pages contain no extractable text)
Intelligent Text Processing: Automatically split text into sentences with spaCy
High-Quality TTS: Generate natural-sounding speech using Supertonic's AI voice models
Interactive Player: Play, pause, resume, and navigate through sentences
Visual Feedback: Highlight the currently playing sentence with auto-scroll
Seamless Experience: Modern, responsive UI built with Next.js and Material-UI

🏗️ Architecture

The application consists of two main components:

Backend (Python/FastAPI)

PDF Parsing: Uses pdfminer.six to extract text from PDF files
NLP Processing: Leverages spaCy (en_core_web_sm) to split text into sentences
Text-to-Speech: Integrates Supertonic's ONNX models for voice synthesis
API Endpoints:
- POST /api/upload - Upload PDF and extract sentences
- POST /api/tts - Synthesize speech for a given sentence
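To make the upload flow concrete, here is a minimal sketch of the text-to-sentences step that produces the /api/upload response shape shown in the API Reference. It deliberately swaps spaCy for a naive regex splitter so that it runs with only the standard library; the function names are hypothetical and are not taken from backend/app/nlp.py.

```python
import re
from typing import TypedDict


class Sentence(TypedDict):
    id: int
    text: str


# Stand-in for spaCy's sentence segmentation (NOT what the backend uses):
# split after ".", "!" or "?" when followed by whitespace. Much cruder than
# en_core_web_sm, e.g. it wrongly splits after abbreviations like "e.g.".
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Collapse line breaks left over from PDF extraction, then split."""
    normalized = " ".join(text.split())
    if not normalized:
        return []
    return [s for s in _SENTENCE_BOUNDARY.split(normalized) if s]


def build_upload_response(text: str) -> dict[str, list[Sentence]]:
    """Number sentences from 0, matching the documented /api/upload shape."""
    return {
        "sentences": [
            {"id": i, "text": s} for i, s in enumerate(split_sentences(text))
        ]
    }


if __name__ == "__main__":
    extracted = "First sentence.\nSecond\nsentence!  Third?"
    print(build_upload_response(extracted))
```

Swapping the regex for spaCy only changes split_sentences; the id numbering and response shape stay the same.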

Frontend (Next.js/React)

Modern UI: Built with Next.js 14, React 18, and Material-UI 6
Component-Based: Modular architecture with reusable components
Interactive Controls: Player controls for playback management
Static Export: Optimized for deployment as static files

📋 Prerequisites

Docker (recommended; the Quick Start uses a plain docker build, so Docker Compose is not required)
OR:
- Python 3.11+
- Node.js 20+
- Git LFS (for downloading TTS models)

🚀 Quick Start

Using Docker (Recommended)

Clone the repository:

```
git clone <repository-url> document-reader-supertonic
cd document-reader-supertonic
```

Build and run:

```
docker build -t pdf-tts .
docker run -p 8000:8000 pdf-tts
```

Access the application: Open your browser to http://localhost:8000

Manual Setup

Backend Setup

Install Python dependencies:

```
cd backend
pip install -r requirements.txt
```

Download the spaCy model:

```
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1.tar.gz
```

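Download the Supertonic models (the backend expects them under /models/supertonic, so creating that directory usually requires root privileges):

```
git lfs install
git clone https://huggingface.co/Supertone/supertonic /models/supertonic
```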

Download Supertonic's helper script (run this from the backend directory so the file lands in app/):

```
curl -o app/helper.py https://raw.githubusercontent.com/supertone-inc/supertonic/main/py/helper.py
```

Run the backend:

```
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

Frontend Setup

Install dependencies:
```
cd frontend
npm install
```
Run the development server:
```
npm run dev
```
Or build for production:
```
npm run build
```

📖 Usage

Upload a PDF: Click the "Upload PDF" button and select a PDF file
View Sentences: The extracted text appears in the viewer, split into sentences
Play Audio: Click "Play" to start listening from the beginning or current sentence
Navigate: Use "Next" and "Prev" buttons to skip between sentences
Jump to Sentence: Double-click any sentence to start playing from that point
Auto-Scroll: Toggle auto-scroll to automatically follow along with the audio
Playback Controls: Use Play, Pause, Resume, and Stop buttons as needed

🛠️ Technology Stack

Backend

FastAPI: Modern, fast web framework for building APIs
pdfminer.six: Robust PDF text extraction
spaCy: Industrial-strength NLP for sentence segmentation
Supertonic: State-of-the-art neural TTS with ONNX runtime
ONNX Runtime: Efficient model inference
Uvicorn: Lightning-fast ASGI server

Frontend

Next.js 14: React framework with static export capabilities
React 18: Component-based UI library
Material-UI (MUI) 6: Comprehensive component library
TypeScript: Type-safe JavaScript development
Emotion: CSS-in-JS styling solution

📁 Project Structure

```
document-reader-supertonic/
├── Dockerfile              # Multi-stage Docker build
├── backend/
│   ├── requirements.txt    # Python dependencies
│   └── app/
│       ├── main.py         # FastAPI application & routes
│       ├── pdf_parser.py   # PDF text extraction
│       ├── nlp.py          # Sentence segmentation
│       └── tts.py          # Text-to-speech synthesis
└── frontend/
    ├── package.json        # Node.js dependencies
    ├── next.config.js      # Next.js configuration
    └── src/
        ├── components/     # React components
        │   ├── PdfUploader.tsx     # File upload component
        │   ├── PlayerControls.tsx  # Audio player controls
        │   └── TextViewer.tsx      # Sentence display & navigation
        ├── lib/
        │   └── api.ts      # API client functions
        └── pages/
            └── index.tsx   # Main application page
```

🔧 Configuration

Voice Styles

The default voice style is M1.json (a male voice). To use a different one, edit VOICE_STYLE in backend/app/tts.py:

```
VOICE_STYLE = ["/models/supertonic/voice_styles/M1.json"]
```

Available voice styles are located in /models/supertonic/voice_styles/.
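Because the available styles depend on which Supertonic release you cloned, it can help to check what is actually on disk before editing VOICE_STYLE. This is a small sketch under that assumption; list_voice_styles and voice_style_setting are hypothetical helpers, not part of the backend.

```python
from pathlib import Path

# Location used throughout this README; change it if the models live elsewhere.
VOICE_STYLE_DIR = Path("/models/supertonic/voice_styles")


def list_voice_styles(directory: Path = VOICE_STYLE_DIR) -> list[str]:
    """Return the voice-style file names (e.g. "M1.json"), sorted."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.json"))


def voice_style_setting(name: str, directory: Path = VOICE_STYLE_DIR) -> list[str]:
    """Build a value in the list form VOICE_STYLE uses, after checking the file exists."""
    path = directory / name
    if not path.is_file():
        raise FileNotFoundError(f"No voice style {name!r} in {directory}")
    return [str(path)]
```

For example, printing list_voice_styles() shows which names you can pass to voice_style_setting, whose result has the same shape as the VOICE_STYLE list in tts.py.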

TTS Parameters

Adjust synthesis parameters in the SupertonicTTS.synthesize() method:

total_step: Number of denoising steps (default: 5); more steps make synthesis slower
speed: Speaking-rate multiplier applied during synthesis (default: 1.05); values above 1.0 produce faster speech
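If you expose these two parameters in your own code, grouping and validating them in one place keeps bad values from reaching the model. The sketch below is hypothetical: SynthesisSettings is not a class in this project, and estimated_duration assumes output length scales inversely with speed, which is an illustration rather than something taken from Supertonic's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisSettings:
    """Hypothetical container for the two documented synthesize() parameters."""

    total_step: int = 5  # denoising steps: fewer is faster, more is usually cleaner
    speed: float = 1.05  # speaking-rate multiplier; 1.0 is the model's natural pace

    def __post_init__(self) -> None:
        # Reject values that cannot be meaningful for either parameter.
        if self.total_step < 1:
            raise ValueError("total_step must be at least 1")
        if self.speed <= 0:
            raise ValueError("speed must be positive")

    def estimated_duration(self, natural_seconds: float) -> float:
        # Assumption: duration scales as 1 / speed, so 1.05 makes a clip
        # that takes 10 s at speed 1.0 last about 9.52 s.
        return natural_seconds / self.speed
```

Making the dataclass frozen means a settings object can be shared safely across requests without one request changing another's parameters.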

🐳 Docker Details

The Dockerfile uses a multi-stage build:

Frontend Stage: Builds the Next.js application into static files
Backend Stage: Sets up Python environment, installs dependencies, downloads models, and serves both API and static frontend

📝 API Reference

POST /api/upload

Upload a PDF file and extract sentences.

Request: multipart/form-data with the PDF in a field named file

Response:

```
{
  "sentences": [
    { "id": 0, "text": "First sentence." },
    { "id": 1, "text": "Second sentence." }
  ]
}
```
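The upload call can be made with only the Python standard library. This is a minimal client sketch, assuming the backend runs at http://localhost:8000 as in the Quick Start; the helper names are hypothetical, and the multipart encoder handles a single file and does not escape quotes in the filename.

```python
import json
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # default from the Quick Start


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_request(pdf_bytes: bytes, filename: str = "document.pdf") -> urllib.request.Request:
    """Build (but do not send) the POST /api/upload request."""
    body, content_type = build_multipart("file", filename, pdf_bytes)
    return urllib.request.Request(
        f"{BASE_URL}/api/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def parse_sentences(payload: bytes) -> list[str]:
    """Return sentence texts ordered by id from an /api/upload response body."""
    sentences = json.loads(payload)["sentences"]
    return [s["text"] for s in sorted(sentences, key=lambda s: s["id"])]
```

With the backend running, pass the bytes of a PDF to upload_request, send the result with urllib.request.urlopen, and feed the response body to parse_sentences.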

POST /api/tts

Synthesize speech for the given text and return the URL of the generated WAV file.

Request:

```
{
  "text": "Text to synthesize"
}
```

Response:

```
{
  "audioUrl": "/data/audio/tmp_xyz.wav"
}
```
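Since audioUrl is a path rather than a full URL, a client has to resolve it against the server address before downloading the audio. A standard-library sketch, assuming the same backend at http://localhost:8000 also serves the returned path; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # default from the Quick Start


def tts_request(text: str) -> urllib.request.Request:
    """Build (but do not send) the JSON POST /api/tts request."""
    return urllib.request.Request(
        f"{BASE_URL}/api/tts",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def audio_url(payload: bytes, base_url: str = BASE_URL) -> str:
    """Resolve the relative audioUrl in a /api/tts response to an absolute URL."""
    relative = json.loads(payload)["audioUrl"]
    return urllib.parse.urljoin(base_url + "/", relative)


def fetch_audio(text: str) -> bytes:
    """Synthesize one sentence and download the WAV bytes (needs a running backend)."""
    with urllib.request.urlopen(tts_request(text)) as resp:
        url = audio_url(resp.read())
    with urllib.request.urlopen(url) as audio:
        return audio.read()
```

Requesting one sentence at a time matches how the player works: each sentence from /api/upload can be synthesized just before it is needed instead of converting the whole document up front.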

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project uses the following third-party technologies:

Supertonic - Text-to-speech model
spaCy - Natural language processing
FastAPI - Web framework
Next.js - React framework

🙏 Acknowledgments

Supertone for providing the high-quality TTS models
spaCy for robust NLP capabilities
The open-source community for all the amazing tools and libraries

📧 Contact

For questions, issues, or suggestions, please open an issue on the repository.

Repository: raghu13590/speakPDF
