Clara - Your Reading Companion

A desktop application for reading documents with text-to-speech capabilities and AI-powered question answering.

Features

Text-to-speech reading for PDF, TXT, and Markdown files
Word-by-word highlighting during playback
AI-powered document Q&A with a local LLM
Advanced multi-source table of contents extraction
- Extracts from native PDF metadata (most accurate)
- Parses table of contents pages (works with books, textbooks, manuals)
- Font-based structural analysis (works with any PDF)
- Supports 13+ TOC format patterns
- Progressive loading for instant display
Continuous scrolling through pages
Dictionary lookup for words
All AI processing happens locally on your machine

First Launch

When you run Clara for the first time, it will automatically download a base AI model (approximately 2 GB). This one-time setup enables:

Asking questions about your documents
Smart table of contents extraction
Intelligent text processing
Context-aware reading features

The download takes about 5-10 minutes on a typical internet connection. You can skip this step, but Clara works best with AI features enabled.

System Requirements

Python 3.8 or higher
Poppler (system library for PDF rendering)
Internet connection (for initial setup only)

Installation

Quick Start

The easiest way to install Clara is to use the automated installation script:

On macOS or Linux:

bash install.sh

On Windows:

install.bat

The script will install all Python dependencies and check for system requirements. If anything is missing, it will provide installation instructions.

Manual Installation

If you prefer to install manually, follow these steps:

Step 1: Install Python Dependencies

pip install -r requirements.txt

Step 2: Install Poppler

Poppler is required for rendering PDF pages as images. Without it, PDFs will only display as text.

On macOS:

Install using Homebrew:

brew install poppler

If you don't have Homebrew, install it first from https://brew.sh

On Windows:

Download the latest release from: https://github.com/oschwartz10612/poppler-windows/releases/
Extract the ZIP file to C:\Program Files\poppler
Add the bin directory to your PATH:
- Open System Properties (Win + X, then System)
- Click "Advanced system settings"
- Click "Environment Variables"
- Under "System variables", find and select "Path"
- Click "Edit"
- Click "New"
- Add: C:\Program Files\poppler\Library\bin
- Click OK on all dialogs
Restart your computer for the PATH changes to take effect

On Linux:

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install poppler-utils

Fedora:

sudo dnf install poppler-utils

Arch Linux:

sudo pacman -S poppler

Step 3: Verify Poppler Installation

Open a new terminal window and run:

pdftoppm -v

You should see the Poppler version information. If you get an error, review the installation steps above.
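The same check can be scripted. The following is a minimal sketch, not Clara's own startup code: it assumes only that Poppler's `pdftoppm` executable should be discoverable on PATH, and the helper names `find_poppler` and `poppler_version` are hypothetical.

```python
import shutil
import subprocess


def find_poppler():
    """Return the full path to pdftoppm if it is on PATH, otherwise None."""
    return shutil.which("pdftoppm")


def poppler_version(executable):
    """Run `<executable> -v` and return the first line it prints, or None on failure."""
    try:
        result = subprocess.run(
            [executable, "-v"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        # Covers a missing binary, a permission problem, or a timeout.
        return None
    # The version banner may go to stderr or stdout depending on the build.
    output = (result.stderr or result.stdout).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    path = find_poppler()
    if path is None:
        print("Poppler NOT FOUND: pdftoppm is not on PATH")
    else:
        print(f"Found {path}: {poppler_version(path)}")
```

If the script reports that `pdftoppm` is missing right after installation, open a new terminal first so the updated PATH is picked up.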

Running Clara

After installation, start the application with:

python app.py

On startup, Clara will check all dependencies and display their status. If any required dependencies are missing, you will see detailed installation instructions.
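A startup check of this kind could look roughly like the sketch below. It is an illustration under assumptions rather than Clara's actual implementation: the `check_packages` helper and the two dictionaries are hypothetical, and the mapping reflects the usual import names for these pip packages (for example, `pywebview` is imported as `webview` and `pymupdf` as `fitz`).

```python
import importlib.util

# pip package name -> importable module name (assumed mapping for illustration)
REQUIRED = {
    "flask": "flask",
    "pywebview": "webview",
    "pypdf": "pypdf",
    "pymupdf": "fitz",
    "pdf2image": "pdf2image",
}
OPTIONAL = {
    "edge-tts": "edge_tts",
    "llama-cpp-python": "llama_cpp",
}


def check_packages(packages):
    """Return {pip_name: True/False} depending on whether the module can be found."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in packages.items()
    }


if __name__ == "__main__":
    for label, group in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, available in check_packages(group).items():
            status = "OK" if available else "MISSING"
            print(f"[{label}] {name}: {status}")
```

Using `find_spec` locates a module without importing it, so the check stays fast even for heavy packages.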

The application will open in a desktop window. You can then upload documents and use the controls to read or navigate.

Controls

Play/Pause button: Start or stop reading
Speed control: Adjust playback rate
Previous/Next buttons: Navigate between pages
Sync button: Jump to current reading position
Question input: Ask questions about document content
Table of Contents: Navigate to any section of your document

Table of Contents Extraction

Clara combines several sources to extract a table of contents from a PDF, trying them in the priority order below:

Extraction Methods (Priority Order)

Native PDF TOC - Reads embedded navigation from PDF metadata (most accurate)
TOC Page Parsing - Extracts actual page numbers from table of contents pages
- Supports 13+ format patterns including:
  - "Chapter 1: Introduction ............... 15"
  - "1.1 Overview ........................... 42"
  - "Part I: Getting Started ------------- 15"
  - Tabbed formats, numbered sections, and more
Font-Based Analysis - Analyzes text styling to identify headings
- Adaptive font size detection
- Works with academic papers, novels, manuals, reports
- Skips front matter to avoid confusion
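To illustrate the TOC page parsing step, here is a minimal sketch of how one line of a contents page might be split into a title and a page number. The single regular expression and the `parse_toc_line` helper are assumptions made for this example; Clara's real parser supports more patterns and is not reproduced here.

```python
import re

# A title, then leaders (dots, dashes, underscores), a tab, or a gap of
# two or more spaces, then a trailing page number.
TOC_LINE = re.compile(
    r"^\s*(?P<title>.+?)\s*(?:[.\-_]{2,}|\s{2,}|\t)\s*(?P<page>\d+)\s*$"
)


def parse_toc_line(line):
    """Return (title, page) for a TOC-style line, or None if it does not match."""
    match = TOC_LINE.match(line)
    if match is None:
        return None
    return match.group("title").strip(), int(match.group("page"))


if __name__ == "__main__":
    samples = [
        "Chapter 1: Introduction ............... 15",
        "1.1 Overview ........................... 42",
        "Part I: Getting Started ------------- 15",
    ]
    for sample in samples:
        print(parse_toc_line(sample))
```

Requiring leaders or a wide gap before the number keeps a plain heading such as "Chapter 1" from being misread as a title followed by page 1.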

Progressive Loading

Quick TOC displays in under 0.5 seconds for instant navigation
Enhanced TOC loads in background with better accuracy
Results are cached for fast subsequent access
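The caching idea can be sketched as follows. This is one possible approach, not a description of Clara's cache format: keying the cache on a SHA-256 hash of the file, storing JSON, and the names `cache_key` and `load_or_extract` are all assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path


def cache_key(document_path):
    """Hash the file contents so the key changes whenever the document changes."""
    digest = hashlib.sha256()
    with open(document_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_or_extract(document_path, cache_dir, extract):
    """Return the cached TOC if present; otherwise call extract() and store the result."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / (cache_key(document_path) + ".json")
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    toc = extract(document_path)  # the slow path: run the real extraction
    cache_file.write_text(json.dumps(toc), encoding="utf-8")
    return toc
```

Here `extract` stands for any callable that takes a path and returns a JSON-serializable TOC, such as a list of `[title, page]` pairs.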

Works With

Academic papers (Abstract, Introduction, Methods, Results)
Textbooks and "For Dummies" series
O'Reilly technical books
Novels and fiction
Corporate reports and manuals
Technical documentation
Any PDF with structured headings

Dependencies

Required Python Packages

These are installed automatically via requirements.txt:

flask - Web framework for the application backend
pywebview - Creates the desktop window
pypdf - Parses PDF files
pymupdf - Extracts table of contents and metadata
pdf2image - Converts PDF pages to images (requires Poppler)
sentence-transformers - AI embeddings for semantic search
chromadb - Vector database for document storage
werkzeug - WSGI utilities (Flask dependency)
numpy - Numerical computing (AI dependency)

Optional Python Packages

These enhance functionality but are not required:

edge-tts - Microsoft neural voices for higher-quality text-to-speech; Clara falls back to system TTS if it is not installed
llama-cpp-python - Local AI for question answering; Clara falls back to search-only mode if it is not installed

System Dependencies

These must be installed separately on your system:

Poppler - Required for PDF image rendering (see installation instructions above)

Data Storage

Clara stores all data locally in ~/Documents/Clara/:

documents/ - Uploaded document files
thumbnails/ - PDF page thumbnails
toc_cache/ - Table of contents extraction cache
voices/ - TTS voice cache
models/ - AI model files
audio_cache/ - Generated audio files
clara.db - SQLite database
config.json - Application configuration
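As an illustration of this layout, the sketch below creates the listed subdirectories under a chosen root. The `init_storage` helper is hypothetical and is not taken from Clara's source; it only mirrors the directory names shown above and leaves clara.db and config.json to the application.

```python
from pathlib import Path

# Subdirectories listed in the Data Storage section.
SUBDIRECTORIES = (
    "documents",
    "thumbnails",
    "toc_cache",
    "voices",
    "models",
    "audio_cache",
)


def init_storage(root=None):
    """Create the storage tree under root (default: ~/Documents/Clara) and return it."""
    root = Path(root) if root is not None else Path.home() / "Documents" / "Clara"
    for name in SUBDIRECTORIES:
        # exist_ok makes the call safe to repeat on every startup.
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```

Calling `init_storage()` with no argument targets `~/Documents/Clara/`; passing a different root is convenient for testing without touching real data.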

Troubleshooting

Poppler Not Found

If you see "Poppler NOT FOUND" on startup:

Follow the Poppler installation instructions for your operating system (see Step 2 above)
Verify installation by running pdftoppm -v in a new terminal
On Windows, make sure you added the correct path: C:\Program Files\poppler\Library\bin (note the \Library\bin subdirectory)
On Windows, restart your computer after changing PATH
Restart Clara

PDF Pages Show Text Instead of Images

This usually means Poppler is not installed or is not on your PATH. Follow the Poppler installation steps above (Step 2), then confirm with pdftoppm -v in a new terminal.

Import Errors

If you see errors like "ModuleNotFoundError: No module named 'flask'":

Make sure you installed all dependencies: pip install -r requirements.txt
If using a virtual environment, make sure it is activated
Try reinstalling: pip install -r requirements.txt --force-reinstall

Edge TTS Not Available

This is normal if you haven't installed edge-tts. Clara will use your system's built-in text-to-speech. To use higher quality neural voices:

pip install edge-tts

Permission Errors (macOS/Linux)

If you see permission errors during installation:

pip install --user -r requirements.txt

Virtual Environment Setup (Optional)

If you want to use a virtual environment:

# Create virtual environment
python -m venv venv

# Activate it
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run Clara
python app.py

Platform-Specific Notes

macOS

Tested on macOS 10.13 (High Sierra) and newer
Uses WebKit (Safari) for rendering
System TTS uses the say command as fallback

Windows

Tested on Windows 10 and newer
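Uses Edge WebView2 for rendering (preinstalled on Windows 11; on Windows 10 it is normally delivered through updates, and can be installed separately if missing)
Poppler installation requires adding its Library\bin directory to PATH and restarting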

Linux

Tested on Ubuntu, Fedora, and Arch Linux
PyWebView requires GTK+ 3.0 (usually pre-installed)
If needed, install GTK:
- Ubuntu/Debian: sudo apt-get install python3-gi python3-gi-cairo gir1.2-gtk-3.0 gir1.2-webkit2-4.0
- Fedora: sudo dnf install python3-gobject gtk3 webkit2gtk3

Project Structure

clara_2/
├── app.py                  # Main application file
├── requirements.txt        # Python dependencies
├── install.sh             # Installation script (macOS/Linux)
├── install.bat            # Installation script (Windows)
├── templates/
│   └── index.html         # Main UI template
└── static/
    ├── css/
    │   └── style.css
    └── js/
        ├── app.js
        └── modules/       # Modular JavaScript components

Technical Details

Backend: Python Flask
Frontend: HTML, CSS, JavaScript (ES modules)
Desktop UI: PyWebView (native window wrapper)
Text-to-Speech: Edge TTS or system TTS
AI: Local LLM via llama-cpp-python
Vector Embeddings: sentence-transformers
Vector Database: ChromaDB
PDF Processing: PyMuPDF and pdf2image

First Run

When you first run Clara:

Dependency Check: Clara will verify all dependencies and display their status
Storage Initialization: Creates the ~/Documents/Clara/ directory structure
Model Download: If you use Q&A features, the TinyLlama model (about 670 MB) downloads automatically on first use
Ready: The application window opens and you can start uploading documents

License

MIT
