pablopda/linux-speech-tools

Linux Speech Tools

Professional text-to-speech and voice input tools for Linux systems. Multi-engine TTS, voice recording, and cross-platform compatibility.

🚀 Quick Installation

curl -fsSL https://raw.githubusercontent.com/pablopda/linux-speech-tools/main/installer.sh | bash

✨ Features

🎙️ Multi-Engine Text-to-Speech

  • Edge TTS: High-quality cloud-based synthesis with regional voice support for 22 LATAM countries
  • Kokoro TTS: Offline neural voice synthesis
  • Festival TTS: Local fallback engine
  • Graceful fallbacks: Automatic engine switching for maximum reliability
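The fallback idea above can be sketched as a "first available command wins" check. This is only an illustration, not the project's actual switching logic: `pick_engine` is a hypothetical helper, and the candidate command names are assumptions about what is installed.

```shell
# Print the first command from the argument list that exists on PATH.
# Returns non-zero if none of the candidates is available.
pick_engine() {
    local candidate
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer the cloud engine, then local ones (assumed command names).
engine=$(pick_engine edge-tts festival espeak-ng) || engine=""
echo "Selected engine: ${engine:-none}"
```

A real fallback would also need to handle an engine that is installed but fails at runtime (for example, Edge TTS without a network connection); this sketch only covers the "not installed" case.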

🗣️ Voice Input & Recording

  • Toggle recording: Press once to start, again to stop (default mode)
  • Speech-to-text: Powered by OpenAI Whisper for accurate transcription
  • Auto-clipboard: Transcription automatically copied to clipboard
  • GNOME integration: Global hotkey (Ctrl+Alt+V) for system-wide voice input
  • Smart detection: Terminal vs GUI application handling

🎵 Enhanced Audio Streaming ⭐ NEW

  • Continuous playback: Eliminates gaps between audio chunks
  • Professional quality: Broadcast-level smooth TTS streaming
  • Smart concatenation: Uses ffmpeg/sox for seamless audio joining
  • Multiple modes: Continuous, buffered, and original streaming options
  • Drop-in replacement: Enhanced versions of existing commands
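One common way to join audio chunks without gaps is ffmpeg's concat demuxer. The sketch below shows that technique in general terms; it is not taken from the project's scripts, and both function names are hypothetical.

```shell
# Write an ffmpeg concat-demuxer list: one "file '<path>'" line per chunk.
# Assumes chunk paths are absolute and contain no single quotes.
write_concat_list() {
    local list_path="$1" chunk
    shift
    : > "$list_path"
    for chunk in "$@"; do
        printf "file '%s'\n" "$chunk" >> "$list_path"
    done
}

# Join chunks into OUTPUT without re-encoding. Requires ffmpeg on PATH
# and chunks that share the same codec parameters.
# Usage: join_chunks OUTPUT CHUNK...
join_chunks() {
    local out="$1" list status=0
    shift
    list=$(mktemp) || return 1
    write_concat_list "$list" "$@"
    ffmpeg -loglevel error -y -f concat -safe 0 -i "$list" -c copy "$out" || status=$?
    rm -f "$list"
    return "$status"
}
```

For example, `join_chunks /tmp/article.mp3 /tmp/chunk-1.mp3 /tmp/chunk-2.mp3` would produce a single file that a player can stream without pausing between chunks.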

🎮 GNOME Media Controls ⭐ LATEST

  • Desktop media controls: Play/pause/stop from notification panel
  • Real-time progress: Visual progress tracking for reading sessions
  • Native integration: Professional media player experience for TTS
  • Document information: Display source title and reading status
  • Notification controls: Never lose control of long reading sessions

🖥️ Command-Line Tools

  • say - Text-to-speech with file output support
  • say-local - Local TTS using Festival/Kokoro
  • say-read - Read URLs, PDFs, and documents with TTS
  • say-read-es - Spanish language content reader
  • talk2claude - Voice input with transcription

🐧 Cross-Platform Linux Support

  • Ubuntu 20.04, 22.04
  • Debian 11, 12
  • Fedora 38, 39
  • Automatic dependency detection and installation
  • XDG-compliant configuration management

📖 Usage Examples

Basic Text-to-Speech

# Simple speech
say "Hello from Linux Speech Tools!"

# Spanish voice
say -v es-ES-AlvaroNeural "¡Hola mundo!"

# Save to file
say -o greeting.mp3 "Welcome to our application"

# Show available options
say --help
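Building on the `-o` option shown above, a short loop can turn a text file into one MP3 per line. This is a sketch under assumptions: `speak_lines` and the `SAY_CMD` override are hypothetical names introduced here so the loop can be dry-run with a stub, and are not part of the project.

```shell
# Speak each non-empty line of a text file into its own numbered MP3.
# SAY_CMD is a hypothetical override (not a project setting); it defaults to `say`.
speak_lines() {
    local input="$1" prefix="$2" n=0 line
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        n=$((n + 1))
        # </dev/null keeps the TTS command from consuming the loop's input.
        "${SAY_CMD:-say}" -o "$(printf '%s-%03d.mp3' "$prefix" "$n")" "$line" </dev/null
    done < "$input"
}
```

Calling `speak_lines notes.txt chapter` would write `chapter-001.mp3`, `chapter-002.mp3`, and so on, skipping blank lines.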

🎤 Voice Input

GNOME Integration (Recommended):

# Install GNOME integration
./install-gnome-integration.sh

# Use system-wide hotkey: Ctrl+Alt+V
# Press once → Start recording
# Press again → Stop and transcribe

Command Line:

# Toggle mode (default)
./toggle-speech.sh toggle    # Start/stop recording
./toggle-speech.sh start     # Start only
./toggle-speech.sh stop      # Stop only

# Fixed duration mode
./simple-speech.sh 5         # 5-second recording

# Original talk2claude (advanced)
talk2claude                  # 8-second recording
talk2claude start           # Background recording
talk2claude stop            # Stop and transcribe
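Toggle behavior like this is often built on a small state file: if the file exists, a recording is in progress and the next press stops it. The sketch below illustrates that pattern only. The state file path and `toggle_recording` are hypothetical, and the real scripts may track state differently (for example, with a PID file).

```shell
# Hypothetical state file; its presence means "currently recording".
STATE_FILE="${XDG_RUNTIME_DIR:-/tmp}/speech-toggle-demo.state"

# Flip between "recording" and "idle" and print the new state.
toggle_recording() {
    if [ -f "$STATE_FILE" ]; then
        rm -f "$STATE_FILE"
        # A real script would stop the recorder and start transcription here.
        echo "idle"
    else
        : > "$STATE_FILE"
        # A real script would launch the recorder in the background here.
        echo "recording"
    fi
}
```

Binding a single hotkey to one toggle command works because the state lives on disk rather than in the process that handles the key press.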

📖 Content Reading

🎵 Enhanced: Continuous Streaming (NEW)

# Smooth, gap-free audio streaming
./say-read-continuous https://example.com/article

# Professional-quality playback for long content
./say-read-smooth --buffered https://en.wikipedia.org/wiki/Linux

# Interactive demo showing improvement
./demo-audio-streaming.sh

🎮 GNOME Media Controls (LATEST)

# Reading with desktop media controls
./say-read-gnome https://www.bbc.com/news/technology

# Control playback from notification panel:
# ⏸️ Pause - Click to pause reading
# ▶️ Resume - Click to resume reading
# ⏹️ Stop - Click to stop completely

# Setup GNOME integration (first time)
./say-read-gnome --setup

# Interactive demo and testing
./demo-gnome-media-integration.sh

📚 Standard Reading

# Read web articles
say-read https://example.com/article

# Read PDF documents
say-read document.pdf

# Read with Spanish voice
say-read-es https://elpais.com/tecnologia/

🔧 Installation Methods

Option 1: One-Command Install (Recommended)

curl -fsSL https://raw.githubusercontent.com/pablopda/linux-speech-tools/main/installer.sh | bash

Option 2: Manual Installation

git clone https://github.com/pablopda/linux-speech-tools.git
cd linux-speech-tools
./installer.sh

Option 3: Package Installation

Download packages from Releases:

Ubuntu/Debian:

wget https://github.com/pablopda/linux-speech-tools/releases/download/v1.0.0/linux-speech-tools_1.0.0.deb
sudo dpkg -i linux-speech-tools_1.0.0.deb

Fedora/RHEL:

wget https://github.com/pablopda/linux-speech-tools/releases/download/v1.0.0/linux-speech-tools-1.0.0-1.noarch.rpm
sudo rpm -i linux-speech-tools-1.0.0-1.noarch.rpm

⚙️ Configuration

Voice Configuration

Create ~/.config/speech-tools/config:

# Default voice for Edge TTS
EDGE_VOICE=en-US-EmmaMultilingualNeural

# Voice input settings
ASR_LANG=en
WHISPER_MODEL=large-v3
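To check which values a script would see, a `KEY=VALUE` file like the one above can be read with a few lines of shell. This is a minimal sketch, assuming the tools resolve the path through `XDG_CONFIG_HOME` with the usual `~/.config` fallback; `speech_config_path` and `config_get` are hypothetical helpers, and the project itself may simply source the file.

```shell
# Resolve the config path; XDG_CONFIG_HOME falls back to ~/.config.
speech_config_path() {
    printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}/speech-tools/config"
}

# Print the value of KEY from a KEY=VALUE file, or DEFAULT if the key is absent.
# Comment and blank lines are ignored; the last assignment wins.
config_get() {
    local file="$1" key="$2" default="$3" line value="" found=0
    if [ -r "$file" ]; then
        while IFS= read -r line || [ -n "$line" ]; do
            case $line in
                "$key="*) value=${line#"$key="}; found=1 ;;
            esac
        done < "$file"
    fi
    if [ "$found" -eq 1 ]; then
        printf '%s\n' "$value"
    else
        printf '%s\n' "$default"
    fi
}

# Example: look up the configured voice, with a fallback if unset.
voice=$(config_get "$(speech_config_path)" EDGE_VOICE en-US-EmmaMultilingualNeural)
echo "Edge voice: $voice"
```

Parsing rather than sourcing the file avoids executing arbitrary shell code from a config file, at the cost of not supporting quoting or variable expansion in values.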

Available Voices

# List Edge TTS voices
edge-tts --list-voices | grep -E "(Male|Female)"

# Test different voices
say -v en-GB-SoniaNeural "British English"
say -v es-MX-DaliaNeural "Mexican Spanish"
say -v pt-BR-AntonioNeural "Brazilian Portuguese"

🔍 Troubleshooting

Audio Issues

# Test audio output
say "Audio test"

# Check audio devices
pactl list short sinks

# Install audio dependencies
sudo apt install pulseaudio-utils  # Ubuntu/Debian
sudo dnf install pulseaudio-utils  # Fedora

Dependency Issues

# Install Python dependencies manually
pip3 install edge-tts pyaudio speechrecognition

# Install system dependencies
sudo apt install python3-pip ffmpeg espeak-ng portaudio19-dev  # Ubuntu/Debian
sudo dnf install python3-pip ffmpeg espeak-ng portaudio-devel  # Fedora
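Before reinstalling packages, it can help to see which commands are actually missing. The helper below is a hypothetical sketch, and the command names it checks are inferred from the package names above rather than from the installer itself.

```shell
# Print each command from the argument list that is not on PATH.
# Succeeds only if nothing is missing.
missing_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}

# Command names are assumptions based on the packages listed in this README.
if ! absent=$(missing_commands python3 pip3 ffmpeg espeak-ng); then
    printf 'Missing commands: %s\n' "$(printf '%s' "$absent" | tr '\n' ' ')"
fi
```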

Permission Issues

# Make scripts executable
chmod +x ~/.local/bin/{say,say-local,talk2claude}

# Add to PATH if needed
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
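Running the `echo ... >> ~/.bashrc` line more than once appends a duplicate entry each time. A guarded variant might look like the sketch below; both function names are hypothetical and are not part of the installer.

```shell
# Return 0 if DIR ($2) is already a component of the colon-separated list ($1).
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Append the export line to an rc file only when ~/.local/bin is missing from PATH.
ensure_local_bin_on_path() {
    local rc_file="${1:-$HOME/.bashrc}"
    if ! path_contains "$PATH" "$HOME/.local/bin"; then
        echo 'export PATH="$HOME/.local/bin:$PATH"' >> "$rc_file"
    fi
}
```

Wrapping the list in colons before matching avoids false positives from partial matches, such as `/home/user/.local/bin-old`.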

🧪 Development

Running Tests

# Run full test suite
python3 tests/test_speech_tools.py

# Quick validation
./scripts/quick-release-check.sh

# Comprehensive validation
./scripts/pre-release-check.sh

Creating Releases

# Patch release (1.0.0 -> 1.0.1)
./release.sh patch

# Minor release (1.0.0 -> 1.1.0)
./release.sh minor

# Preview release
./release.sh patch --dry-run
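The version arithmetic in the comments above (1.0.0 -> 1.0.1 for a patch, 1.0.0 -> 1.1.0 for a minor release) can be illustrated with a small function. This is not the contents of `release.sh`; `bump_version` is a hypothetical helper that assumes a plain `MAJOR.MINOR.PATCH` string with no leading zeros or suffixes.

```shell
# Print VERSION (MAJOR.MINOR.PATCH) with the requested part incremented.
# Lower-order parts reset to 0, following semantic versioning conventions.
bump_version() {
    local version="$1" part="$2" major minor patch rest
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    case $part in
        major) major=$((major + 1)); minor=0; patch=0 ;;
        minor) minor=$((minor + 1)); patch=0 ;;
        patch) patch=$((patch + 1)) ;;
        *) echo "unknown part: $part" >&2; return 1 ;;
    esac
    printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}
```

For example, `bump_version 1.0.0 patch` prints `1.0.1` and `bump_version 1.0.0 minor` prints `1.1.0`.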

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Quick Start for Contributors

git clone https://github.com/pablopda/linux-speech-tools.git
cd linux-speech-tools

# Install development dependencies
./installer.sh

# Run tests
python3 tests/test_speech_tools.py

# Submit changes
git checkout -b feature/your-feature
# Make changes
./scripts/quick-release-check.sh
git commit -m "Add your feature"
git push origin feature/your-feature
# Create pull request

📋 Requirements

System Requirements

  • OS: Linux (Ubuntu 20.04+, Debian 11+, Fedora 38+)
  • Python: 3.7+
  • Audio: PulseAudio or ALSA
  • Network: Internet connection for Edge TTS

Dependencies

  • python3-pip
  • ffmpeg
  • espeak-ng
  • portaudio19-dev (Ubuntu/Debian) or portaudio-devel (Fedora)

All dependencies are automatically installed by the installer script.

📚 Documentation

📊 Project Status

  • ✅ Production Ready: Comprehensive testing across multiple distributions
  • ✅ Actively Maintained: Regular updates and improvements
  • ✅ Community Driven: Open to contributions and feature requests
  • ✅ Professional Quality: Enterprise-grade CI/CD and release automation

🔗 Links

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI Whisper for speech recognition
  • Microsoft Edge TTS for cloud synthesis
  • Kokoro ONNX for offline synthesis
  • Festival Speech Synthesis System
  • The open-source Linux community

Made with ❤️ for the Linux community

Professional speech tools that just work. 🐧🎙️
