This is the repo for the text-to-speech dataset collection web application TTS Dataset Generator.
The application supports CSV upload and multi-line text input, multiple projects, RTL languages, and export to Hugging Face.
Export to AWS S3 is under development.
- Clone the repo:
  git clone https://github.com/Kamal-Eldin/ASR-TTS-Data-Collection
- Ensure Docker Desktop is running
- Open the project in a VS Code devcontainer
  - devcontainer.json: Ensures the installation of the Python base image and project dependencies, notably: python 3.12-bookworm, Node.js 16 and docker-in-docker
  - postCreate.sh: Holds the postCreateCommand to install Python dependencies as per ./requirements.txt
- Execute the make target deploy:
  make deploy
Visit http://localhost:8500 to reach the web app
The make target deploy copies project.config into .env at the root path for setting up docker compose services.
The following environment variables must be declared in the project environment. These variables are curated in project.config at the repo's root.
# Database Configuration
MYSQL_HOST=db # must match the compose service name of the database container
MYSQL_PORT=3306
MYSQL_USER=admin
MYSQL_DATABASE=tts_dataset_generator
# paths to secret file mount in the db container
MYSQL_ROOT_PASSWORD_FILE=/run/secrets/db_root_password
MYSQL_PASSWORD_FILE=/run/secrets/db_password
# Application Configuration
STORAGE_PATH=recordings
# Export Timeouts (in seconds)
HF_EXPORT_TIMEOUT=300
S3_EXPORT_TIMEOUT=300
# AWS Configuration (for S3 export)
AWS_DEFAULT_REGION=us-east-1
# paths to secret file mount in the app container for aws creds
AWS_ACCESS_KEY_ID_FILE=/run/secrets/aws_access_id
AWS_SECRET_ACCESS_KEY_FILE=/run/secrets/aws_access_secret
# Hugging Face Configuration (for HF export)
HUGGINGFACE_REPO=your_username/your_repo
# paths to secret file mount in the app container for hugging face
# Port to access the application's frontend
APP_PORT=8500
# Backend url with respect to a unified container for both front & backend services
BACKEND_URL=http://localhost:${APP_PORT}
Each time the APP_PORT environment variable or the BACKEND_URL is changed, the make deploy target must be re-executed.
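To make the role of ${APP_PORT} inside BACKEND_URL concrete, here is a minimal Python sketch of how a KEY=VALUE file such as project.config could be read, with inline comments stripped and ${NAME} references expanded. It is an illustration only: Docker Compose does its own .env parsing, and the function name `parse_env_text` is hypothetical rather than part of this repo.

```python
import re

_REF = re.compile(r"\$\{(\w+)\}")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping comments and expanding ${NAME} references."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment introduced by whitespace followed by '#'.
        value = re.split(r"\s+#", value, maxsplit=1)[0].strip()
        # Expand references to variables defined earlier in the file;
        # unknown names expand to an empty string in this sketch.
        value = _REF.sub(lambda m: values.get(m.group(1), ""), value)
        values[key.strip()] = value
    return values


sample = """\
# Port to access the application's frontend
APP_PORT=8500
MYSQL_HOST=db # must be the compose service name
BACKEND_URL=http://localhost:${APP_PORT}
"""
config = parse_env_text(sample)
print(config["BACKEND_URL"])  # http://localhost:8500
```

Because BACKEND_URL is derived from APP_PORT at the time the file is read, a stale .env keeps the old value until the deploy target copies project.config again.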
The directory ./secrets at the repo root should hold 5 .txt files (gitignored) for the project secrets. Two of them are mandatory for the MySQL database setup: db_password.txt and db_root_password.txt.
- aws_access_id.txt
- aws_access_secret.txt
- db_password.txt
- db_root_password.txt
- hf_token.txt
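The *_FILE variables above point at secret files mounted under /run/secrets. A minimal sketch of how application code might resolve such a variable is shown here; the helper name `read_secret` and the fallback to a plain variable without the _FILE suffix are assumptions for illustration, not something this README documents.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def read_secret(name: str, env: Optional[dict] = None) -> Optional[str]:
    """Return the secret for NAME, preferring the file named by NAME_FILE."""
    env = os.environ if env is None else env
    file_path = env.get(f"{name}_FILE")
    if file_path and Path(file_path).is_file():
        # Secret files often end with a newline, so strip surrounding whitespace.
        return Path(file_path).read_text(encoding="utf-8").strip()
    # Assumed convenience for local runs: fall back to a plain NAME variable.
    return env.get(name)


# Usage with a throwaway file standing in for /run/secrets/db_password.
with tempfile.TemporaryDirectory() as tmp:
    secret_file = Path(tmp) / "db_password.txt"
    secret_file.write_text("example-password\n", encoding="utf-8")
    demo_env = {"MYSQL_PASSWORD_FILE": str(secret_file)}
    password = read_secret("MYSQL_PASSWORD", demo_env)

print(password)  # example-password
```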
The database is available as a Docker container within the Docker Compose network.
The current image tag is docker.io/mysql:9
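As a rough sketch of how the MYSQL_* variables could be combined into a connection URL for the backend, consider the following. The `mysql+pymysql` driver prefix and the function name `build_mysql_url` are assumptions; the README does not state which driver the application uses.

```python
from urllib.parse import quote_plus


def build_mysql_url(env: dict, password: str) -> str:
    """Assemble a SQLAlchemy-style URL from the MYSQL_* settings."""
    user = quote_plus(env["MYSQL_USER"])
    host = env["MYSQL_HOST"]  # the compose service name, e.g. "db"
    port = env.get("MYSQL_PORT", "3306")
    database = env["MYSQL_DATABASE"]
    # "mysql+pymysql" is an assumed driver; substitute whichever one the backend uses.
    # The password is URL-quoted so characters like '@' or '/' do not break the URL.
    return f"mysql+pymysql://{user}:{quote_plus(password)}@{host}:{port}/{database}"


env = {
    "MYSQL_HOST": "db",
    "MYSQL_PORT": "3306",
    "MYSQL_USER": "admin",
    "MYSQL_DATABASE": "tts_dataset_generator",
}
url = build_mysql_url(env, "p@ss/word")
print(url)  # mysql+pymysql://admin:p%40ss%2Fword@db:3306/tts_dataset_generator
```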
- Multi-Project Support: Upload multiple CSV files, each as a separate project
- Audio Recording: Record audio for each prompt with keyboard controls
- Project Management: Create, delete, and manage projects independently
- Progress Tracking: Track recording progress and resume from last position
- Audio Playback: Play previous recordings within projects
- Export Options: Export datasets to Amazon S3 or Hugging Face
- Settings Management: Configure storage paths and API credentials
- Database Management: Clear entire database when needed
- RTL Language Support: Full support for Right-to-Left languages (Arabic, Persian)
- Flexible Input Methods: CSV upload or multi-line text input
- Smart UI: RTL text display with English interface
- Frontend: React + TypeScript + Vite + Tailwind CSS
- Backend: FastAPI + Python + SQLAlchemy
- Database: MySQL (with SQLite fallback for development)
- Storage: Local filesystem + Amazon S3 + Hugging Face Datasets
- Docker Desktop
- Python 3.8+
- Node.js 16+
- MySQL 8.0+ (optional - SQLite fallback available)
- Click "New Project" on the main page
- Enter a project name
- Choose input method:
  - CSV Upload: Select a CSV file with prompts (one prompt per row)
  - Multi-line Text: Type or paste prompts directly (one per line)
- Optional: Check "Right-to-Left (RTL) Language" for Arabic, Persian, etc.
- Click "Create Project"
When creating projects for RTL languages:
- Check the "Right-to-Left (RTL) Language" checkbox
- The text input area will display in RTL format
- Prompts will be properly formatted in the recording interface
- UI labels remain in English for consistency
- Navigate to a project
- Use keyboard controls:
  - Enter: Start/Stop recording
  - Left Arrow: Skip to next prompt
  - Right Arrow: Go to previous prompt
  - Space: Play/Stop current recording
For RTL projects, prompts are automatically displayed with proper RTL formatting:
- Text flows from right to left
- Proper text alignment for Arabic, Persian, etc.
- Maintains readability in the recording interface
- Hugging Face Export:
  - Configure your Hugging Face token in Settings
  - Set your repository name
  - Click "Export to Hugging Face"
- Amazon S3 Export:
  - Configure your AWS credentials in Settings
  - Set your S3 bucket name
  - Click "Export to S3"
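The README does not describe the layout of an exported dataset. As one plausible shape, the sketch below writes a metadata.csv that pairs each audio file with its prompt text, following the file_name column convention of the Hugging Face audiofolder loader. The function name `write_manifest`, the text column name, and the sample file names are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_manifest(export_dir: Path, rows: list[tuple[str, str]]) -> Path:
    """Write metadata.csv with one (file_name, text) row per recording."""
    manifest = export_dir / "metadata.csv"
    # newline="" lets the csv module control line endings; UTF-8 keeps RTL text intact.
    with manifest.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file_name", "text"])
        writer.writerows(rows)
    return manifest


# Hypothetical recordings paired with their prompts.
rows = [("prompt_0001.wav", "Hello there"), ("prompt_0002.wav", "مرحبا بالعالم")]

with tempfile.TemporaryDirectory() as tmp:
    manifest_path = write_manifest(Path(tmp), rows)
    with manifest_path.open(newline="", encoding="utf-8") as handle:
        written = list(csv.reader(handle))

print(written[0])  # ['file_name', 'text']
```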
- settings: Application configuration
- projects: Project information, prompts, and RTL settings
- prompts: Individual prompts with order and project association
- recordings: Audio recordings metadata with prompt association
- interactions: User interaction logs
- Project Isolation: Each project has its own recordings
- Progress Tracking: Resume recording from last position
- Metadata Storage: Recording timestamps and file information
- Audit Trail: Log all user interactions
- RTL Support: Projects can be marked as RTL for proper text display
- Prompt Management: Prompts are stored separately with order preservation
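To illustrate how these tables could fit together, here is a small self-contained sketch using an in-memory SQLite database. Only the table names and the is_rtl field come from this README; every other column name, and the helper `next_unrecorded_prompt`, is an assumption made for the example and may differ from the real models.

```python
import sqlite3

# Assumed, simplified schema: the real models may use different columns.
SCHEMA = """
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    is_rtl INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE prompts (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    position INTEGER NOT NULL,
    text TEXT NOT NULL
);
CREATE TABLE recordings (
    id INTEGER PRIMARY KEY,
    prompt_id INTEGER NOT NULL REFERENCES prompts(id),
    file_path TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def next_unrecorded_prompt(conn: sqlite3.Connection, project_id: int):
    """Return (position, text) of the first prompt without a recording, or None."""
    return conn.execute(
        """
        SELECT p.position, p.text
        FROM prompts AS p
        LEFT JOIN recordings AS r ON r.prompt_id = p.id
        WHERE p.project_id = ? AND r.id IS NULL
        ORDER BY p.position
        LIMIT 1
        """,
        (project_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO projects (id, name, is_rtl) VALUES (1, 'demo', 1)")
conn.executemany(
    "INSERT INTO prompts (id, project_id, position, text) VALUES (?, 1, ?, ?)",
    [(1, 0, "first prompt"), (2, 1, "second prompt"), (3, 2, "third prompt")],
)
conn.execute("INSERT INTO recordings (prompt_id, file_path) VALUES (1, 'recordings/1.wav')")

resume_at = next_unrecorded_prompt(conn, 1)
print(resume_at)  # (1, 'second prompt')
```

Storing an explicit position per prompt is what makes both order preservation and resuming from the last position a simple ordered query.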
Configure where audio files are stored:
- Default: recordings/ directory
- Can be changed in Settings
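A minimal sketch of resolving the storage location follows, assuming the STORAGE_PATH variable from project.config is the source of the setting and that relative paths are resolved against an application base directory. The helper name `ensure_storage_dir` is hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def ensure_storage_dir(base_dir: Path, env: Optional[dict] = None) -> Path:
    """Resolve STORAGE_PATH (default: recordings) and create it if missing."""
    env = os.environ if env is None else env
    storage = Path(env.get("STORAGE_PATH", "recordings"))
    if not storage.is_absolute():
        # Assumption: relative paths are taken relative to the app's base directory.
        storage = base_dir / storage
    storage.mkdir(parents=True, exist_ok=True)
    return storage


with tempfile.TemporaryDirectory() as tmp:
    created = ensure_storage_dir(Path(tmp), {})
    created_name = created.name
    created_exists = created.is_dir()

print(created_name, created_exists)  # recordings True
```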
- Hugging Face: Token and repository configuration
- Amazon S3: Bucket name and credentials
- Timeouts: Configurable export timeouts
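One way an export timeout such as HF_EXPORT_TIMEOUT or S3_EXPORT_TIMEOUT could be enforced is sketched below with the standard library. This is an assumed approach, not the application's documented mechanism, and the names `export_timeout` and `run_with_timeout` are hypothetical. Note that a worker thread cannot be forcibly stopped, so the sketch only stops waiting for it.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional


def export_timeout(name: str, env: Optional[dict] = None, default: float = 300.0) -> float:
    """Read a timeout in seconds from the environment, falling back to the default."""
    env = os.environ if env is None else env
    try:
        value = float(env.get(name, ""))
    except ValueError:
        return default
    return value if value > 0 else default


def run_with_timeout(task: Callable[[], object], timeout_s: float):
    """Run task in a worker thread and give up waiting after timeout_s seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(task)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"export did not finish within {timeout_s} s") from None
    finally:
        # Do not block on a task that is still running in the background.
        executor.shutdown(wait=False)


# The lambda stands in for a real upload call.
result = run_with_timeout(lambda: "uploaded", export_timeout("HF_EXPORT_TIMEOUT"))
print(result)  # uploaded
```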
If ports are already in use:
# Kill processes on specific ports
lsof -ti:8000 | xargs kill -9 # Backend
lsof -ti:5173 | xargs kill -9 # Frontend (Vite default)
lsof -ti:5174 | xargs kill -9 # Frontend (Vite fallback)
Note: Vite automatically finds the next available port if 5173 is in use.
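Before killing anything, it can help to check which of these ports is actually taken. The sketch below tries to bind each port on the loopback interface; a failed bind is treated as "in use". The helper names are hypothetical, and the check only covers 127.0.0.1.

```python
import socket
from typing import Iterable, Optional


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(candidates: Iterable[int]) -> Optional[int]:
    """Return the first candidate port that is free, or None if all are taken."""
    for port in candidates:
        if is_port_free(port):
            return port
    return None


# Ports mentioned above: backend, Vite default, Vite fallback.
for port in (8000, 5173, 5174):
    print(port, "free" if is_port_free(port) else "in use")
```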
Ensure proper file permissions:
chmod +x backend/setup_database.py
chmod +x backend/start_mysql.py
chmod +x backend/migrate_sqlite_to_mysql.py
mkdir -p recordings
chmod 755 recordings
The application includes comprehensive RTL language support:
- Database: Projects have an is_rtl field to mark RTL languages
- Frontend: Text inputs display in RTL format when RTL is selected
- Recording Interface: Prompts are displayed with proper RTL styling
- UI Consistency: Interface labels remain in English for consistency
Two flexible input methods are supported:
- CSV Upload: Traditional CSV file upload with one prompt per row
- Multi-line Text: Direct text input with one prompt per line
  - Supports RTL text input when RTL checkbox is selected
  - Real-time prompt counting
  - Automatic empty line filtering
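The two input methods can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the backend's actual code: the function names are hypothetical, and the CSV variant assumes the prompt sits in the first column with no header row.

```python
import csv
import io


def prompts_from_text(text: str) -> list[str]:
    """One prompt per line, with blank lines filtered out."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def prompts_from_csv(csv_text: str) -> list[str]:
    """One prompt per row, taken from the first column (assumed, no header row)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row and row[0].strip()]


typed = "Hello there\n\n  مرحبا بالعالم  \n"
uploaded = 'prompt one\n"prompt, with comma"\n\n'

text_prompts = prompts_from_text(typed)
csv_prompts = prompts_from_csv(uploaded)

# The prompt count shown to the user is just the length of the filtered list.
print(len(text_prompts), len(csv_prompts))  # 2 2
```

Using the csv module rather than splitting on commas keeps quoted prompts that contain commas intact.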
- Backend: Add new endpoints in main.py
- Frontend: Create new components in src/components/
- Database: Update models and run migrations
MIT
Feel free to contribute and open a PR