AudioGen — Multilingual Audio Generation Pipeline

Automates text-to-speech audio generation for localized string sets. Reads a CSV of translated strings, syncs them to a Google Sheet for tracking, generates .mp3 files via ElevenLabs, uploads them to Google Drive, and writes the shareable links back to the sheet.

How It Works

Input CSV
    │
    ▼
CSV Ingestion       — parse & validate; expand into (key, language, text) tuples
    │
    ▼
Google Sheets Sync  — upsert rows; mark changed/new strings as needs_generation
    │
    ▼
Generation Engine   — call ElevenLabs TTS per row; save to output/
    │
    ▼
Google Drive Upload — upload to AudioAssets/{language}/; set public link
    │
    ▼
Sheet Update        — write Drive link; mark row as generated
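
The sync rule in the diagram (new or changed strings are marked needs_generation, everything else is left alone) can be sketched in memory as follows. This is a minimal illustration, not the code in sheets.py: the `Row` class and `sync_rows` function are hypothetical names, and a plain dict stands in for the sheet.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One tracker row (hypothetical in-memory stand-in for a sheet row)."""
    key: str
    language: str
    text: str
    status: str = "needs_generation"
    drive_link: str = ""


def sync_rows(existing, incoming):
    """Upsert (key, language, text) tuples into `existing`.

    `existing` maps (key, language) -> Row. New rows are added as
    needs_generation; rows whose text changed are updated and re-queued;
    unchanged rows keep their current status.
    """
    for key, language, text in incoming:
        row = existing.get((key, language))
        if row is None:
            existing[(key, language)] = Row(key, language, text)
        elif row.text != text:
            row.text = text
            row.status = "needs_generation"
    return existing
```

The Drive link is deliberately left untouched when the text changes, on the assumption that the re-generated file replaces the old one at the same link.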

Setup

1. Install dependencies

pip install -r requirements.txt

2. Google Cloud — service account

Create a project in Google Cloud Console
Enable Google Sheets API and Google Drive API
Create a Service Account (IAM & Admin → Service Accounts)
Download the JSON key → save to credentials/service_account.json

3. Google Sheet

Create a new Google Sheet at sheets.google.com
Share it with the service account email → Editor
Copy the Sheet ID from the URL: https://docs.google.com/spreadsheets/d/<SHEET_ID>/edit

The script auto-creates the Audio Tracker tab with correct headers on first run.

4. Google Drive — Shared Drive

Important: Service accounts have no personal Drive quota. Files must be stored in a Shared Drive.

In Google Drive, click + New → Shared drive
Add the service account email as a Contributor (or higher)
Copy the Shared Drive folder ID from its URL: https://drive.google.com/drive/folders/<FOLDER_ID>

5. ElevenLabs

Get your API key from elevenlabs.io → Profile → API Keys
Browse the Voice Library → filter by language → Add to My Voices

Copy voice IDs from My Voices or via API:

curl -H "xi-api-key: YOUR_KEY" https://api.elevenlabs.io/v1/voices
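
The same lookup can be done from Python with only the standard library. This is a sketch: the helper names are hypothetical, and the response shape (a top-level `voices` list whose entries carry `name` and `voice_id`) is an assumption to verify against the ElevenLabs API reference.

```python
import json
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build the same GET request as the curl command above."""
    return urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})


def parse_voices(payload: bytes) -> dict:
    """Map voice name -> voice ID (assumed response shape, see lead-in)."""
    data = json.loads(payload)
    return {voice["name"]: voice["voice_id"] for voice in data.get("voices", [])}


def list_voices(api_key: str) -> dict:
    """Fetch and parse the voice list; performs a real network call."""
    with urllib.request.urlopen(build_voices_request(api_key), timeout=30) as resp:
        return parse_voices(resp.read())
```

Calling `list_voices("YOUR_KEY")` returns a name-to-ID mapping whose values can be copied into voices.yaml.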

6. Configure `.env`

Copy .env.example to .env and fill in:

ELEVENLABS_API_KEY=sk-...
GOOGLE_SERVICE_ACCOUNT_JSON=./credentials/service_account.json
GOOGLE_SHEET_ID=your_sheet_id
GOOGLE_DRIVE_FOLDER_ID=your_shared_drive_folder_id
RATE_LIMIT_DELAY=1.0
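
As a rough illustration of how these values might be validated at startup, the sketch below reads them from a mapping such as `os.environ`. It is not the code in config.py: `Settings` and `load_settings` are hypothetical names, and it assumes that the first four variables are mandatory and that RATE_LIMIT_DELAY is a number of seconds defaulting to 1.0.

```python
import os
from dataclasses import dataclass

# Assumption: these four variables are all required.
REQUIRED = (
    "ELEVENLABS_API_KEY",
    "GOOGLE_SERVICE_ACCOUNT_JSON",
    "GOOGLE_SHEET_ID",
    "GOOGLE_DRIVE_FOLDER_ID",
)


@dataclass(frozen=True)
class Settings:
    elevenlabs_api_key: str
    service_account_json: str
    sheet_id: str
    drive_folder_id: str
    rate_limit_delay: float


def load_settings(env=None) -> Settings:
    """Validate and convert settings from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return Settings(
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
        service_account_json=env["GOOGLE_SERVICE_ACCOUNT_JSON"],
        sheet_id=env["GOOGLE_SHEET_ID"],
        drive_folder_id=env["GOOGLE_DRIVE_FOLDER_ID"],
        # Assumed default and unit (seconds between TTS calls).
        rate_limit_delay=float(env.get("RATE_LIMIT_DELAY", "1.0")),
    )
```

Failing fast on a missing variable avoids discovering a bad configuration halfway through a paid generation run.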

7. Configure `voices.yaml`

Replace <VOICE_ID> placeholders with real ElevenLabs voice IDs:

voices:
  English: "21m00Tcm4TlvDq8ikWAM"
  Hindi: "pNInz6obpgDQGcFmaJgB"
  Kannada: "..."
  # ... one per language
default_voice: "21m00Tcm4TlvDq8ikWAM"  # fallback for unconfigured languages

Tip: The model used is eleven_multilingual_v2, which supports all 12 languages. You can use a single multilingual voice for everything and refine per-language later.
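
The fallback behavior described above can be sketched as a small lookup. The function name `resolve_voice` is hypothetical, and `config` is assumed to be the dict obtained by parsing voices.yaml.

```python
def resolve_voice(language: str, config: dict) -> str:
    """Return the voice ID for `language`, falling back to default_voice."""
    voices = config.get("voices") or {}
    voice_id = voices.get(language) or config.get("default_voice")
    if not voice_id:
        raise KeyError(f"No voice configured for {language!r} and no default_voice set")
    return voice_id


# Example with the structure shown in voices.yaml
config = {
    "voices": {"English": "21m00Tcm4TlvDq8ikWAM", "Hindi": "pNInz6obpgDQGcFmaJgB"},
    "default_voice": "21m00Tcm4TlvDq8ikWAM",
}
print(resolve_voice("Hindi", config))  # configured voice
print(resolve_voice("Tamil", config))  # not configured, uses default_voice
```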

Input Format

A CSV file with one row per string key and one column per language.

| Column | Required | Description |
| --- | --- | --- |
| `S. No` | No | Row index (ignored) |
| `key` | Yes | Unique string identifier (e.g. `selected_language_is`) |
| `English` | No | English text |
| `Hindi` | No | Hindi text |
| (any language) | No | Additional language columns are auto-detected |

Rules:

key must be unique across all rows
Empty language cells are silently skipped
Adding a new language column automatically includes it in generation
The file must be UTF-8 encoded (UTF-8 BOM is handled automatically)

Example (sample.csv):

S. No,key,English,Hindi,Kannada
0,selected_language_is,Selected language is English,चुनी गई भाषा हिंदी है,ಆಯ್ಕೆ ಮಾಡಲಾದ ಭಾಷೆ ಕನ್ನಡ
1,welcome_message,Welcome,आपका स्वागत है,ಸ್ವಾಗತ
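
The rules above can be sketched with the standard `csv` module. This is an illustration rather than the code in ingestion.py: the function names are hypothetical, and rejecting a row with an empty key is an assumption.

```python
import csv
import io

# Columns that are never treated as languages.
NON_LANGUAGE_COLUMNS = {"S. No", "key"}


def expand_rows(stream):
    """Expand a wide CSV into (key, language, text) tuples."""
    reader = csv.DictReader(stream)
    if reader.fieldnames is None or "key" not in reader.fieldnames:
        raise ValueError("CSV must contain a 'key' column")
    # Every other column is auto-detected as a language.
    languages = [c for c in reader.fieldnames if c not in NON_LANGUAGE_COLUMNS]
    seen = set()
    tuples = []
    for row in reader:
        key = (row.get("key") or "").strip()
        if not key:
            raise ValueError("Row with empty key")  # assumption: treated as an error
        if key in seen:
            raise ValueError(f"Duplicate key: {key}")
        seen.add(key)
        for language in languages:
            text = (row.get(language) or "").strip()
            if text:  # empty cells are skipped silently
                tuples.append((key, language, text))
    return tuples


def read_csv(path):
    """Read a UTF-8 file; the utf-8-sig codec also strips a leading BOM."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return expand_rows(handle)
```

Running `read_csv("sample.csv")` on the example above would yield six tuples, one per non-empty language cell.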

Usage

python run.py --csv <path> --mode <mode> [--dry-run] [--retry-errors]

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| `--csv PATH` | No | Path to input CSV. If omitted, skips ingestion and processes whatever is already queued in the sheet |
| `--mode` | Yes | Execution mode (see below) |
| `--dry-run` | No | Show planned actions without making any API calls or sheet writes |
| `--retry-errors` | No | Retry all rows marked `error` in the sheet (see Retry section) |
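
A command line with these arguments could be declared with `argparse` roughly as follows. This is a sketch of the interface described in the table, not the actual contents of run.py; `build_parser` is a hypothetical name.

```python
import argparse

# The three modes described in the Modes section.
MODES = ("generate-all", "batch", "one-at-a-time")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="run.py", description="Multilingual audio generation pipeline"
    )
    parser.add_argument("--csv", metavar="PATH", default=None,
                        help="input CSV; if omitted, process rows already queued in the sheet")
    parser.add_argument("--mode", required=True, choices=MODES,
                        help="execution mode")
    parser.add_argument("--dry-run", action="store_true",
                        help="show planned actions without API calls or sheet writes")
    parser.add_argument("--retry-errors", action="store_true",
                        help="retry rows marked error in the sheet")
    return parser


args = build_parser().parse_args(["--mode", "batch", "--dry-run"])
print(args.csv, args.mode, args.dry_run, args.retry_errors)
```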

Modes

generate-all — process everything automatically, no prompts (best for CI/batch runs)

python run.py --csv strings.csv --mode generate-all

batch — review all languages for one key at a time, confirm before generating

python run.py --csv strings.csv --mode batch

─────────────────────────────────────────
Batch 1/6 | Key: selected_language_is
─────────────────────────────────────────
   1. English      → "Selected language is English"
   2. Hindi        → "चुनी गई भाषा हिंदी है"
   3. Kannada      → "ಆಯ್ಕೆ ಮಾಡಲಾದ ಭಾಷೆ ಕನ್ನಡ"

Generate this batch? [y/s/q]:

one-at-a-time — review and confirm each language row individually

python run.py --csv strings.csv --mode one-at-a-time

─────────────────────────────────────────
Item 3/24 | selected_language_is | Kannada
─────────────────────────────────────────
Text: "ಆಯ್ಕೆ ಮಾಡಲಾದ ಭಾಷೆ ಕನ್ನಡ"

Generate? [y/s/q]:

In both interactive modes: y = generate, s = skip (stays needs_generation), q = quit.
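
The y/s/q handling can be sketched as a small loop. The names `ask` and `run_interactive` are hypothetical, and re-prompting on any other input is an assumption; the `read` parameter exists only so the loop can be driven without a terminal.

```python
def ask(prompt, read=input):
    """Prompt until the answer is one of y, s or q."""
    while True:
        answer = read(prompt).strip().lower()
        if answer in ("y", "s", "q"):
            return answer


def run_interactive(items, generate, read=input):
    """Confirm each item; return (generated, skipped) lists.

    y calls generate(item), s leaves the item queued, q stops the run.
    """
    generated, skipped = [], []
    for index, item in enumerate(items, start=1):
        answer = ask(f"Item {index}/{len(items)} | Generate? [y/s/q]: ", read)
        if answer == "q":
            break
        if answer == "s":
            skipped.append(item)
            continue
        generate(item)
        generated.append(item)
    return generated, skipped
```

Items that are skipped or never reached after q are not modified, which matches the rule that they stay in needs_generation.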

Dry run

Preview what would happen without touching anything:

python run.py --csv strings.csv --mode generate-all --dry-run

Output

Local files

Generated audio is saved to:

output/
└── {language}/
    └── {iso_code}_{key}.mp3

Example: output/Hindi/hi_selected_language_is.mp3

Files are retained after upload for manual recovery if needed.
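
The naming scheme can be expressed as a short helper. The `ISO_CODES` table and the `output_path` function are hypothetical; only the `hi` prefix is confirmed by the example above, and the other entries are the standard ISO 639-1 codes for those languages.

```python
from pathlib import Path

# Assumed language-name -> ISO 639-1 mapping; extend for other languages.
ISO_CODES = {"English": "en", "Hindi": "hi", "Kannada": "kn"}


def output_path(key: str, language: str, root: str = "output") -> Path:
    """Build output/{language}/{iso_code}_{key}.mp3."""
    code = ISO_CODES[language]  # raises KeyError for an unmapped language
    return Path(root) / language / f"{code}_{key}.mp3"


print(output_path("selected_language_is", "Hindi").as_posix())
# output/Hindi/hi_selected_language_is.mp3
```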

Google Drive

Files are uploaded to:

<GOOGLE_DRIVE_FOLDER_ID>/
└── AudioAssets/
    └── {language}/
        ├── hi_selected_language_is.mp3
        ├── hi_welcome_message.mp3
        └── ...

Subfolders are auto-created. Re-generating a string replaces the existing file and preserves the Drive link.

Google Sheet (`Audio Tracker` tab)

| Column | Description |
| --- | --- |
| `Key` | String key |
| `Language` | Language name |
| `Text` | Source text |
| `Audio Status` | `needs_generation` / `generating` / `generated` / `error` / `skipped` |
| `Drive Link` | Shareable link to the `.mp3` on Google Drive |
| `Last Updated` | Timestamp of last write |
| `Notes` | Error messages (if any) |

Retrying errors

If rows show error status in the sheet (e.g. after a Drive upload failure), run:

python run.py --retry-errors --mode generate-all

If output/{language}/{iso_code}_{key}.mp3 exists locally → re-uploads it to Drive and marks the row generated (no ElevenLabs call)
If no local file exists → resets the row to needs_generation so it is re-generated on the next normal run
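
The two branches above amount to a per-row decision, sketched here. `plan_retry` is a hypothetical name; the caller supplies `local_path_for` (how a row maps to its local file) and, optionally, the existence check.

```python
import os


def plan_retry(error_rows, local_path_for, exists=os.path.isfile):
    """Decide what to do with each row marked error.

    Returns (row, action) pairs where action is "reupload" when the local
    audio file is still present (no TTS call needed) and "reset" otherwise
    (the row goes back to needs_generation).
    """
    plan = []
    for row in error_rows:
        action = "reupload" if exists(local_path_for(row)) else "reset"
        plan.append((row, action))
    return plan
```

Keeping the decision separate from the upload and sheet writes makes it straightforward to print the plan under --dry-run.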

Idempotency & Resumability

Running with the same CSV multiple times is safe: rows already marked generated are not re-processed unless their text has changed
If a run is interrupted mid-way, completed rows keep their generated status; the next run picks up from needs_generation rows only
Rows stuck in generating (process killed mid-call) can be manually reset to needs_generation in the sheet
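
The resume behavior can be illustrated with two small helpers over rows represented as dicts keyed by the tracker column names. Both function names are hypothetical, and `reset_stuck` automates what the README describes as a manual step in the sheet, so treat it as an optional convenience rather than existing behavior.

```python
def pending_rows(rows):
    """Rows a normal run would pick up: only those in needs_generation."""
    return [row for row in rows if row["Audio Status"] == "needs_generation"]


def reset_stuck(rows):
    """Move rows left in generating back to needs_generation; return the count."""
    count = 0
    for row in rows:
        if row["Audio Status"] == "generating":
            row["Audio Status"] = "needs_generation"
            count += 1
    return count


rows = [
    {"Key": "welcome_message", "Language": "English", "Audio Status": "generated"},
    {"Key": "welcome_message", "Language": "Hindi", "Audio Status": "generating"},
    {"Key": "welcome_message", "Language": "Kannada", "Audio Status": "needs_generation"},
]
print(len(pending_rows(rows)))  # only the Kannada row is pending
print(reset_stuck(rows))        # the Hindi row is re-queued
print(len(pending_rows(rows)))  # Hindi and Kannada are now pending
```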

Project Structure

AudioGen/
├── run.py              # CLI entry point
├── config.py           # .env + voices.yaml loader
├── ingestion.py        # CSV parser and validator
├── sheets.py           # Google Sheets read/write layer
├── generator.py        # ElevenLabs TTS + run mode logic
├── drive.py            # Google Drive upload layer
├── pipeline.py         # on_success callback, retry, summary
├── voices.yaml         # Language → ElevenLabs voice ID mapping
├── requirements.txt    # Python dependencies
├── .env.example        # Environment variable template
└── credentials/        # Service account JSON (gitignored)
