Multi-Speaker TTS Pipeline on Google Cloud

A production-ready Text-to-Speech (TTS) pipeline that converts conversational text with multiple speakers into natural-sounding audio using Google Cloud services and the Gemini API.

Features

Multi-speaker support: Automatically assigns different voices to multiple speakers
Real-time status updates: Server-Sent Events (SSE) for live job progress tracking
Scalable architecture: Built on Google Cloud Run and Cloud Functions
Flexible TTS models: Supports various Gemini models with customizable prompts
Asynchronous processing: Non-blocking job submission with background processing
GitHub integration: Includes Claude Code GitHub Action for automated assistance

Architecture Overview

Cloud Function (submit_audio_job): HTTP endpoint for job submission
Cloud Run Job (tts-worker): Background worker for TTS processing using Gemini API
Cloud Run Service (events-gateway): SSE gateway for real-time status updates
Cloud Storage: Input text and output audio file storage
Cloud Tasks: Job queue management (a queue is configured, but jobs currently run as Cloud Run Jobs)
Pub/Sub: Event-driven communication between components

Quick Start

Prerequisites

Google Cloud Project with billing enabled
gcloud CLI installed and configured
Required environment variables:

export PROJECT_ID="your-project-id"
export REGION="your-region"  # e.g., us-central1, asia-northeast1

Setup Steps

Update gcloud CLI
```
gcloud components update
```

Enable required Google Cloud APIs

gcloud services enable \
  cloudfunctions.googleapis.com \
  run.googleapis.com \
  eventarc.googleapis.com \
  cloudtasks.googleapis.com \
  pubsub.googleapis.com \
  storage.googleapis.com \
  artifactregistry.googleapis.com \
  secretmanager.googleapis.com \
  cloudbuild.googleapis.com

Set up IAM permissions

# Grant Cloud Build permissions to your user account
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:YOUR_EMAIL@example.com" \
  --role="roles/cloudbuild.builds.editor"

Store Gemini API key in Secret Manager

# Replace YOUR_GEMINI_API_KEY with your actual API key
echo -n "YOUR_GEMINI_API_KEY" | gcloud secrets create gemini-api-key --data-file=-

# Grant access to the service account
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format='value(projectNumber)')
gcloud secrets add-iam-policy-binding gemini-api-key \
  --member="serviceAccount:$PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor" \
  --project=$PROJECT_ID

Deployment

Automated Deployment with deploy.sh

The project includes a convenient deployment script that handles all components:

# Set required environment variables
export PROJECT_ID="your-project-id"
export REGION="asia-northeast1"

# Deploy all components
./deploy.sh all

# Or deploy individual components
./deploy.sh gateway  # Deploy SSE Gateway (Cloud Run Service)
./deploy.sh worker   # Deploy TTS Worker (Cloud Run Job)
./deploy.sh function # Deploy Submit Function (Cloud Function)

# Show help
./deploy.sh help

The script automatically:

Validates environment variables
Builds container images
Deploys services with proper configurations
Sets up environment variables and secrets

Manual Deployment (Advanced)

For detailed control over the deployment process:

Create Cloud Storage buckets

gsutil mb -l $REGION gs://$PROJECT_ID-tts-input
gsutil mb -l $REGION gs://$PROJECT_ID-tts-output

# Apply security settings to prevent bucket listing (recommended)
# See Security Considerations section for details

Create Artifact Registry repository

gcloud artifacts repositories create tts \
  --repository-format=docker \
  --location=$REGION

Create Cloud Tasks queue

gcloud tasks queues create tts-queue --location=$REGION

Deploy the TTS Worker (Cloud Run Job)

# Build and deploy
gcloud builds submit worker/tts_worker \
  --tag $REGION-docker.pkg.dev/$PROJECT_ID/tts/worker:latest

gcloud run jobs create tts-worker \
  --image $REGION-docker.pkg.dev/$PROJECT_ID/tts/worker:latest \
  --region $REGION \
  --set-secrets GEMINI_API_KEY=gemini-api-key:latest

Deploy the Events Gateway (Cloud Run Service)

# Build and deploy
gcloud builds submit events-gateway \
  --tag $REGION-docker.pkg.dev/$PROJECT_ID/tts/gateway:latest

gcloud run deploy events-gateway \
  --image $REGION-docker.pkg.dev/$PROJECT_ID/tts/gateway:latest \
  --region $REGION \
  --allow-unauthenticated \
  --set-env-vars "PROJECT_ID=$PROJECT_ID"

Set up Pub/Sub topics and subscriptions

# Create topics
gcloud pubsub topics create gcs-object-finalize-events
gcloud pubsub topics create tts-finished

# Set up GCS notifications
gsutil notification create \
  -t projects/$PROJECT_ID/topics/gcs-object-finalize-events \
  -f json \
  gs://$PROJECT_ID-tts-output

Deploy the Submit Function (Cloud Function)

gcloud functions deploy submit_audio_job \
  --gen2 \
  --region $REGION \
  --runtime python311 \
  --entry-point main \
  --source functions/submit_audio_job \
  --trigger-http \
  --allow-unauthenticated

Configuration

Supported Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `script` | string | Required | Conversation text with speaker labels |
| `speakers` | array | Required | List of speaker names |
| `model` | string | `gemini-2.5-flash-preview-tts` | Gemini model to use |
| `prompt` | string | `TTS the following conversation:` | System prompt for TTS generation |
| `job_id` | string | Auto-generated UUID | Custom job identifier |
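
As a rough illustration of how these parameters fit together, the sketch below assembles a request body and checks that every `Name:` label in the script appears in `speakers`. The helper name `build_payload` and the label-matching rule are assumptions made for this example, not part of the project. Optional fields are left out when not given, so the server-side defaults from the table apply.

```python
import re


def build_payload(script, speakers, model=None, prompt=None, job_id=None):
    """Hypothetical helper: build a request body for the submit endpoint."""
    if not script or not speakers:
        raise ValueError("script and speakers are required")

    # Assumed convention: each script line starts with "Name:".
    labels = {m.group(1).strip()
              for m in re.finditer(r"^([^:\n]+):", script, re.MULTILINE)}
    unknown = labels - set(speakers)
    if unknown:
        raise ValueError(f"labels missing from speakers: {sorted(unknown)}")

    payload = {"script": script, "speakers": list(speakers)}
    # Only send optional fields that were explicitly provided.
    for key, value in (("model", model), ("prompt", prompt), ("job_id", job_id)):
        if value is not None:
            payload[key] = value
    return payload
```

A check like this catches a mismatch between script labels and the `speakers` array before a job is submitted.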

Voice Assignment

Speakers are automatically assigned to available voices:

Voices rotate between "Kore" and "Puck"
Assignment is based on speaker order in the array
Consistent voice assignment for each speaker throughout the conversation
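
The rotation described above can be sketched as a round-robin mapping by index. This is a minimal illustration that assumes the worker cycles through the two voices in order; the function name `assign_voices` is hypothetical and the worker's actual implementation may differ.

```python
VOICES = ["Kore", "Puck"]  # voice names listed above


def assign_voices(speakers, voices=VOICES):
    """Map each speaker to a voice by position, cycling through the voices."""
    return {name: voices[i % len(voices)] for i, name in enumerate(speakers)}


print(assign_voices(["Alice", "Bob", "Carol"]))
# {'Alice': 'Kore', 'Bob': 'Puck', 'Carol': 'Kore'}
```

Under this assumption, a third speaker shares a voice with the first, so conversations with more than two speakers reuse voices.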

API Usage

1. Submit a TTS Job

# Get the Cloud Function URL
FUNCTION_URL=$(gcloud functions describe submit_audio_job \
  --region $REGION --format 'value(serviceConfig.uri)')

# Submit a job
curl -X POST "$FUNCTION_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "script": "Alice: Hello there!\nBob: Hi Alice, how are you?",
    "speakers": ["Alice", "Bob"],
    "prompt": "Read this conversation naturally",
    "model": "gemini-2.5-flash-preview-tts"
  }'

Response:

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "events_url": "https://events-gateway-xxx.run.app/events/550e8400-e29b-41d4-a716-446655440000"
}

2. Monitor Job Progress with SSE

# Connect to the SSE endpoint
curl -N "https://events-gateway-xxx.run.app/events/550e8400-e29b-41d4-a716-446655440000"

SSE Events:

data: {"job_id": "550e8400-e29b-41d4-a716-446655440000", "status": "waiting", "url": "https://storage.googleapis.com/..."}

data: {"job_id": "550e8400-e29b-41d4-a716-446655440000", "status": "completed", "url": "https://storage.googleapis.com/..."}
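
For clients that read the stream without an SSE library, each `data:` line can be decoded by hand. The sketch below is illustrative only: the helper names are hypothetical, and the terminal statuses are taken from the client examples in this README.

```python
import json

# Statuses after which the client examples stop listening.
TERMINAL_STATUSES = {"completed", "error", "timeout"}


def parse_sse_data(raw_line):
    """Return the decoded JSON payload of a 'data:' line, or None otherwise."""
    if not raw_line.startswith("data:"):
        return None  # comments, blank separators, and other SSE fields
    return json.loads(raw_line[len("data:"):].strip())


def is_terminal(event):
    """True when no further updates are expected for this job."""
    return event.get("status") in TERMINAL_STATUSES
```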

3. Download Generated Audio

# Download the WAV file
curl -o output.wav "https://storage.googleapis.com/your-project-tts-output/550e8400-e29b-41d4-a716-446655440000.wav"

Client Examples

JavaScript/TypeScript

class TTSClient {
  constructor(functionUrl) {
    this.functionUrl = functionUrl;
  }

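  // Submit a job; resolves to the JSON response ({ job_id, events_url })
  async submitJob(script, speakers, options = {}) {
    const response = await fetch(this.functionUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // Optional fields left undefined are dropped by JSON.stringify
      body: JSON.stringify({
        script,
        speakers,
        prompt: options.prompt,
        model: options.model,
        job_id: options.jobId
      })
    });
    if (!response.ok) {
      throw new Error(`Job submission failed: HTTP ${response.status}`);
    }
    return response.json();
  }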

  monitorJob(eventsUrl, callbacks) {
    const eventSource = new EventSource(eventsUrl);
    
    eventSource.onmessage = (event) => {
      const data = JSON.parse(event.data);
      
      switch (data.status) {
        case 'waiting':
          callbacks.onWaiting?.(data);
          break;
        case 'completed':
          callbacks.onCompleted?.(data);
          eventSource.close();
          break;
        case 'error':
        case 'timeout':
          callbacks.onError?.(data);
          eventSource.close();
          break;
      }
    };
    
    eventSource.onerror = () => {
      callbacks.onError?.({ error: 'Connection failed' });
      eventSource.close();
    };
    
    return eventSource;
  }
}

// Usage
const client = new TTSClient('YOUR_FUNCTION_URL');

const job = await client.submitJob(
  "Alice: Hello!\nBob: Hi there!",
  ["Alice", "Bob"],
  { prompt: "Natural conversation" }
);

client.monitorJob(job.events_url, {
  onWaiting: (data) => console.log('Processing...'),
  onCompleted: (data) => {
    const audio = new Audio(data.url);
    audio.play();
  },
  onError: (error) => console.error('Failed:', error)
});

Python

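# Requires: pip install requests sseclient-py
# (the .events() API used below is the one provided by sseclient-py)
import json

import requests
import sseclient


class TTSClient:
    def __init__(self, function_url):
        self.function_url = function_url

    def submit_job(self, script, speakers, prompt=None, model=None):
        """Submit a job and return the JSON response (job_id, events_url)."""
        payload = {
            "script": script,
            "speakers": speakers
        }
        # Optional fields are omitted so the server applies its defaults
        if prompt:
            payload["prompt"] = prompt
        if model:
            payload["model"] = model

        response = requests.post(self.function_url, json=payload)
        response.raise_for_status()  # surface HTTP errors early
        return response.json()

    def monitor_job(self, events_url):
        """Yield status updates until the job reaches a terminal status."""
        response = requests.get(events_url, stream=True)
        response.raise_for_status()
        client = sseclient.SSEClient(response)

        for event in client.events():
            data = json.loads(event.data)
            yield data

            if data.get("status") in ["completed", "error", "timeout"]:
                break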

# Usage
client = TTSClient("YOUR_FUNCTION_URL")

job = client.submit_job(
    "Alice: Hello!\nBob: Hi there!",
    ["Alice", "Bob"]
)

for update in client.monitor_job(job["events_url"]):
    print(f"Status: {update['status']}")
    if update["status"] == "completed":
        print(f"Audio URL: {update['url']}")

Testing

Test SSE Connection

Use the included test script to verify SSE functionality:

./test_sse.sh

This script:

Creates a test job ID
Establishes an SSE connection
Publishes a test message via Pub/Sub
Verifies message delivery

Manual Testing

# Test job submission
curl -X POST "$FUNCTION_URL" \
  -H "Content-Type: application/json" \
  -d '{"script": "Test: Hello", "speakers": ["Test"]}'

# Test SSE connection (set EVENTS_URL to the events_url returned by the submission above)
curl -N "$EVENTS_URL"

Monitoring and Logging

Viewing Logs in GCP Console (For Beginners)

The TTS Worker emits structured logs that help track audio generation issues. To monitor your jobs:

1. Access Cloud Logging

Go to Google Cloud Console
Select your project from the dropdown at the top
In the left menu, navigate to Logging → Logs Explorer

2. View TTS Worker Logs

Use these queries in the Logs Explorer search bar:

View all TTS worker logs:

logName="projects/YOUR_PROJECT_ID/logs/tts-worker"

View logs for a specific job:

logName="projects/YOUR_PROJECT_ID/logs/tts-worker"
jsonPayload.job_id="YOUR_JOB_ID"

View only errors:

logName="projects/YOUR_PROJECT_ID/logs/tts-worker"
severity="ERROR"

Track short audio issues:

logName="projects/YOUR_PROJECT_ID/logs/tts-worker"
jsonPayload.message="Audio duration too short"

3. Understanding Log Fields

Each log entry contains structured data:

job_id: Unique identifier for the job
duration_seconds: Length of generated audio
duration_minutes: Length in minutes
speakers: Array of speaker names
model: Gemini model used
retry_reason: Why a retry was needed (e.g., "duration_too_short")
total_processing_time_seconds: Total time to generate audio
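
To make the field list concrete, the sketch below builds one such entry as a JSON string. It is only an illustration: the function name `make_log_entry` is hypothetical, `duration_minutes` is assumed to be `duration_seconds / 60`, and how the worker actually emits its logs is not shown in this README.

```python
import json


def make_log_entry(message, job_id, duration_seconds, speakers, model,
                   severity="INFO"):
    """Hypothetical helper: serialize one structured log entry."""
    entry = {
        "severity": severity,
        "message": message,
        "job_id": job_id,
        "duration_seconds": round(duration_seconds, 2),
        # Assumption: minutes are derived from seconds.
        "duration_minutes": round(duration_seconds / 60, 2),
        "speakers": list(speakers),
        "model": model,
    }
    return json.dumps(entry)


print(make_log_entry("Audio duration too short", "job-1", 45.0,
                     ["Alice", "Bob"], "gemini-2.5-flash-preview-tts",
                     severity="WARNING"))
```

Fields written this way are what the `jsonPayload.*` filters in the queries above match against.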

4. Setting Up Alerts (Optional)

In Logs Explorer, create a query for short audio:

logName="projects/YOUR_PROJECT_ID/logs/tts-worker"
jsonPayload.duration_seconds < 60

Click Create Alert above the results
Configure notification channels (email, SMS, etc.)

Audio Duration Validation

The system automatically validates audio duration:

Minimum Duration: 60 seconds (1 minute)
Automatic Retry: If the audio is shorter than 60 seconds, the system:
1. Logs a warning with the actual duration
2. Retries generation with a modified prompt that adds instructions to speak slowly and clearly (up to 2 additional attempts)
Metadata Storage: Duration is stored in GCS metadata for each file
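
The validation and retry behaviour described above can be sketched with the standard-library `wave` module. This is a minimal illustration under stated assumptions, not the worker's code: `generate` is a hypothetical callable that takes a prompt and returns WAV bytes, and the exact wording appended to the prompt is invented for the example.

```python
import io
import wave

MIN_DURATION_SECONDS = 60
MAX_RETRIES = 2  # additional attempts after the first one


def wav_duration_seconds(wav_bytes):
    """Duration of a WAV payload, computed as frames / frame rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def generate_with_retry(generate, prompt):
    """Call the hypothetical generate(prompt) and retry while audio is too short."""
    audio = generate(prompt)
    for attempt in range(MAX_RETRIES):
        duration = wav_duration_seconds(audio)
        if duration >= MIN_DURATION_SECONDS:
            break
        print(f"Audio duration too short: {duration:.1f}s (retry {attempt + 1})")
        # Assumed prompt modification; the real wording is not documented here.
        audio = generate(prompt + " Speak slowly and clearly.")
    return audio
```

In this sketch the last result is returned even if it is still short, so a caller would need to decide separately whether to fail the job.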

To check audio duration for existing files:

# View file metadata including duration
gsutil stat gs://$PROJECT_ID-tts-output/JOB_ID.wav

Monitoring Dashboard (Quick Setup)

Create a simple dashboard to monitor your TTS jobs:

Go to Monitoring → Dashboards in GCP Console
Click Create Dashboard
Add these widgets:
- Log-based metric: Audio generation success rate
- Log-based metric: Average audio duration
- Log panel: Recent errors

Example metric for average duration:

Go to Logging → Log-based Metrics
Click Create Metric
Metric type: Distribution (required to extract a numeric field)
Name: tts_audio_duration

Filter:

logName="projects/YOUR_PROJECT_ID/logs/tts-worker"
jsonPayload.duration_seconds > 0

Field name: jsonPayload.duration_seconds
Create and use in dashboards

Troubleshooting

Common Issues

"PROJECT_ID not set" error
- Ensure environment variables are exported: export PROJECT_ID=your-project-id
"Permission denied" errors
- Check IAM permissions for service accounts
- Verify Secret Manager access for Gemini API key
SSE connection timeouts
- SSE connections have a 5-minute maximum duration
- Implement reconnection logic in production clients
Audio generation failures
- Verify Gemini API key is valid
- Check Cloud Run Job executions: gcloud run jobs executions list --job=tts-worker --region=$REGION
- Ensure speaker names in script match the speakers array
Short audio files (< 60 seconds)
- The system automatically retries generation for short audio
- Check logs for "Audio duration too short" warnings
- Common causes:
  - Very short input scripts
  - Fast speech generation by the model
  - Missing or truncated content
- Manual workarounds:
  - Add more conversational content
  - Include pauses or stage directions
  - Use prompts that encourage slower speech
Deployment failures
- Ensure all APIs are enabled
- Check Cloud Build logs for container build issues
- Verify Artifact Registry repository exists
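
For the SSE timeout item above, reconnection logic might look like the following sketch. It assumes exponential backoff is acceptable for your client; `open_stream` is a hypothetical callable that opens a new connection and yields decoded event dicts (for example, a wrapper around `monitor_job` from the Python client). `OSError` is caught because both the built-in `ConnectionError` and the exceptions raised by `requests` derive from it.

```python
import time


def backoff_delays(base=1.0, factor=2.0, max_delay=30.0, attempts=5):
    """Yield a capped, exponentially growing delay for each attempt."""
    delay = base
    for _ in range(attempts):
        yield min(delay, max_delay)
        delay *= factor


def monitor_with_reconnect(open_stream, sleep=time.sleep):
    """Yield events, reconnecting when the stream drops or ends early."""
    for delay in backoff_delays():
        try:
            for event in open_stream():
                yield event
                if event.get("status") in ("completed", "error", "timeout"):
                    return  # terminal status: stop monitoring
        except OSError:
            pass  # connection dropped; reconnect after the delay below
        sleep(delay)
    raise TimeoutError("gave up reconnecting to the SSE stream")
```

Because the gateway closes connections after 5 minutes, a stream that ends without a terminal status is treated the same way as a dropped connection.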

Debugging Commands

# Check Cloud Run Job executions
gcloud run jobs executions list --job=tts-worker --region=$REGION

# View Cloud Run Job logs
gcloud logging read "resource.type=cloud_run_job AND resource.labels.job_name=tts-worker" --limit=50

# Check Cloud Function logs
gcloud functions logs read submit_audio_job --region=$REGION

# View SSE Gateway logs
gcloud run services logs read events-gateway --region=$REGION

# Check audio durations for recent jobs
gcloud logging read 'logName="projects/'$PROJECT_ID'/logs/tts-worker" jsonPayload.duration_seconds>0' \
  --format="table(jsonPayload.job_id, jsonPayload.duration_seconds, jsonPayload.duration_minutes)" \
  --limit=10

# Find jobs with short audio (< 60 seconds)
gcloud logging read 'logName="projects/'$PROJECT_ID'/logs/tts-worker" jsonPayload.message="Audio duration too short"' \
  --format="table(jsonPayload.job_id, jsonPayload.duration_seconds, timestamp)" \
  --limit=20

Security Considerations

Cloud Storage Security

The project includes security configurations to prevent unauthorized bucket listing while maintaining file accessibility:

Bucket Listing Protection: Public access to list bucket contents is disabled
File Access: Individual files remain accessible via direct URLs (configurable)
Security Scripts: Use ./secure_bucket.sh to apply security settings

To secure your buckets:

# Quick security fix (prevents bucket listing)
./secure_bucket.sh

# Or manually apply settings
gsutil iam ch -d allUsers:objectViewer gs://$PROJECT_ID-tts-output
gsutil iam ch -d allUsers:legacyBucketReader gs://$PROJECT_ID-tts-output

For enhanced security, consider using the signed URL implementation in events-gateway/main_secure.py which provides:

Time-limited access to generated files
No permanent public URLs
Better access control and auditing

See BUCKET_SECURITY_GUIDE.md for detailed security options.

GitHub Actions Security

When using the Claude Code GitHub Action workflow, configure these secrets in your repository settings:

ANTHROPIC_API_KEY: Your Anthropic API key for Claude
GOOGLE_CLOUD_SERVICE_ACCOUNT_KEY: Service account JSON key with appropriate permissions
GOOGLE_CLOUD_PROJECT_ID: Your GCP project ID

To add these secrets:

Go to Settings → Secrets and variables → Actions
Click "New repository secret"
Add each secret with the appropriate value

Other Security Recommendations

The Cloud Function endpoint and the events gateway are deployed with --allow-unauthenticated and are therefore publicly accessible; both can be secured by requiring authentication
Gemini API key is stored in Secret Manager
Consider implementing:
- API key authentication for the Cloud Function
- Rate limiting and quota management
- VPC Service Controls for additional network security
Before making the repository public:
- Ensure no API keys or credentials are hardcoded
- Review all configuration files for sensitive data
- Use environment variables for all project-specific settings

Contributing

This project uses GitHub Actions with Claude Code for automated assistance. To request help:

Create an issue or pull request
Mention @claude in your comment
Claude will analyze and provide assistance

License

This project is provided as-is for demonstration purposes.
