A production-ready Text-to-Speech (TTS) pipeline that converts conversational text with multiple speakers into natural-sounding audio using Google Cloud services and the Gemini API.
- Multi-speaker support: Automatically assigns different voices to multiple speakers
- Real-time status updates: Server-Sent Events (SSE) for live job progress tracking
- Scalable architecture: Built on Google Cloud Run and Cloud Functions
- Flexible TTS models: Supports various Gemini models with customizable prompts
- Asynchronous processing: Non-blocking job submission with background processing
- GitHub integration: Includes Claude Code GitHub Action for automated assistance
- Cloud Function (submit_audio_job): HTTP endpoint for job submission
- Cloud Run Job (tts-worker): Background worker for TTS processing using Gemini API
- Cloud Run Service (events-gateway): SSE gateway for real-time status updates
- Cloud Storage: Input text and output audio file storage
- Cloud Tasks: Job queue management (the queue is configured, but jobs currently run as Cloud Run Jobs)
- Pub/Sub: Event-driven communication between components
- Google Cloud Project with billing enabled
- gcloud CLI installed and configured
- Required environment variables:
export PROJECT_ID="your-project-id"
export REGION="your-region" # e.g., us-central1, asia-northeast1
- Update gcloud CLI
gcloud components update
- Enable required Google Cloud APIs
gcloud services enable \
  cloudfunctions.googleapis.com \
  run.googleapis.com \
  eventarc.googleapis.com \
  cloudtasks.googleapis.com \
  pubsub.googleapis.com \
  storage.googleapis.com \
  artifactregistry.googleapis.com \
  secretmanager.googleapis.com \
  cloudbuild.googleapis.com
- Set up IAM permissions
# Grant Cloud Build permissions to your user account
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="user:YOUR_EMAIL@example.com" \
  --role="roles/cloudbuild.builds.editor"
- Store Gemini API key in Secret Manager
# Replace YOUR_GEMINI_API_KEY with your actual API key
echo -n "YOUR_GEMINI_API_KEY" | gcloud secrets create gemini-api-key --data-file=-
# Grant access to the service account
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format='value(projectNumber)')
gcloud secrets add-iam-policy-binding gemini-api-key \
  --member="serviceAccount:$PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor" \
  --project=$PROJECT_ID
The project includes a convenient deployment script that handles all components:
# Set required environment variables
export PROJECT_ID="your-project-id"
export REGION="asia-northeast1"
# Deploy all components
./deploy.sh all
# Or deploy individual components
./deploy.sh gateway # Deploy SSE Gateway (Cloud Run Service)
./deploy.sh worker # Deploy TTS Worker (Cloud Run Job)
./deploy.sh function # Deploy Submit Function (Cloud Function)
# Show help
./deploy.sh help
The script automatically:
- Validates environment variables
- Builds container images
- Deploys services with proper configurations
- Sets up environment variables and secrets
For detailed control over the deployment process:
- Create Cloud Storage buckets
gsutil mb -l $REGION gs://$PROJECT_ID-tts-input
gsutil mb -l $REGION gs://$PROJECT_ID-tts-output
# Apply security settings to prevent bucket listing (recommended)
# See Security Considerations section for details
- Create Artifact Registry repository
gcloud artifacts repositories create tts \
  --repository-format=docker \
  --location=$REGION
- Create Cloud Tasks queue
gcloud tasks queues create tts-queue --location=$REGION
- Deploy the TTS Worker (Cloud Run Job)
# Build and deploy
gcloud builds submit worker/tts_worker \
  --tag $REGION-docker.pkg.dev/$PROJECT_ID/tts/worker:latest
gcloud run jobs create tts-worker \
  --image $REGION-docker.pkg.dev/$PROJECT_ID/tts/worker:latest \
  --region $REGION \
  --set-secrets GEMINI_API_KEY=gemini-api-key:latest
- Deploy the Events Gateway (Cloud Run Service)
# Build and deploy
gcloud builds submit events-gateway \
  --tag $REGION-docker.pkg.dev/$PROJECT_ID/tts/gateway:latest
gcloud run deploy events-gateway \
  --image $REGION-docker.pkg.dev/$PROJECT_ID/tts/gateway:latest \
  --region $REGION \
  --allow-unauthenticated \
  --set-env-vars "PROJECT_ID=$PROJECT_ID"
- Set up Pub/Sub topics and subscriptions
# Create topics
gcloud pubsub topics create gcs-object-finalize-events
gcloud pubsub topics create tts-finished
# Set up GCS notifications
gsutil notification create \
  -t projects/$PROJECT_ID/topics/gcs-object-finalize-events \
  -f json \
  gs://$PROJECT_ID-tts-output
- Deploy the Submit Function (Cloud Function)
gcloud functions deploy submit_audio_job \
  --gen2 \
  --region $REGION \
  --runtime python311 \
  --entry-point main \
  --source functions/submit_audio_job \
  --trigger-http \
  --allow-unauthenticated
| Parameter | Type | Default | Description |
|---|---|---|---|
| script | string | Required | Conversation text with speaker labels |
| speakers | array | Required | List of speaker names |
| model | string | gemini-2.5-flash-preview-tts | Gemini model to use |
| prompt | string | TTS the following conversation: | System prompt for TTS generation |
| job_id | string | Auto-generated UUID | Custom job identifier |
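A client can check a request against the parameters above before sending it. The following is a minimal sketch, not the function's own validation logic, which this README does not show. The helper name `build_payload` is hypothetical, and the label check assumes every script line starts with `Name:`; a line of plain text containing a colon would be misread as a speaker label.

```python
import re

# Defaults copied from the parameter table above.
DEFAULT_MODEL = "gemini-2.5-flash-preview-tts"
DEFAULT_PROMPT = "TTS the following conversation:"


def build_payload(script, speakers, model=None, prompt=None, job_id=None):
    """Return a request body, raising ValueError on obviously bad input."""
    if not isinstance(script, str) or not script.strip():
        raise ValueError("script is required")
    if not speakers:
        raise ValueError("speakers is required")
    # Collect the "Name:" label at the start of each script line (assumed format).
    labels = {
        match.group(1).strip()
        for match in re.finditer(r"^([^:\n]+):", script, re.MULTILINE)
    }
    unknown = labels - set(speakers)
    if unknown:
        raise ValueError(f"script uses speakers not in the list: {sorted(unknown)}")
    payload = {
        "script": script,
        "speakers": list(speakers),
        "model": model or DEFAULT_MODEL,
        "prompt": prompt or DEFAULT_PROMPT,
    }
    if job_id:  # omit job_id so the server generates a UUID
        payload["job_id"] = job_id
    return payload
```

Catching a mismatch between script labels and the speakers array on the client side avoids submitting a job that is likely to fail during generation.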
Speakers are automatically assigned to available voices:
- Voices rotate between "Kore" and "Puck"
- Assignment is based on speaker order in the array
- Consistent voice assignment for each speaker throughout the conversation
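The rotation described above amounts to a round-robin mapping from speaker position to voice. The worker's actual code is not shown in this README, so the sketch below is only an illustration; the function name `assign_voices` and the handling of repeated names are assumptions.

```python
VOICES = ["Kore", "Puck"]  # the two voices named above


def assign_voices(speakers, voices=VOICES):
    """Map each speaker to a voice by position, cycling through `voices`."""
    mapping = {}
    for speaker in speakers:
        # A name that appears twice keeps its first voice (assumed behavior).
        if speaker not in mapping:
            mapping[speaker] = voices[len(mapping) % len(voices)]
    return mapping


print(assign_voices(["Alice", "Bob", "Carol"]))
# {'Alice': 'Kore', 'Bob': 'Puck', 'Carol': 'Kore'}
```

With only two voices in rotation, the first and third speakers share a voice, which is worth keeping in mind for conversations with more than two participants.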
# Get the Cloud Function URL
FUNCTION_URL=$(gcloud functions describe submit_audio_job \
--region $REGION --format 'value(serviceConfig.uri)')
# Submit a job
curl -X POST "$FUNCTION_URL" \
-H "Content-Type: application/json" \
-d '{
"script": "Alice: Hello there!\\nBob: Hi Alice, how are you?",
"speakers": ["Alice", "Bob"],
"prompt": "Read this conversation naturally",
"model": "gemini-2.5-flash-preview-tts"
}'
Response:
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "events_url": "https://events-gateway-xxx.run.app/events/550e8400-e29b-41d4-a716-446655440000"
}
# Connect to the SSE endpoint
curl -N "https://events-gateway-xxx.run.app/events/550e8400-e29b-41d4-a716-446655440000"
SSE Events:
data: {"job_id": "550e8400-e29b-41d4-a716-446655440000", "status": "waiting", "url": "https://storage.googleapis.com/..."}
data: {"job_id": "550e8400-e29b-41d4-a716-446655440000", "status": "completed", "url": "https://storage.googleapis.com/..."}
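For clients that cannot use an SSE library, the stream can be decoded by hand. The sketch below assumes the gateway follows the standard SSE framing, where each event is one or more `data:` lines terminated by a blank line, and that every payload is JSON as in the sample events above. The function name `parse_sse` is hypothetical.

```python
import json


def parse_sse(lines):
    """Yield decoded JSON payloads from an iterable of SSE text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            # A blank line ends the event; multiple data lines join with "\n".
            yield json.loads("\n".join(buffer))
            buffer = []
        # Comment lines (starting with ":") and other fields are ignored.


stream = [
    'data: {"job_id": "abc", "status": "waiting"}\n',
    "\n",
    'data: {"job_id": "abc", "status": "completed", "url": "https://storage.googleapis.com/..."}\n',
    "\n",
]
for event in parse_sse(stream):
    print(event["status"])
```

An event that is still buffered when the stream ends is discarded, which matches how the SSE format treats incomplete events.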
# Download the WAV file
curl -o output.wav "https://storage.googleapis.com/your-project-tts-output/550e8400-e29b-41d4-a716-446655440000.wav"

class TTSClient {
  constructor(functionUrl) {
    this.functionUrl = functionUrl;
  }

  // Submit a job; resolves to { job_id, events_url }.
  async submitJob(script, speakers, options = {}) {
    const response = await fetch(this.functionUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // Undefined options are dropped by JSON.stringify, so server defaults apply.
      body: JSON.stringify({
        script,
        speakers,
        prompt: options.prompt,
        model: options.model,
        job_id: options.jobId
      })
    });
    if (!response.ok) {
      throw new Error(`Job submission failed: HTTP ${response.status}`);
    }
    return response.json();
  }

  // Listen for status events and close the stream on a terminal status.
  monitorJob(eventsUrl, callbacks) {
    const eventSource = new EventSource(eventsUrl);
    eventSource.onmessage = (event) => {
      const data = JSON.parse(event.data);
      switch (data.status) {
        case 'waiting':
          callbacks.onWaiting?.(data);
          break;
        case 'completed':
          callbacks.onCompleted?.(data);
          eventSource.close();
          break;
        case 'error':
        case 'timeout':
          callbacks.onError?.(data);
          eventSource.close();
          break;
      }
    };
    eventSource.onerror = () => {
      // Closing here disables EventSource's built-in automatic reconnect.
      callbacks.onError?.({ error: 'Connection failed' });
      eventSource.close();
    };
    return eventSource;
  }
}

// Usage
const client = new TTSClient('YOUR_FUNCTION_URL');
const job = await client.submitJob(
  "Alice: Hello!\\nBob: Hi there!",
  ["Alice", "Bob"],
  { prompt: "Natural conversation" }
);
client.monitorJob(job.events_url, {
  onWaiting: (data) => console.log('Processing...'),
  onCompleted: (data) => {
    const audio = new Audio(data.url);
    audio.play();
  },
  onError: (error) => console.error('Failed:', error)
});

import json

import requests
import sseclient  # provided by the sseclient-py package


class TTSClient:
    def __init__(self, function_url):
        self.function_url = function_url

    def submit_job(self, script, speakers, prompt=None, model=None):
        """Submit a job and return the response body (job_id, events_url)."""
        payload = {
            "script": script,
            "speakers": speakers
        }
        if prompt:
            payload["prompt"] = prompt
        if model:
            payload["model"] = model
        response = requests.post(self.function_url, json=payload)
        response.raise_for_status()  # fail on HTTP errors instead of parsing an error body
        return response.json()

    def monitor_job(self, events_url):
        """Yield status updates until the job reaches a terminal status."""
        response = requests.get(events_url, stream=True)
        response.raise_for_status()
        client = sseclient.SSEClient(response)
        for event in client.events():
            data = json.loads(event.data)
            yield data
            if data.get("status") in ["completed", "error", "timeout"]:
                break


# Usage
client = TTSClient("YOUR_FUNCTION_URL")
job = client.submit_job(
    "Alice: Hello!\\nBob: Hi there!",
    ["Alice", "Bob"]
)
for update in client.monitor_job(job["events_url"]):
    print(f"Status: {update['status']}")
    if update["status"] == "completed":
        print(f"Audio URL: {update['url']}")

Use the included test script to verify SSE functionality:
./test_sse.sh
This script:
- Creates a test job ID
- Establishes an SSE connection
- Publishes a test message via Pub/Sub
- Verifies message delivery
# Test job submission
curl -X POST "$FUNCTION_URL" \
-H "Content-Type: application/json" \
-d '{"script": "Test: Hello", "speakers": ["Test"]}'
# Test SSE connection
curl -N "$EVENTS_URL"
The TTS Worker now includes enhanced structured logging to help track audio generation issues. Here's how to monitor your jobs:
- Go to Google Cloud Console
- Select your project from the dropdown at the top
- In the left menu, navigate to Logging → Logs Explorer
Use these queries in the Logs Explorer search bar:
View all TTS worker logs:
logName="projects/YOUR_PROJECT_ID/logs/tts-worker"
View logs for a specific job:
logName="projects/YOUR_PROJECT_ID/logs/tts-worker"
jsonPayload.job_id="YOUR_JOB_ID"
View only errors:
logName="projects/YOUR_PROJECT_ID/logs/tts-worker"
severity="ERROR"
Track short audio issues:
logName="projects/YOUR_PROJECT_ID/logs/tts-worker"
jsonPayload.message="Audio duration too short"
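The queries above share one pattern: a `logName` clause plus optional field comparisons, one per line, which the Logging query language combines with an implicit AND. A small helper can assemble them for scripts or runbooks. This is a convenience sketch only; the function name `build_log_filter` is hypothetical and the values are not escaped, so it assumes they contain no double quotes.

```python
def build_log_filter(project_id, job_id=None, severity=None, message=None):
    """Build a Logs Explorer filter for the tts-worker log."""
    clauses = [f'logName="projects/{project_id}/logs/tts-worker"']
    if job_id:
        clauses.append(f'jsonPayload.job_id="{job_id}"')
    if severity:
        clauses.append(f'severity="{severity}"')
    if message:
        clauses.append(f'jsonPayload.message="{message}"')
    # Clauses on separate lines are ANDed together by the query language.
    return "\n".join(clauses)


print(build_log_filter("YOUR_PROJECT_ID", job_id="YOUR_JOB_ID", severity="ERROR"))
```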
Each log entry contains structured data:
- job_id: Unique identifier for the job
- duration_seconds: Length of generated audio
- duration_minutes: Length in minutes
- speakers: Array of speaker names
- model: Gemini model used
- retry_reason: Why a retry was needed (e.g., "duration_too_short")
- total_processing_time_seconds: Total time to generate audio
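To make the shape of these entries concrete, the sketch below assembles one as a plain dictionary with the fields listed above. How the worker actually emits its logs is not shown in this README; the function name `make_log_entry`, the default message text, and the rounding are assumptions made for illustration.

```python
import json


def make_log_entry(job_id, duration_seconds, speakers, model,
                   total_processing_time_seconds, retry_reason=None,
                   message="Audio generated", severity="INFO"):
    """Assemble a structured log payload with the documented fields."""
    entry = {
        "severity": severity,
        "message": message,
        "job_id": job_id,
        "duration_seconds": round(duration_seconds, 2),
        "duration_minutes": round(duration_seconds / 60, 2),
        "speakers": list(speakers),
        "model": model,
        "total_processing_time_seconds": total_processing_time_seconds,
    }
    if retry_reason is not None:  # only present when a retry happened
        entry["retry_reason"] = retry_reason
    return entry


entry = make_log_entry(
    "550e8400-e29b-41d4-a716-446655440000", 90.0, ["Alice", "Bob"],
    "gemini-2.5-flash-preview-tts", 12.5,
)
print(json.dumps(entry))
```

Keeping numeric fields such as duration_seconds as numbers rather than strings is what allows comparisons like `jsonPayload.duration_seconds < 60` in log queries.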
- In Logs Explorer, create a query for short audio:
logName="projects/YOUR_PROJECT_ID/logs/tts-worker"
jsonPayload.duration_seconds < 60
- Click Create Alert above the results
- Configure notification channels (email, SMS, etc.)
The system now automatically validates audio duration:
- Minimum Duration: 60 seconds (1 minute)
- Automatic Retry: If audio is shorter than 60 seconds, the system will:
  - Log a warning with the actual duration
  - Retry generation with modified prompts (up to 2 additional attempts)
  - Add instructions to speak slowly and clearly
- Metadata Storage: Duration is stored in GCS metadata for each file
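The validation and retry behavior described above can be sketched with the standard library's `wave` module. This is an illustration under stated assumptions, not the worker's code: `generate` stands in for whatever call produces WAV bytes, the wording appended to the prompt is invented, and returning the last attempt after all retries fail is a guess, since the README does not say what happens in that case.

```python
import io
import wave

MIN_DURATION_SECONDS = 60
MAX_ATTEMPTS = 3  # one initial attempt plus up to 2 retries


def wav_duration_seconds(wav_bytes):
    """Return the playback length of in-memory WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def generate_with_retry(generate, prompt, min_seconds=MIN_DURATION_SECONDS):
    """Call generate(prompt) until the audio is long enough or attempts run out."""
    current_prompt = prompt
    audio, duration = b"", 0.0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        audio = generate(current_prompt)
        duration = wav_duration_seconds(audio)
        if duration >= min_seconds:
            break
        print(f"WARNING: Audio duration too short: {duration:.1f}s (attempt {attempt})")
        # Hypothetical prompt modification for the next attempt.
        current_prompt = prompt + " Speak slowly and clearly."
    return audio, duration
```

The duration is computed as frame count divided by frame rate, so it does not depend on channel count or sample width.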
To check audio duration for existing files:
# View file metadata including duration
gsutil stat gs://$PROJECT_ID-tts-output/JOB_ID.wav
Create a simple dashboard to monitor your TTS jobs:
- Go to Monitoring → Dashboards in GCP Console
- Click Create Dashboard
- Add these widgets:
  - Log-based metric: Audio generation success rate
  - Log-based metric: Average audio duration
  - Log panel: Recent errors
Example metric for average duration:
- Go to Logging → Log-based Metrics
- Click Create Metric
- Name: tts_audio_duration
- Filter: logName="projects/YOUR_PROJECT_ID/logs/tts-worker" jsonPayload.duration_seconds > 0
- Field name: jsonPayload.duration_seconds
- Create the metric and use it in dashboards
- "PROJECT_ID not set" error
  - Ensure environment variables are exported: export PROJECT_ID=your-project-id
- "Permission denied" errors
  - Check IAM permissions for service accounts
  - Verify Secret Manager access for Gemini API key
- SSE connection timeouts
  - SSE connections have a 5-minute maximum duration
  - Implement reconnection logic in production clients
- Audio generation failures
  - Verify Gemini API key is valid
  - Check Cloud Run Job logs: gcloud run jobs executions list --job=tts-worker
  - Ensure speaker names in script match the speakers array
- Short audio files (< 60 seconds)
  - The system now automatically retries generation for short audio
  - Check logs for "Audio duration too short" warnings
  - Common causes:
    - Very short input scripts
    - Fast speech generation by the model
    - Missing or truncated content
  - Manual workarounds:
    - Add more conversational content
    - Include pauses or stage directions
    - Use prompts that encourage slower speech
- Deployment failures
  - Ensure all APIs are enabled
  - Check Cloud Build logs for container build issues
  - Verify Artifact Registry repository exists
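The reconnection logic recommended under SSE connection timeouts can be sketched as a wrapper that reopens the stream with exponential backoff. The names here are hypothetical: `connect` is any zero-argument callable returning an iterator of decoded events (for example, a lambda around the Python client's `monitor_job`), and the retry limits are arbitrary defaults. Whether the gateway replays a status that was published while the client was disconnected is not stated in this README.

```python
import time

TERMINAL_STATUSES = {"completed", "error", "timeout"}


def monitor_with_reconnect(connect, max_reconnects=5, base_delay=1.0, sleep=time.sleep):
    """Yield events from connect(), reopening the stream if it drops."""
    failures = 0
    while True:
        try:
            for event in connect():
                failures = 0  # the connection is healthy again
                yield event
                if event.get("status") in TERMINAL_STATUSES:
                    return
        except OSError:
            # Covers built-in ConnectionError/TimeoutError and requests' exceptions.
            pass
        # Reached when the stream dropped or ended without a terminal status.
        failures += 1
        if failures > max_reconnects:
            raise RuntimeError("gave up after repeated disconnects")
        sleep(base_delay * 2 ** (failures - 1))  # 1s, 2s, 4s, ...
```

A stream that the server closes cleanly at the 5-minute limit is treated the same as a dropped connection, so long-running jobs keep being monitored.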
# Check Cloud Run Job executions
gcloud run jobs executions list --job=tts-worker --region=$REGION
# View Cloud Run Job logs
gcloud logging read "resource.type=cloud_run_job AND resource.labels.job_name=tts-worker" --limit=50
# Check Cloud Function logs
gcloud functions logs read submit_audio_job --region=$REGION
# View SSE Gateway logs
gcloud run services logs read events-gateway --region=$REGION
# Check audio durations for recent jobs
gcloud logging read 'logName="projects/'$PROJECT_ID'/logs/tts-worker" jsonPayload.duration_seconds>0' \
--format="table(jsonPayload.job_id, jsonPayload.duration_seconds, jsonPayload.duration_minutes)" \
--limit=10
# Find jobs with short audio (< 60 seconds)
gcloud logging read 'logName="projects/'$PROJECT_ID'/logs/tts-worker" jsonPayload.message="Audio duration too short"' \
--format="table(jsonPayload.job_id, jsonPayload.duration_seconds, timestamp)" \
--limit=20
The project includes security configurations to prevent unauthorized bucket listing while maintaining file accessibility:
- Bucket Listing Protection: Public access to list bucket contents is disabled
- File Access: Individual files remain accessible via direct URLs (configurable)
- Security Scripts: Use ./secure_bucket.sh to apply security settings
To secure your buckets:
# Quick security fix (prevents bucket listing)
./secure_bucket.sh
# Or manually apply settings
gsutil iam ch -d allUsers:objectViewer gs://$PROJECT_ID-tts-output
gsutil iam ch -d allUsers:legacyBucketReader gs://$PROJECT_ID-tts-output
For enhanced security, consider using the signed URL implementation in events-gateway/main_secure.py which provides:
- Time-limited access to generated files
- No permanent public URLs
- Better access control and auditing
See BUCKET_SECURITY_GUIDE.md for detailed security options.
When using the Claude Code GitHub Action workflow, configure these secrets in your repository settings:
- ANTHROPIC_API_KEY: Your Anthropic API key for Claude
- GOOGLE_CLOUD_SERVICE_ACCOUNT_KEY: Service account JSON key with appropriate permissions
- GOOGLE_CLOUD_PROJECT_ID: Your GCP project ID
To add these secrets:
- Go to Settings → Secrets and variables → Actions
- Click "New repository secret"
- Add each secret with the appropriate value
- The Cloud Function endpoint is publicly accessible but can be secured with authentication
- Gemini API key is stored in Secret Manager
- Consider implementing:
  - API key authentication for the Cloud Function
  - Rate limiting and quota management
  - VPC Service Controls for additional network security
- Before making the repository public:
  - Ensure no API keys or credentials are hardcoded
  - Review all configuration files for sensitive data
  - Use environment variables for all project-specific settings
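For the API key authentication suggested in the list above, one simple approach is a shared-secret header checked with a constant-time comparison. The project does not implement this, so everything here is an assumption: the header name `X-API-Key`, the function name `is_authorized`, and the idea that the expected key would be loaded from Secret Manager in the same way as the Gemini key.

```python
import hmac


def is_authorized(headers, expected_key, header_name="X-API-Key"):
    """Return True only if the request carries the expected API key."""
    supplied = headers.get(header_name, "")
    if not expected_key or not supplied:
        return False  # reject when either side is missing or empty
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())


print(is_authorized({"X-API-Key": "s3cret"}, "s3cret"))  # True
print(is_authorized({"X-API-Key": "wrong"}, "s3cret"))   # False
```

With a plain dict the header lookup is case-sensitive; framework request objects such as Flask's `request.headers` handle header names case-insensitively.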
This project uses GitHub Actions with Claude Code for automated assistance. To request help:
- Create an issue or pull request
- Mention @claude in your comment
- Claude will analyze and provide assistance
This project is provided as-is for demonstration purposes.