Launch your own private instance of the Three-Up Orchestrator on Google Cloud in just one click. The deployment script automatically provisions your Storage Bucket, configures CORS, and sets up your IAM roles.
A web application that generates three different emotional variations ("Takes") of a given text script and synthesizes them into speech using Google's Gemini TTS.
It demonstrates how to orchestrate:
- Gemini Generative AI (`gemini-3.1-flash-lite-preview`) to rewrite the prompt, strictly inserting emotion/technical voice tags.
- Gemini TTS (`gemini-3.1-flash-tts-preview`) to read the tagged variations with different vocal energies.
- A Lit Web Component frontend with custom text-tag visualization.
- AI Orchestration: Automates an entire "VO Booth" session, generating an enhanced script and 3 unique emotional variations from a single prompt.
- Gemini TTS: Leverages the latest `gemini-3.1-flash-tts-preview` model for high-fidelity voice synthesis.
- Dual Design Systems: A frontend built with Lit Web Components, featuring two toggleable aesthetic themes (Synthetix Studio Dark & Sunrise Studio Light).
- Deep Observability: Fully instrumented with OpenTelemetry to track exact latency costs of Gemini prompts and audio generation.
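The AI Orchestration feature above produces an enhanced script and then synthesizes three takes in parallel. The shape of that fan-out can be sketched in stdlib-only Go; note that `Take` and `synthesizeTake` are hypothetical stand-ins for the real backend types, not the app's actual code:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Take is a hypothetical stand-in for one tagged script variation.
type Take struct {
	Emotion string
	Script  string
}

// synthesizeTake is a placeholder for the real Gemini TTS call; here it
// just renders the tagged text so the fan-out shape is visible.
func synthesizeTake(t Take) string {
	return fmt.Sprintf("[%s] %s", t.Emotion, t.Script)
}

// renderAllTakes fans out one goroutine per take and waits for all of them,
// mirroring how the backend would run its three TTS requests concurrently.
func renderAllTakes(takes []Take) []string {
	results := make([]string, len(takes))
	var wg sync.WaitGroup
	for i, t := range takes {
		wg.Add(1)
		go func(i int, t Take) {
			defer wg.Done()
			results[i] = synthesizeTake(t)
		}(i, t)
	}
	wg.Wait()
	return results
}

func main() {
	script := "Our kittens are raised in a cage-free environment."
	takes := []Take{
		{Emotion: "happy", Script: script},
		{Emotion: "sarcasm", Script: script},
		{Emotion: "calm", Script: script},
	}
	fmt.Println(strings.Join(renderAllTakes(takes), "\n"))
}
```

Each goroutine writes to its own slice index, so no mutex is needed around `results`.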
- Go 1.25+
- Node.js 20+
- A Google Cloud Project with Vertex AI enabled.
- Application Default Credentials (ADC) configured locally (`gcloud auth application-default login`).
Edit backend/.env to configure your Google Cloud project details and bucket for audio generation:
```
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
PORT=8080
GENMEDIA_BUCKET=your-bucket-name
GEMINI_MODEL=gemini-3.1-flash-lite-preview
GEMINI_TTS_MODEL=gemini-3.1-flash-tts-preview
```

The application stores generated TTS audio in a GCS bucket (`GENMEDIA_BUCKET`) and streams it directly to the browser. To allow cross-origin audio playback, you must configure CORS on this bucket so the browser can stream the 206 Partial Content audio:
```
gcloud storage buckets update gs://<your-bucket-name> --cors-file=cors.json
```

To run the full stack locally during development, you can use the provided Makefile.
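The `--cors-file` flag expects a JSON policy file. The repository's actual `cors.json` isn't reproduced here, but a minimal policy that permits the GET/HEAD range requests audio streaming needs would look like this (restrict `origin` to your real frontend domain for production):

```json
[
  {
    "origin": ["*"],
    "method": ["GET", "HEAD"],
    "responseHeader": ["Content-Type", "Range"],
    "maxAgeSeconds": 3600
  }
]
```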
To start everything concurrently (frontend & backend):
```
make dev
```

- The backend API runs on `http://localhost:8080`
- The frontend dev server runs on `http://localhost:5173`
- The frontend server automatically proxies `/api` requests to the backend.
Individual Service Commands:
- `make dev-server`: Run just the Go backend.
- `make dev-frontend`: Run just the Vite frontend.
- `make build-frontend`: Build the production frontend bundle into `backend/dist`.
- `make build-run`: Build the frontend, place it in the backend's directory, and run the backend. Since the backend is configured to serve static files from `dist/`, the app will be fully available on `http://localhost:8080`.
To test functionality, open the frontend (`http://localhost:5173` during `make dev`, or `http://localhost:8080` via `make build-run`), paste a script like:
"Our kittens are raised in a cage-free environment with 24/7 medical supervision."
Click Generate Three-Up Takes. After processing, the UI will display the three variations with inline audio tags (e.g., [happy], [sarcasm]) and audio players to listen to the generated Gemini TTS output.
NOTE: Pricing is based on Gemini 3.1 Flash Lite Preview and 3.1 Flash TTS Preview as of April 2026. (Sources: Vertex AI Pricing, Cloud Text-to-Speech Pricing)
- Gemini 3.1 Flash-Lite (Text): $0.25/1M (Input), $1.50/1M (Output)
- Gemini 3.1 Flash TTS (Audio): $1.00/1M (Text Input), $20.00/1M (Audio Output tokens)
- Note: Audio is billed at 200 tokens per second of generated speech.
Context: A single user request providing a short 50-word script. The system generates an enhanced script, 3 tagged variations, and then synthesizes 3 separate audio files in parallel.
| Operation | Model | Tokens | Rate | Cost Estimate |
|---|---|---|---|---|
| Text Gen (Input) | gemini-3.1-flash-lite-preview | ~450 | $0.25 / 1M | ~$0.000113 |
| Text Gen (Output) | gemini-3.1-flash-lite-preview | ~400 | $1.50 / 1M | ~$0.000600 |
| TTS Gen (Input) | gemini-3.1-flash-tts | ~250 | $1.00 / 1M | ~$0.000250 |
| TTS Gen (Output) | gemini-3.1-flash-tts | ~12,000 (Audio) | $20.00 / 1M | ~$0.240000 |
| Total Cost | | ~13,100 | | ~$0.240963 (~24¢) |
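The arithmetic behind the table is tokens × rate ÷ 1,000,000 per line item, with the audio token count following the 200 tokens/second billing rule (12,000 tokens works out to 60 seconds of speech across the three takes). A quick sketch that reproduces the totals:

```go
package main

import "fmt"

// costUSD returns the dollar cost of tokens billed at a per-million-token rate.
func costUSD(tokens, ratePerMillion float64) float64 {
	return tokens * ratePerMillion / 1_000_000
}

func main() {
	// Audio is billed at 200 tokens per second of generated speech.
	audioSeconds := 12000.0 / 200

	total := costUSD(450, 0.25) + // text gen input
		costUSD(400, 1.50) + // text gen output
		costUSD(250, 1.00) + // TTS text input
		costUSD(12000, 20.00) // TTS audio output

	fmt.Printf("audio: %.0fs, total: $%.6f\n", audioSeconds, total)
}
```

Running it confirms the table's bottom line: TTS audio output ($0.24) dwarfs every other line item.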
Generating an entire 3-take orchestrated VO session costs roughly 24 cents, almost all of it TTS audio output tokens, making per-session costs predictable enough for production use cases.
For a deeper dive into the system architecture, component design, and operational learnings from building with Gemini TTS, please refer to the documentation:
- Architecture & Operational Learnings
- Synthetix Studio Design System (Dark)
- Sunrise Studio Design System (Light)
The application is containerized using Docker and is configured for deployment to Google Cloud Run.
- Ensure your `gcloud` CLI is configured and authenticated.
- Run the deployment script:
```
make deploy
# OR
./scripts/deploy.sh
```

This script will:
- Load environment variables from `backend/.env` (or use defaults).
- Build the Docker container using `gcloud builds submit`.
- Deploy the application to Cloud Run with unauthenticated access enabled.
Upon success, gcloud will output the public URL of your application.
If you want to deploy this application publicly to a different Google Cloud project (without Identity-Aware Proxy restrictions), you can configure the deployment script via environment variables.
- Enable Required APIs in your new project:

```
gcloud services enable run.googleapis.com cloudbuild.googleapis.com aiplatform.googleapis.com storage.googleapis.com --project=<NEW_PROJECT_ID>
```
- Configure Environment Variables (either export them in your terminal or create a `.env.deploy` file):

```
export PROJECT_ID="<NEW_PROJECT_ID>"
export USE_IAP="false"                 # Skips IAP setup and uses --allow-unauthenticated
export SERVICE_NAME="threeup-audio"    # (Optional) Override the default service name
export GENMEDIA_BUCKET="<NEW_BUCKET>"  # Ensure your target project has this bucket created and CORS-enabled
```
- Run the script:

```
./scripts/deploy.sh
```
The script will automatically create the necessary service account in the new project, grant it the required Vertex AI and Storage permissions, build the image, and deploy it to Cloud Run publicly.
To enable production features like bot protection, distributed rate limiting, and scalable analytics, you can set the following environment variables (either in your .env.deploy file or in the Cloud Run configuration):
To track generation events (voice actor popularity, text lengths, performance) in BigQuery:
- `BQ_DATASET`: Your BigQuery Dataset ID.
- `BQ_TABLE`: Your BigQuery Table ID.
- `DEMO_NAME`: (Optional) Name to identify this app in the metrics (default: `take3bounce`).
Note: The service account deployed with Cloud Run automatically includes the required BigQuery permissions, provided the dataset already exists.
To protect the text generation endpoint against automated abuse:
- `RECAPTCHA_SITE_KEY`: Your Google reCAPTCHA Enterprise site key.
- Note: Ensure your domain or Cloud Run URL is added to the allowed domains list in the Google Cloud Console.
To strictly enforce rate limits across multiple horizontally scaled Cloud Run instances:
- `REDIS_URL`: A standard Redis connection string (e.g., `redis://10.0.0.3:6379/0`). This could be Google Cloud Memorystore or a serverless Redis instance.
The Three-Up backend is fully instrumented with OpenTelemetry (OTel), providing deep visibility into the orchestration engine's performance. By default, it exports traces directly to Google Cloud Trace when deployed.
- HTTP Requests: Every incoming API call (e.g., `/api/variations`, `/api/variation-single`) is tracked from start to finish via the `otelhttp` middleware.
- LLM Text Generation (`LLM_Generate_Text`): Captures the exact latency of the Gemini prompt logic.
- TTS Audio Synthesis (`TTS_Generation`): Each parallel TTS audio request has its own child span. It captures the specific `take`, the `voiceName`, and crucially, the retry `attempt` number if the Vertex API throws a safety block.
- Google Cloud Storage (`GCS_Audio_Upload`): Tracks the final network latency of uploading the generated WAV files back to the bucket.
- Downstream Linkage: Google's internal API network timings are automatically appended as leaf nodes to your traces!
To view traces generated from your local machine:
- Ensure your `GOOGLE_CLOUD_PROJECT` is set in your `.env` file.
- Authenticate locally with Application Default Credentials: `gcloud auth application-default login`
- Run the backend: `cd backend && go run .`
- Generate a take in the UI, then navigate to the Trace page in your Google Cloud Console. You'll see a beautiful waterfall chart breaking down the exact millisecond cost of every Gemini and GCS interaction.