Gemini Live Agent Challenge · Category: UI Navigator
Support teams handling UI-related tickets face two compounding issues:
- Ambiguous screenshots — agents describe what they think they see, not what the UI actually shows. A "login doesn't work" ticket can mean 5 different root causes requiring 5 entirely different resolutions.
- Inconsistent responses — without a constrained action space, agents improvise answers. The same issue gets different instructions depending on who handles it, leading to repeat contacts and escalations.
MOVA Ticket Agent solves both: Gemini's multimodal vision reads the screenshot directly, and a contract system constrains which resolution paths are even possible — making every response auditable and repeatable.
- Reads screenshots, not descriptions — Gemini 2.5 Flash extracts up to 14 structured UI signals (error codes, form state, step indicators, button labels) directly from the uploaded image, bypassing agent interpretation bias
- Contracts constrain the answer space — each resolution path is defined in a JSON contract specifying allowed actions, required inputs, and permitted outputs; the model cannot hallucinate outside these bounds
- Channel-aware resolution — chat tickets receive a single actionable next step; email tickets receive a ranked solution list — same signal set, different contract, different output shape
- Full audit trail per episode — every ticket becomes an episode: raw signals → candidate contracts → business checks → resolution → saved to Firestore; reviewable anytime
- Overnight morning report — aggregates all processed episodes into a structured daily digest with channel breakdown, contract usage stats, and escalation flags
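The morning report aggregation described in the last bullet can be sketched with `collections.Counter`. The episode field names used here (`episode_id`, `channel`, `contract_id`, `escalated`) are assumptions made for illustration, not the project's actual Firestore schema.

```python
from collections import Counter


def build_morning_report(episodes):
    """Aggregate episode dicts into a daily digest (field names are hypothetical)."""
    by_channel = Counter(ep["channel"] for ep in episodes)
    by_contract = Counter(ep["contract_id"] for ep in episodes)
    escalated = [ep["episode_id"] for ep in episodes if ep.get("escalated")]
    return {
        "total_episodes": len(episodes),
        "channel_breakdown": dict(by_channel),
        "contract_usage": dict(by_contract),
        "escalations": escalated,
    }


# Three made-up episodes standing in for one night of processed tickets.
episodes = [
    {"episode_id": "ep-1", "channel": "chat",
     "contract_id": "chat_guided_resolution_v1", "escalated": False},
    {"episode_id": "ep-2", "channel": "email",
     "contract_id": "email_ranked_resolution_v1", "escalated": False},
    {"episode_id": "ep-3", "channel": "chat",
     "contract_id": "escalate_unknown_case_v1", "escalated": True},
]
report = build_morning_report(episodes)
print(report["channel_breakdown"])
print(report["escalations"])
```

Keeping the digest a plain dict means the same structure could back both an HTML view and a JSON endpoint.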
The deployed app is fully functional with real Gemini API calls, Firestore storage, and Cloud Storage.
- Open → https://contract-support-agent-799834288723.europe-west3.run.app
- Submit a chat ticket (simulates a user who can't log in):
  - Description: I can't log into my account, it says my credentials are invalid
  - Channel: chat
  - Upload: demo_assets/screenshots/login_screen_error.png (red error banner: "Invalid credentials or account locked", 5/5 failed attempts)
  - Click Process Ticket
- See the pipeline trace: signals extracted from screenshot → contracts ranked → business checks → single next step instruction for chat
- Submit an email ticket (simulates a user stuck waiting for confirmation):
  - Description: I registered but never received the confirmation email
  - Channel: email
  - Upload: demo_assets/screenshots/registration_waiting_confirmation.png (confirmation pending screen, step 2/3)
  - Click Process Ticket
- See the difference: same pipeline, different contract → ranked solution list instead of single step
- Open the Morning Report to see both episodes aggregated: channel breakdown, contract usage, escalation status
- Open the Contract Catalog to browse all 8 contracts; click any to see its full specification (allowed actions, required inputs, output shape)
┌──────────────────────────────────────────────────────┐
│  User Input: ticket text + screenshot + channel      │
└──────────────────┬───────────────────────────────────┘
                   │
          ┌────────▼────────┐
          │  Signal Agent   │  ← Gemini 2.5 Flash (multimodal)
          │                 │    contracts: extract_support_signals_v1
          │  Extracts ≤14   │               identify_support_step_v1
          │  structured     │
          │  UI signals     │
          └────────┬────────┘
                   │  signal_set: {error_code, form_state, step, ...}
          ┌────────▼────────┐
          │  Contract Agent │  ← Gemini (text)
          │                 │    contract: rank_candidate_resolution_contracts_v1
          │  Ranks which    │
          │  resolution     │
          │  contract fits  │
          └────────┬────────┘
                   │  selected_contract
          ┌────────▼──────────────┐
          │  Business Check Agent │  ← Read-only Firestore queries
          │                       │    contracts: check_account_state_v1
          │  Validates account /  │               check_registration_state_v1
          │  registration state   │
          └────────┬──────────────┘
                   │  checks: {account_locked, email_sent, ...}
          ┌────────▼──────────────────────────────────┐
          │  Resolution Agent                         │  ← Gemini (text)
          │                                           │
          │  chat    → chat_guided_resolution_v1      │
          │            "single next step instruction" │
          │                                           │
          │  email   → email_ranked_resolution_v1     │
          │            "ranked solution list (1–3)"   │
          │                                           │
          │  unknown → escalate_unknown_case_v1       │
          └────────┬──────────────────────────────────┘
                   │
          ┌────────▼────────┐
          │  Episode        │  → Firestore (full audit trail)
          │  Storage        │  → Morning Report aggregation
          └─────────────────┘
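The branching inside the Resolution Agent box can be sketched as a small lookup. The contract IDs come from the diagram. The function name, the `ranking_confidence` argument, and the `0.5` threshold are hypothetical; the project does not document how "sufficient confidence" is measured.

```python
# Channel → resolution contract, as drawn in the diagram.
RESOLUTION_BY_CHANNEL = {
    "chat": "chat_guided_resolution_v1",
    "email": "email_ranked_resolution_v1",
}
ESCALATION_CONTRACT = "escalate_unknown_case_v1"


def select_resolution_contract(channel, ranking_confidence, min_confidence=0.5):
    """Pick the resolution contract for a ticket.

    min_confidence is an assumed threshold for illustration only.
    """
    contract_id = RESOLUTION_BY_CHANNEL.get(channel)
    if contract_id is None or ranking_confidence < min_confidence:
        # Unknown channel or weak match: fall back to escalation.
        return ESCALATION_CONTRACT
    return contract_id


print(select_resolution_contract("chat", 0.9))
print(select_resolution_contract("email", 0.8))
print(select_resolution_contract("chat", 0.2))
```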
Key design principle: The contract constrains the allowed action space. Gemini extracts signals and selects contracts — but cannot produce outputs outside the contract's allowed_actions and success_outputs. This makes every resolution auditable and reproducible.
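One way to back this principle with a check after the model responds is sketched below. It is an assumption about how such enforcement could look, not the project's actual code; the function name and the shape of `inputs` and `outputs` are hypothetical. The contract is the `chat_guided_resolution_v1` example from this README.

```python
import json

CONTRACT = json.loads("""
{
  "contract_id": "chat_guided_resolution_v1",
  "kind": "resolution",
  "purpose": "Provide the next single troubleshooting step for chat interaction",
  "applicable_channels": ["chat"],
  "required_inputs": ["selected_contract", "signal_set"],
  "allowed_actions": ["generate_next_step_instruction"],
  "success_outputs": ["chat_next_step_instruction"]
}
""")


def validate_against_contract(contract, inputs, action, outputs):
    """Return a list of violations; an empty list means the step complies."""
    violations = []
    missing = [k for k in contract["required_inputs"] if k not in inputs]
    if missing:
        violations.append(f"missing required inputs: {missing}")
    if action not in contract["allowed_actions"]:
        violations.append(f"action not allowed: {action}")
    unexpected = [k for k in outputs if k not in contract["success_outputs"]]
    if unexpected:
        violations.append(f"outputs outside contract: {unexpected}")
    return violations


ok = validate_against_contract(
    CONTRACT,
    inputs={"selected_contract": "chat_guided_resolution_v1", "signal_set": {}},
    action="generate_next_step_instruction",
    outputs={"chat_next_step_instruction": "Use the 'Forgot password' link."},
)
bad = validate_against_contract(
    CONTRACT, inputs={}, action="reset_password", outputs={"refund": True}
)
print(ok)
print(len(bad))
```

Storing the returned violations alongside the episode would keep rejected outputs visible in the audit trail.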
| Layer | Technology | Why |
|---|---|---|
| LLM / Vision | Gemini 2.5 Flash | Multimodal: reads screenshot pixels directly; fast and cost-efficient for batch processing |
| Agent Orchestration | Google ADK (custom) | ADK-style agent classes with explicit contract references; each agent has a single responsibility |
| Gen AI SDK | google-genai | Unified client for both Vertex AI and API key auth; structured JSON output mode |
| Backend | Python 3.12 + FastAPI | Async-first, clean route separation between HTML views and JSON API |
| Frontend | Jinja2 + Vanilla CSS | Zero JS dependencies; pipeline trace renders server-side for reliability |
| Persistence | Firestore | Schemaless episode storage; in-memory fallback for local dev without credentials |
| File Storage | Cloud Storage | Screenshot upload with signed URL pattern; MIME auto-detection |
| Deployment | Cloud Run | Serverless, scales to zero; deployed via gcloud run deploy --source . in one command |
| Contracts | JSON (8 files) | Declarative, version-controlled, diff-able; no code changes needed to constrain agent behavior |
All 8 contracts live in contracts/. Each contract is a JSON file defining:
{
"contract_id": "chat_guided_resolution_v1",
"kind": "resolution",
"purpose": "Provide the next single troubleshooting step for chat interaction",
"applicable_channels": ["chat"],
"required_inputs": ["selected_contract", "signal_set"],
"allowed_actions": ["generate_next_step_instruction"],
"success_outputs": ["chat_next_step_instruction"]
}

| Contract | Kind | Role |
|---|---|---|
| extract_support_signals_v1 | signal_extraction | Defines the 14 signals Gemini must extract from a screenshot |
| identify_support_step_v1 | step_identification | Maps visual cues to a support step label |
| rank_candidate_resolution_contracts_v1 | diagnostic | Ranks which resolution contract best fits the signals |
| check_account_state_v1 | business_check | Read-only account lock / suspension check |
| check_registration_state_v1 | business_check | Read-only registration / email confirmation check |
| chat_guided_resolution_v1 | resolution | Chat channel: produce one next step |
| email_ranked_resolution_v1 | resolution | Email channel: produce ranked solution list |
| escalate_unknown_case_v1 | escalation | Fallback when no contract matches with sufficient confidence |
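A minimal sketch of loading the catalog from a directory of JSON files follows. The `REQUIRED_KEYS` set is an assumption drawn from the single example contract shown above; other contract kinds may carry different fields. The function name `load_contracts` is hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Assumed minimum schema, based only on the example contract in this README.
REQUIRED_KEYS = {
    "contract_id", "kind", "purpose",
    "required_inputs", "allowed_actions", "success_outputs",
}


def load_contracts(directory):
    """Read every *.json file in `directory` and index contracts by contract_id."""
    contracts = {}
    for path in sorted(Path(directory).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        missing = REQUIRED_KEYS - data.keys()
        if missing:
            raise ValueError(f"{path.name}: missing keys {sorted(missing)}")
        contracts[data["contract_id"]] = data
    return contracts


example = {
    "contract_id": "chat_guided_resolution_v1",
    "kind": "resolution",
    "purpose": "Provide the next single troubleshooting step for chat interaction",
    "applicable_channels": ["chat"],
    "required_inputs": ["selected_contract", "signal_set"],
    "allowed_actions": ["generate_next_step_instruction"],
    "success_outputs": ["chat_next_step_instruction"],
}

# A temporary directory stands in for contracts/ so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "chat_guided_resolution_v1.json").write_text(
        json.dumps(example), encoding="utf-8"
    )
    catalog = load_contracts(tmp)

print(sorted(catalog))
```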
- Python 3.12+
- pip
- A Google AI Studio API key → aistudio.google.com (free tier works)
- (Optional) Google Cloud project with Firestore + Cloud Storage for full persistence
# 1. Clone and enter
git clone https://github.com/your-org/mova-ticket-agent
cd mova-ticket-agent
# 2. Create virtual environment and install
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# 3. Configure environment
cp .env.example .env
# Open .env and set GEMINI_API_KEY=your_key_here
# Everything else has sensible defaults (in-memory Firestore fallback)
# 4. Start
uvicorn app.main:app --reload --host 0.0.0.0 --port 8080

| Variable | Required | Description |
|---|---|---|
| GEMINI_API_KEY | Yes | Google AI Studio API key |
| GOOGLE_GENAI_USE_VERTEXAI | No | Set true to use Vertex AI instead of API key |
| GOOGLE_CLOUD_PROJECT | No | GCP project ID (for Firestore + Storage) |
| STORAGE_BUCKET | No | Cloud Storage bucket name for screenshots |
| MODEL_NAME | No | Gemini model (default: gemini-2.5-flash) |
Without GOOGLE_CLOUD_PROJECT, the app runs with an in-memory Firestore fallback, which is fully functional for demo purposes.
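The variable table and the fallback rule can be sketched as a small settings loader. The project keeps its settings in a Pydantic model in `app/config.py`; this sketch uses a standard-library dataclass instead so it runs without dependencies. The class, function, and attribute names are hypothetical, and treating GEMINI_API_KEY as always required simply follows the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    use_vertex_ai: bool
    google_cloud_project: Optional[str]
    storage_bucket: Optional[str]
    model_name: str

    @property
    def use_in_memory_store(self) -> bool:
        # No GCP project configured: fall back to in-memory episode storage.
        return self.google_cloud_project is None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key:
        raise ValueError("GEMINI_API_KEY is required")
    return Settings(
        gemini_api_key=api_key,
        use_vertex_ai=env.get("GOOGLE_GENAI_USE_VERTEXAI", "false").strip().lower() == "true",
        google_cloud_project=env.get("GOOGLE_CLOUD_PROJECT") or None,
        storage_bucket=env.get("STORAGE_BUCKET") or None,
        model_name=env.get("MODEL_NAME") or "gemini-2.5-flash",
    )


# A plain dict stands in for the process environment.
settings = load_settings({"GEMINI_API_KEY": "your_key_here"})
print(settings.model_name, settings.use_in_memory_store)
```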
| Page | URL | What judges see |
|---|---|---|
| Process Ticket | / | Ticket submission form + live activity stats |
| Morning Report | /demo/report | Overnight digest: channel breakdown, contract usage, escalations |
| Ticket Detail | /demo/ticket/{id} | Full pipeline trace: screenshot → signals → contracts → resolution |
| Contract Catalog | /demo/contracts | All 8 contracts with kind badges and purpose descriptions |
| Contract Viewer | /demo/contracts/{id} | Single contract: full JSON spec + allowed actions |
| Episode Viewer | /demo/episode/{id} | Raw episode audit trail stored in Firestore |
POST /api/tickets/process    Process ticket (multipart/form-data: description, channel, screenshot)
GET  /api/contracts          Contract catalog (JSON)
GET  /api/contracts/{id}     Single contract (JSON)
GET  /api/reports/morning    Morning report (JSON)
GET  /api/episodes           Episodes list (JSON)
GET  /api/episodes/{id}      Single episode (JSON)
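To call the ticket endpoint from a script, a multipart request can be assembled with the standard library alone. The form field names (`description`, `channel`, `screenshot`) come from the endpoint list above. The helper name, the `image/png` content type, and the base URL are assumptions. The sketch only builds the request; sending it with `urllib.request.urlopen` requires a running instance.

```python
import urllib.request
import uuid


def build_process_ticket_request(base_url, description, channel,
                                 screenshot_bytes, filename="screenshot.png"):
    """Build (but do not send) a multipart POST for /api/tickets/process."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields.
    for name, value in (("description", description), ("channel", channel)):
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # File field carrying the screenshot bytes.
    parts.append(
        (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="screenshot"; filename="{filename}"\r\n'
            f'Content-Type: image/png\r\n\r\n'
        ).encode("utf-8")
        + screenshot_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        f"{base_url}/api/tickets/process",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


request = build_process_ticket_request(
    "http://localhost:8080",
    description="I can't log into my account, it says my credentials are invalid",
    channel="chat",
    screenshot_bytes=b"\x89PNG placeholder bytes",
)
print(request.get_method(), request.full_url)
```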
gcloud run deploy contract-support-agent \
--source . \
--project YOUR_PROJECT_ID \
--region europe-west3 \
--allow-unauthenticated \
--set-env-vars="GOOGLE_GENAI_USE_VERTEXAI=false,GEMINI_API_KEY=YOUR_KEY,MODEL_NAME=gemini-2.5-flash"

| Risk | How we addressed it |
|---|---|
| Hallucination | Resolution agents cannot produce outputs outside the contract's allowed_actions; the contract is passed verbatim into the prompt as a hard constraint |
| Scope creep | Business check agents are read-only by design; the contract explicitly lists prohibited_actions: ["write", "update", "delete"] |
| Auditability | Every ticket produces a Firestore episode with full intermediate state: raw signals, contract ranking scores, check results, and final resolution — reviewable at any time |
| Bias in signal extraction | Signal extraction contract defines an explicit closed vocabulary of 14 fields; the model fills in values or returns null, it cannot introduce new signal categories |
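The closed-vocabulary idea in the last row can be sketched as a normalization step applied to the model's raw JSON. Only four field names are used here; they are a hypothetical subset standing in for the 14 fields defined by `extract_support_signals_v1`, whose full list is not shown in this README.

```python
# Hypothetical subset of the 14 signal fields; the real list lives in the contract.
SIGNAL_FIELDS = ("error_code", "form_state", "step", "button_labels")


def normalize_signals(raw):
    """Keep only known signal fields; missing fields become None."""
    signals = {field: raw.get(field) for field in SIGNAL_FIELDS}
    dropped = sorted(k for k in raw if k not in SIGNAL_FIELDS)
    return signals, dropped


signals, dropped = normalize_signals(
    {"error_code": "INVALID_CREDENTIALS", "step": "login", "user_mood": "frustrated"}
)
print(signals)
print(dropped)
```

Returning the dropped keys separately lets them be logged for review without ever reaching the downstream agents.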
- Streaming resolution — surface Gemini's reasoning token-by-token via SSE for real-time chat UX
- Contract versioning UI — allow support managers to edit contracts in the browser and A/B test resolution quality
- Feedback loop — agents rate resolution quality post-ticket; low-rated outcomes trigger contract refinement suggestions via Gemini
- Multi-language signal extraction — extend prompt contracts to handle non-English UI screenshots (currently English only)
- Vertex AI RAG grounding — ground resolution contracts against a knowledge base of historical resolved tickets
mova-ticket-agent/
├── app/
│   ├── agents/          # ADK-style agents (signal, contract, business_check, resolution, root)
│   ├── services/        # Gemini, Firestore, Cloud Storage, business checks
│   ├── routers/         # FastAPI route handlers (html_pages, api)
│   ├── config.py        # Settings (Pydantic BaseModel)
│   └── main.py          # App factory
├── contracts/           # 8 JSON contracts (declarative action constraints)
├── templates/           # Jinja2 HTML templates
├── static/              # CSS + logo
├── demo_assets/
│   └── screenshots/     # Two pre-built demo PNG screenshots
├── Dockerfile
└── requirements.txt