mova-compact/MOVA-ticket-agent

MOVA Ticket Agent

Contract-Driven Multimodal UI Support Agent

Gemini Live Agent Challenge · Category: UI Navigator

Live Demo → | Process Ticket | Morning Report | Contract Catalog


The Problem

Support teams handling UI-related tickets face two compounding issues:

  • Ambiguous screenshots — agents describe what they think they see, not what the UI actually shows. A "login doesn't work" ticket can mean 5 different root causes requiring 5 entirely different resolutions.
  • Inconsistent responses — without a constrained action space, agents improvise answers. The same issue gets different instructions depending on who handles it, leading to repeat contacts and escalations.

MOVA Ticket Agent solves both: Gemini's multimodal vision reads the screenshot directly, and a contract system constrains which resolution paths are even possible — making every response auditable and repeatable.


What It Does

  • Reads screenshots, not descriptions — Gemini 2.5 Flash extracts up to 14 structured UI signals (error codes, form state, step indicators, button labels) directly from the uploaded image, bypassing agent interpretation bias
  • Contracts constrain the answer space — each resolution path is defined in a JSON contract specifying allowed actions, required inputs, and permitted outputs; the model cannot hallucinate outside these bounds
  • Channel-aware resolution — chat tickets receive a single actionable next step; email tickets receive a ranked solution list — same signal set, different contract, different output shape
  • Full audit trail per episode — every ticket becomes an episode: raw signals → candidate contracts → business checks → resolution → saved to Firestore; reviewable anytime
  • Overnight morning report — aggregates all processed episodes into a structured daily digest with channel breakdown, contract usage stats, and escalation flags
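
As a rough illustration of the morning report bullet above, the aggregation can be thought of as a few counters over stored episodes. This is a minimal sketch, not the project's actual code: the `Episode` fields and the `build_morning_report` function are hypothetical names, and the real Firestore episode schema may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical minimal episode record; the real Firestore schema may differ."""
    ticket_id: str
    channel: str          # "chat" or "email"
    contract_id: str      # contract that produced the final output
    escalated: bool = False


def build_morning_report(episodes: list[Episode]) -> dict:
    """Aggregate episodes into channel, contract, and escalation summaries."""
    return {
        "total_episodes": len(episodes),
        "by_channel": dict(Counter(e.channel for e in episodes)),
        "by_contract": dict(Counter(e.contract_id for e in episodes)),
        "escalated_ticket_ids": [e.ticket_id for e in episodes if e.escalated],
    }
```

Keeping the report a pure function of the episode list means it could be recomputed at any time from the audit trail.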

Live Demo — Step-by-Step for Judges

The deployed app is fully functional with real Gemini API calls, Firestore storage, and Cloud Storage.

Quick path (2 minutes)

  1. Open https://contract-support-agent-799834288723.europe-west3.run.app

  2. Submit a chat ticket (simulates a user who can't log in):

    • Description: I can't log into my account, it says my credentials are invalid
    • Channel: chat
    • Upload: demo_assets/screenshots/login_screen_error.png (red error banner: "Invalid credentials or account locked", 5/5 failed attempts)
    • Click Process Ticket
  3. See the pipeline trace: signals extracted from screenshot → contracts ranked → business checks → single next step instruction for chat

  4. Submit an email ticket (simulates a user stuck waiting for confirmation):

    • Description: I registered but never received the confirmation email
    • Channel: email
    • Upload: demo_assets/screenshots/registration_waiting_confirmation.png (confirmation pending screen, step 2/3)
    • Click Process Ticket
  5. See the difference: same pipeline, different contract → ranked solution list instead of single step

  6. Open Morning Report → to see both episodes aggregated: channel breakdown, contract usage, escalation status

  7. Open Contract Catalog → to browse all 8 contracts; click any contract to see its full specification (allowed actions, required inputs, output shape)


Architecture

┌──────────────────────────────────────────────────────┐
│  User Input: ticket text + screenshot + channel       │
└──────────────────┬───────────────────────────────────┘
                   │
          ┌────────▼────────┐
          │  Signal Agent   │  ← Gemini 2.5 Flash (multimodal)
          │                 │    contracts: extract_support_signals_v1
          │  Extracts ≤14   │              identify_support_step_v1
          │  structured     │
          │  UI signals     │
          └────────┬────────┘
                   │ signal_set: {error_code, form_state, step, ...}
          ┌────────▼────────┐
          │ Contract Agent  │  ← Gemini (text)
          │                 │    contract: rank_candidate_resolution_contracts_v1
          │ Ranks which     │
          │ resolution      │
          │ contract fits   │
          └────────┬────────┘
                   │ selected_contract
          ┌────────▼──────────────┐
          │ Business Check Agent  │  ← Read-only Firestore queries
          │                       │    contracts: check_account_state_v1
          │ Validates account /   │              check_registration_state_v1
          │ registration state    │
          └────────┬──────────────┘
                   │ checks: {account_locked, email_sent, ...}
          ┌────────▼──────────────────────────────────┐
          │         Resolution Agent                   │  ← Gemini (text)
          │                                            │
          │  chat   → chat_guided_resolution_v1        │
          │           "single next step instruction"   │
          │                                            │
          │  email  → email_ranked_resolution_v1       │
          │           "ranked solution list (1–3)"     │
          │                                            │
          │  unknown→ escalate_unknown_case_v1         │
          └────────┬──────────────────────────────────┘
                   │
          ┌────────▼────────┐
          │    Episode      │  → Firestore (full audit trail)
          │    Storage      │  → Morning Report aggregation
          └─────────────────┘

Key design principle: The contract constrains the allowed action space. Gemini extracts signals and selects contracts — but cannot produce outputs outside the contract's allowed_actions and success_outputs. This makes every resolution auditable and reproducible.
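
One way to picture this principle is a routing step followed by a guard that rejects anything outside the contract. The sketch below is illustrative only: the contract IDs and the `allowed_actions` / `success_outputs` keys come from this README, while `select_resolution_contract`, `enforce_contract`, `ContractViolation`, and the `confident_match` flag are hypothetical names, and the real agents may enforce the contract differently (for example, through the prompt).

```python
RESOLUTION_CONTRACT_BY_CHANNEL = {
    "chat": "chat_guided_resolution_v1",
    "email": "email_ranked_resolution_v1",
}
FALLBACK_CONTRACT = "escalate_unknown_case_v1"


class ContractViolation(ValueError):
    """Raised when a proposed action or output falls outside the contract."""


def select_resolution_contract(channel: str, confident_match: bool) -> str:
    """Pick the resolution contract for a channel, escalating when unsure."""
    if not confident_match:
        return FALLBACK_CONTRACT
    return RESOLUTION_CONTRACT_BY_CHANNEL.get(channel, FALLBACK_CONTRACT)


def enforce_contract(contract: dict, action: str, output: dict) -> dict:
    """Return the output unchanged only if it stays within the contract."""
    if action not in contract["allowed_actions"]:
        raise ContractViolation(f"action {action!r} is not allowed")
    unexpected = set(output) - set(contract["success_outputs"])
    if unexpected:
        raise ContractViolation(f"unexpected output keys: {sorted(unexpected)}")
    return output
```

A guard like this runs after the model call, so a response that drifts outside the contract fails loudly instead of reaching the user.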


Tech Stack

| Layer | Technology | Why |
| --- | --- | --- |
| LLM / Vision | Gemini 2.5 Flash | Multimodal: reads screenshot pixels directly; fast and cost-efficient for batch processing |
| Agent Orchestration | Google ADK (custom) | ADK-style agent classes with explicit contract references; each agent has a single responsibility |
| Gen AI SDK | google-genai | Unified client for both Vertex AI and API key auth; structured JSON output mode |
| Backend | Python 3.12 + FastAPI | Async-first, clean route separation between HTML views and JSON API |
| Frontend | Jinja2 + Vanilla CSS | Zero JS dependencies; pipeline trace renders server-side for reliability |
| Persistence | Firestore | Schemaless episode storage; in-memory fallback for local dev without credentials |
| File Storage | Cloud Storage | Screenshot upload with signed URL pattern; MIME auto-detection |
| Deployment | Cloud Run | Serverless, scales to zero; deployed via gcloud run deploy --source . in one command |
| Contracts | JSON (8 files) | Declarative, version-controlled, diff-able; no code changes needed to constrain agent behavior |

Contract System

All 8 contracts live in contracts/. Each contract is a JSON file defining:

{
  "contract_id": "chat_guided_resolution_v1",
  "kind": "resolution",
  "purpose": "Provide the next single troubleshooting step for chat interaction",
  "applicable_channels": ["chat"],
  "required_inputs": ["selected_contract", "signal_set"],
  "allowed_actions": ["generate_next_step_instruction"],
  "success_outputs": ["chat_next_step_instruction"]
}

| Contract | Kind | Role |
| --- | --- | --- |
| extract_support_signals_v1 | signal_extraction | Defines the 14 signals Gemini must extract from a screenshot |
| identify_support_step_v1 | step_identification | Maps visual cues to a support step label |
| rank_candidate_resolution_contracts_v1 | diagnostic | Ranks which resolution contract best fits the signals |
| check_account_state_v1 | business_check | Read-only account lock / suspension check |
| check_registration_state_v1 | business_check | Read-only registration / email confirmation check |
| chat_guided_resolution_v1 | resolution | Chat channel: produce one next step |
| email_ranked_resolution_v1 | resolution | Email channel: produce ranked solution list |
| escalate_unknown_case_v1 | escalation | Fallback when no contract matches with sufficient confidence |
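
Because contracts are plain JSON files, a catalog can be built by reading the directory and checking each file for the expected keys. The following is a sketch under assumptions: `load_contracts`, `contracts_for_channel`, and `REQUIRED_KEYS` are hypothetical names, and the key set is inferred from the single example contract shown above, so other contract kinds may carry different fields.

```python
import json
from pathlib import Path

# Assumed from the chat_guided_resolution_v1 example; other kinds may differ.
REQUIRED_KEYS = {
    "contract_id",
    "kind",
    "purpose",
    "required_inputs",
    "allowed_actions",
    "success_outputs",
}


def load_contracts(directory: Path) -> dict[str, dict]:
    """Load every *.json contract in a directory, keyed by contract_id."""
    contracts: dict[str, dict] = {}
    for path in sorted(directory.glob("*.json")):
        contract = json.loads(path.read_text(encoding="utf-8"))
        missing = REQUIRED_KEYS - contract.keys()
        if missing:
            raise ValueError(f"{path.name} is missing keys: {sorted(missing)}")
        contracts[contract["contract_id"]] = contract
    return contracts


def contracts_for_channel(contracts: dict[str, dict], channel: str) -> list[dict]:
    """Return contracts whose applicable_channels include the given channel."""
    return [
        c for c in contracts.values()
        if channel in c.get("applicable_channels", [])
    ]
```

Usage would look like `catalog = load_contracts(Path("contracts"))` followed by `contracts_for_channel(catalog, "chat")`.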

Getting Started

Prerequisites

  • Python 3.12+
  • pip
  • A Google AI Studio API key → aistudio.google.com (free tier works)
  • (Optional) Google Cloud project with Firestore + Cloud Storage for full persistence

Run locally in 4 steps

# 1. Clone and enter
git clone https://github.com/mova-compact/MOVA-ticket-agent
cd MOVA-ticket-agent

# 2. Create virtual environment and install
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# 3. Configure environment
cp .env.example .env
# Open .env and set GEMINI_API_KEY=your_key_here
# Everything else has sensible defaults (in-memory Firestore fallback)

# 4. Start
uvicorn app.main:app --reload --host 0.0.0.0 --port 8080

Open http://localhost:8080

Environment variables

| Variable | Required | Description |
| --- | --- | --- |
| GEMINI_API_KEY | Yes | Google AI Studio API key |
| GOOGLE_GENAI_USE_VERTEXAI | No | Set true to use Vertex AI instead of API key |
| GOOGLE_CLOUD_PROJECT | No | GCP project ID (for Firestore + Storage) |
| STORAGE_BUCKET | No | Cloud Storage bucket name for screenshots |
| MODEL_NAME | No | Gemini model (default: gemini-2.5-flash) |

Without GOOGLE_CLOUD_PROJECT, the app runs with an in-memory Firestore fallback — fully functional for demo purposes.
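
The variables above could be resolved into a settings object roughly as follows. This is a standard-library sketch rather than the project's `config.py` (which uses Pydantic): the `Settings` dataclass and `load_settings` function are hypothetical, and the rule that `GEMINI_API_KEY` may be omitted when Vertex AI is enabled is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: Optional[str]
    use_vertex_ai: bool
    google_cloud_project: Optional[str]
    storage_bucket: Optional[str]
    model_name: str

    @property
    def use_in_memory_store(self) -> bool:
        """Without a GCP project, fall back to in-memory episode storage."""
        return self.google_cloud_project is None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, applying documented defaults."""
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "false").strip().lower() == "true"
    api_key = env.get("GEMINI_API_KEY") or None
    # Assumption: the API key is only mandatory when Vertex AI is not used.
    if not use_vertex and not api_key:
        raise RuntimeError("GEMINI_API_KEY is required unless Vertex AI is enabled")
    return Settings(
        gemini_api_key=api_key,
        use_vertex_ai=use_vertex,
        google_cloud_project=env.get("GOOGLE_CLOUD_PROJECT") or None,
        storage_bucket=env.get("STORAGE_BUCKET") or None,
        model_name=env.get("MODEL_NAME") or "gemini-2.5-flash",
    )
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.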


UI Pages Reference

| Page | URL | What judges see |
| --- | --- | --- |
| Process Ticket | / | Ticket submission form + live activity stats |
| Morning Report | /demo/report | Overnight digest: channel breakdown, contract usage, escalations |
| Ticket Detail | /demo/ticket/{id} | Full pipeline trace: screenshot → signals → contracts → resolution |
| Contract Catalog | /demo/contracts | All 8 contracts with kind badges and purpose descriptions |
| Contract Viewer | /demo/contracts/{id} | Single contract: full JSON spec + allowed actions |
| Episode Viewer | /demo/episode/{id} | Raw episode audit trail stored in Firestore |

API Reference

POST /api/tickets/process       Process ticket (multipart/form-data: description, channel, screenshot)
GET  /api/contracts             Contract catalog (JSON)
GET  /api/contracts/{id}        Single contract (JSON)
GET  /api/reports/morning       Morning report (JSON)
GET  /api/episodes              Episodes list (JSON)
GET  /api/episodes/{id}         Single episode (JSON)
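
To exercise the ticket endpoint from a script, a client needs to send the three form fields listed above as multipart/form-data. The sketch below uses only the standard library and assumes a local server on port 8080; `build_ticket_request` and `process_ticket` are hypothetical helper names, and the shape of the JSON response is not documented here, so it is returned as-is.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:8080"  # assumed local dev address


def build_ticket_request(description: str, channel: str, screenshot: Path,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST for /api/tickets/process."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(screenshot.name)[0] or "application/octet-stream"
    parts = []
    # Plain text fields: description and channel.
    for name, value in (("description", description), ("channel", channel)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    # File field: the screenshot bytes with a filename and content type.
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="screenshot"; '
         f'filename="{screenshot.name}"\r\nContent-Type: {mime}\r\n\r\n').encode("utf-8")
        + screenshot.read_bytes() + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        f"{base_url}/api/tickets/process",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def process_ticket(description: str, channel: str, screenshot: Path,
                   base_url: str = BASE_URL) -> dict:
    """Send the ticket and return the decoded JSON response."""
    request = build_ticket_request(description, channel, screenshot, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `process_ticket("I can't log in", "chat", Path("demo_assets/screenshots/login_screen_error.png"))` would submit the first demo ticket against a running local instance.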

Deploy to Cloud Run

gcloud run deploy contract-support-agent \
  --source . \
  --project YOUR_PROJECT_ID \
  --region europe-west3 \
  --allow-unauthenticated \
  --set-env-vars="GOOGLE_GENAI_USE_VERTEXAI=false,GEMINI_API_KEY=YOUR_KEY,MODEL_NAME=gemini-2.5-flash"

Responsible AI

| Risk | How we addressed it |
| --- | --- |
| Hallucination | Resolution agents cannot produce outputs outside the contract's allowed_actions; the contract is passed verbatim into the prompt as a hard constraint |
| Scope creep | Business check agents are read-only by design; the contract explicitly lists prohibited_actions: ["write", "update", "delete"] |
| Auditability | Every ticket produces a Firestore episode with full intermediate state: raw signals, contract ranking scores, check results, and final resolution, reviewable at any time |
| Bias in signal extraction | Signal extraction contract defines an explicit closed vocabulary of 14 fields; the model fills in values or returns null, it cannot introduce new signal categories |
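
The closed-vocabulary idea can be sketched as a normalization step that keeps only known signal names and fills the rest with null. This is illustrative: `normalize_signals` is a hypothetical function, and the vocabulary below lists only the three signal names that appear in the architecture diagram, not the full set of 14 defined by `extract_support_signals_v1`.

```python
# Illustrative subset; the real contract defines 14 fields.
SIGNAL_VOCABULARY = ("error_code", "form_state", "step")


def normalize_signals(raw: dict, vocabulary: tuple[str, ...] = SIGNAL_VOCABULARY) -> dict:
    """Keep only known signals; unknown keys are dropped, missing ones become None."""
    return {name: raw.get(name) for name in vocabulary}
```

Applied to model output such as `{"error_code": "AUTH_401", "mood": "angry"}`, the unknown `mood` key is discarded and the missing fields come back as `None`.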

What's Next

  • Streaming resolution — surface Gemini's reasoning token-by-token via SSE for real-time chat UX
  • Contract versioning UI — allow support managers to edit contracts in the browser and A/B test resolution quality
  • Feedback loop — agents rate resolution quality post-ticket; low-rated outcomes trigger contract refinement suggestions via Gemini
  • Multi-language signal extraction — extend prompt contracts to handle non-English UI screenshots (currently English only)
  • Vertex AI RAG grounding — ground resolution contracts against a knowledge base of historical resolved tickets

Project Structure

mova-ticket-agent/
├── app/
│   ├── agents/          # ADK-style agents (signal, contract, business_check, resolution, root)
│   ├── services/        # Gemini, Firestore, Cloud Storage, business checks
│   ├── routers/         # FastAPI route handlers (html_pages, api)
│   ├── config.py        # Settings (Pydantic BaseModel)
│   └── main.py          # App factory
├── contracts/           # 8 JSON contracts (declarative action constraints)
├── templates/           # Jinja2 HTML templates
├── static/              # CSS + logo
├── demo_assets/
│   └── screenshots/     # Two pre-built demo PNG screenshots
├── Dockerfile
└── requirements.txt
