A FastAPI-based copilot system that guides users through procedures step by step using computer vision analysis. The project is designed to be compatible with the Auki Real World Web ecosystem. This repository includes a Mentra Smart Glasses client application.
Oneshot Copilot guides users through multi-step procedures by analyzing video frames in real-time. It uses a VLM (Vision Language Model) service to verify each step's completion before allowing progression to the next step.
> [!IMPORTANT]
> Major Refactor Complete! Oneshot Copilot has been significantly refactored to be more modular. The system now supports multiple procedure sources and VLM providers. One new configuration uses an external Memory Service (private repository) for storing and retrieving procedure instructions. This is just one optional procedure configuration; you can still use local JSON procedures exactly as before.
- Step-by-step procedure guidance with configurable steps
- Real-time frame analysis using VLM integration
- Debounce logic requiring multiple consecutive YES responses
- Frame buffering to handle high-frequency updates
- Single in-flight request per user to prevent overload
- Timeout handling with automatic reset
- Client callbacks for progress updates
- Modular VLM providers — switch between local and cloud VLM services
- Flexible procedure sources — load procedures from local JSON or external Memory Service
Oneshot Copilot supports multiple ways to load procedure definitions:
Procedures are loaded from JSON files in app/data/procedures/. This is the standard approach and requires no external dependencies.
```
# Start a procedure by ID
POST /api/start_procedure?username=alice&procedure_id=pizza_custom@v1

# Or by file path
POST /api/start_procedure?username=alice&procedure_file=app/data/procedures/pizza_custom.json
```

For advanced use cases, procedures can be fetched from an external Memory Service. This is a private repository that provides:
- Centralized procedure storage and versioning
- Dynamic procedure updates without redeployment
- Integration with the Mentra ecosystem
To use Memory Service procedures, prefix the procedure ID with `starting:` in the MentraApp:

```html
<!-- In mainwebview.ejs -->
<div class="procedure-item" data-procedure="starting:detect_office_items">
```

This signals the backend to fetch the procedure from the Memory Service instead of local files.
> [!NOTE]
> The Memory Service integration is optional. All existing local JSON procedures continue to work without any changes. You can mix and match both approaches.
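As a rough illustration of how the `starting:` prefix might be routed to a procedure source, consider the sketch below. The function name and return shape are assumptions for illustration, not the project's actual API:

```python
# Illustrative sketch: route a raw procedure identifier to its source.
# The "starting:" prefix selects the Memory Service; anything else is
# treated as a local JSON procedure.

def resolve_procedure_source(procedure_id: str) -> tuple[str, str]:
    """Return (source, bare_id) for a raw procedure identifier."""
    prefix = "starting:"
    if procedure_id.startswith(prefix):
        # "starting:detect_office_items" -> fetch from the Memory Service
        return ("memory_service", procedure_id[len(prefix):])
    # Plain IDs such as "pizza_custom@v1" -> local JSON under app/data/procedures/
    return ("local_json", procedure_id)
```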
Oneshot Copilot supports multiple Vision Language Model (VLM) providers. Configure the provider in your .env file:
```
# VLM Provider Configuration
# Options: "local", "moondream", "auki_local"
VLM_PROVIDER=auki_local
```

| Provider | Description | Required Config |
|---|---|---|
| `local` | Generic local VLM service | `VLM_URL` |
| `auki_local` | Auki Labs VLM Node (recommended for development) | `VLM_URL` |
| `moondream` | Moondream AI cloud service | `MOONDREAM_API_KEY` |
See CLOUD_VLM.md for detailed Moondream configuration.
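A hedged sketch of how the `VLM_PROVIDER` setting could select a provider at startup. The class and function names here are illustrative assumptions; the project's real provider classes may differ:

```python
# Sketch of provider selection from VLM_PROVIDER; names are illustrative.

class LocalVLMProvider:
    """Generic local VLM service (also covers the Auki VLM Node)."""
    def __init__(self, url: str):
        self.url = url

class MoondreamProvider:
    """Moondream AI cloud service."""
    def __init__(self, api_key: str):
        self.api_key = api_key

def make_vlm_provider(env: dict):
    """Pick a provider based on the VLM_PROVIDER setting."""
    provider = env.get("VLM_PROVIDER", "auki_local")
    if provider in ("local", "auki_local"):
        return LocalVLMProvider(env["VLM_URL"])             # needs VLM_URL
    if provider == "moondream":
        return MoondreamProvider(env["MOONDREAM_API_KEY"])  # needs MOONDREAM_API_KEY
    raise ValueError(f"Unknown VLM_PROVIDER: {provider!r}")
```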
```
Client → /ingest (frames) → State Machine → VLM Service
                                  ↓
Client ← Progress Updates ← State Machine ← /vlm/callback
```
- Python 3.8+ with pip
- FastAPI and dependencies (see `requirements.txt`)
- VLM Service (Vision Language Model): choose one option:
  - Local Auki VLM Node (recommended for development):
    - Repository: Auki Labs VLM Node
    - Requires Docker with GPU support
    - Must run on port 8080
  - Cloud VLM: Moondream AI account with API key
- Mentra Account with:
  - API key
  - App domain (`app.com.domain`)
  - Setup tutorial: MentraOS Extended Example
For automatic frame ingestion from video streams:
- RTSP/RTMP Server (local or remote)
- For local testing, you can use MediaMTX:

  ```bash
  docker pull bluenviron/mediamtx
  docker run --rm -it -p 8554:8554 -p 1935:1935 bluenviron/mediamtx
  ```

- Configure `RTSP_STREAM_URL` in `.env` to enable
- Clone the repository:

  ```bash
  cd oneshot_copilot
  ```

- Create a Python virtual environment in the root folder and activate it:

  ```bash
  python -m venv venv
  venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure environment:

  ```bash
  cp .env.example .env
  # Edit .env with your actual URLs
  ```

This section provides instructions for building and running Oneshot Copilot using Docker. The Docker container includes both the FastAPI backend and the MentraApp frontend.
- Docker installed on your system
- Configured environment files:
  - `.env` in the root directory (copy from `.env.example`)
  - `MentraApp/.env` in the MentraApp directory (copy from `MentraApp/.env.example`)
- Optional GPU support: If using GPU-accelerated features (e.g., for VLM processing), ensure Docker has GPU access configured
1. Clone the repository and navigate to the project directory:

   ```bash
   git clone <repository-url>
   cd oneshot-copilot
   ```

2. Build the Docker image:

   ```bash
   docker build -t oneshot-copilot .
   ```

   This will create a Docker image that includes:

   - Python 3.11 environment with all dependencies
   - Bun runtime for the MentraApp
   - Both Oneshot Copilot and MentraApp services

3. Run the container with port mappings:

   ```bash
   docker run -p 8000:8000 -p 3000:3000 oneshot-copilot
   ```

   Port mappings:

   - `8000`: Oneshot Copilot FastAPI server
   - `3000`: MentraApp frontend server

4. Optional: mount environment files (if you want to modify them without rebuilding):

   ```bash
   docker run -p 8000:8000 -p 3000:3000 \
     -v $(pwd)/.env:/app/.env \
     -v $(pwd)/MentraApp/.env:/app/MentraApp/.env \
     oneshot-copilot
   ```

5. Optional: run in detached mode:

   ```bash
   docker run -d -p 8000:8000 -p 3000:3000 --name oneshot-container oneshot-copilot
   ```
Once the container is running:
- **Oneshot Copilot API**: available at `http://localhost:8000`
  - API documentation: `http://localhost:8000/docs`
  - Health check: `http://localhost:8000/health`
- **MentraApp Frontend**: available at `http://localhost:3000`
  - Main application interface
  - User authentication and procedure management
The container expects the following environment variables to be configured in the respective `.env` files:

Root `.env`:

- `SELF_URL`: Your server's public URL
- `MENTRA_URL`: Mentra webhook URL
- `VLM_URL`: VLM service URL
- `USE_CLOUD_VLM`: Whether to use cloud VLM (true/false)
- `MOONDREAM_API_KEY`: API key for Moondream AI (if using cloud VLM)
- Other configuration options as documented in the Configuration section

`MentraApp/.env`:

- `MENTRAOS_API_KEY`: Mentra OS API key
- `PORT`: Port for the Mentra app (default: 3000)
- `PACKAGE_NAME`: Application package identifier
- `RTMP_URL`: RTMP stream URL (optional)
- Container fails to start: ensure `.env` and `MentraApp/.env` files exist and are properly configured
- Port conflicts: if ports 8000 or 3000 are already in use, modify the port mappings (e.g., `-p 8001:8000`)
- Permission issues: ensure Docker has access to the project directory
- VLM connection issues: verify `VLM_URL` is accessible from within the container
- Logs: view container logs with `docker logs <container-name>` for debugging
- GPU support: if using GPU features, ensure the NVIDIA Docker runtime is configured (the `--gpus all` flag may be needed)
- The container runs both services simultaneously using a startup script
- For development, consider mounting source code volumes for live reloading
- Ensure your `.env` files contain all required configuration before building
- The container uses a Python virtual environment and Bun for optimal performance
Edit the `.env` file with your settings:

```
SELF_URL=https://your-server.ngrok-free.app
MENTRA_URL=https://client-webhook.example.com
VLM_URL=https://vlm-service.ngrok.app
MAX_FRAMES_PER_USER=10

# Optional: RTSP/RTMP Stream Auto-Ingestion
RTSP_STREAM_URL=rtsp://192.168.1.100:8554/live/stream
STREAM_USERNAME=stream_user
```

Oneshot Copilot can automatically ingest frames from an RTSP or RTMP stream. When configured, it will:
- Connect to the stream on application startup
- Filter frames by quality (blur and brightness)
- Automatically POST quality frames to the `/ingest` endpoint
- Send approximately 1 frame per second

To enable:

- Set `RTSP_STREAM_URL` to your camera/stream URL
- Set `STREAM_USERNAME` to identify frames from this stream
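The blur/brightness gate can be pictured with a toy filter like the one below. The thresholds and the Laplacian-variance blur metric are assumptions for illustration; the actual `stream_quality_filter.py` may use different metrics and values:

```python
# Toy quality gate: reject frames that are too dark or too blurry before
# POSTing them to /ingest. Frames are plain lists of pixel rows (0-255).
# Thresholds and metrics here are illustrative assumptions.

def mean_brightness(gray):
    """Mean pixel value of a grayscale frame (list of rows)."""
    return sum(sum(row) for row in gray) / (len(gray) * len(gray[0]))

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; low values suggest blur."""
    vals = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            vals.append(-4 * gray[y][x] + gray[y - 1][x] + gray[y + 1][x]
                        + gray[y][x - 1] + gray[y][x + 1])
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def frame_passes_quality(gray, min_brightness=40.0, min_sharpness=100.0):
    if mean_brightness(gray) < min_brightness:
        return False  # too dark
    return laplacian_variance(gray) >= min_sharpness  # low variance = blurry
```

In production such a filter would typically operate on decoded frames (e.g., OpenCV arrays) rather than nested lists, but the decision logic is the same.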
For local development, you can use the Auki Labs VLM Node which provides vision language model capabilities:
- Docker with GPU support (NVIDIA GPU recommended)
- NVIDIA Container Toolkit installed
- At least 8GB GPU VRAM
1. Clone the Auki VLM Node repository:

   ```bash
   git clone https://github.com/aukilabs/vlm-node.git
   cd vlm-node
   ```

2. Build and run with GPU support:

   ```bash
   make docker-gpu
   ```

3. Verify the service is running: the VLM Node starts on port 8080 by default. You can check it with:

   ```bash
   curl http://localhost:8080/health
   ```

4. Configure Oneshot Copilot to use the local VLM. In your `.env` file, set:

   ```
   VLM_URL=http://localhost:8080
   USE_CLOUD_VLM=false
   ```
Note: The Auki VLM Node must be running before starting the Oneshot Copilot server.
Oneshot Copilot supports both local and cloud-based VLM services. By default, it uses a local VLM service, but you can switch to Moondream AI cloud VLM for production deployments.
To use Moondream AI cloud VLM:
- Sign up at moondream.ai and get your API key
- Set `USE_CLOUD_VLM=true` in `.env`
- Set `MOONDREAM_API_KEY=your_key` in `.env`
- See CLOUD_VLM.md for detailed documentation
Configuration example:
```
USE_CLOUD_VLM=true
MOONDREAM_API_KEY=your_moondream_api_key_here
```

**Important note:** Moondream AI cloud does not support negative questions. When using cloud VLM, only positive questions from procedures are sent; negative questions are ignored (see CLOUD_VLM.md for details). Moondream AI cloud is also rate limited.
The Mentra app is a separate application that provides the frontend interface and handles user authentication for Oneshot Copilot. It's built using Express.js and the Mentra SDK.
```
MentraApp/
├── src/
│   ├── index.ts                  # Main application entry point
│   ├── tools.ts                  # Mentra SDK tools integration
│   ├── webview.ts                # Web view management
│   └── services/
│       ├── AudioFeedback.ts      # Audio feedback service
│       └── UserMetadataService.ts  # User metadata handling
├── views/
│   ├── login_view.ejs            # Login page template
│   ├── mainwebview.ejs           # Main interface template
│   └── taskview.ejs              # Task view template
├── public/
│   └── css/
│       └── style.css             # Application styles
├── .env                          # Environment configuration (create from .env.example)
├── .env.example                  # Example environment configuration
├── package.json                  # Node.js dependencies
└── tsconfig.json                 # TypeScript configuration
```
Create a `.env` file in the MentraApp directory based on `.env.example`:
```
# Mentra OS API Key (required)
# Get this from your Mentra developer account at https://mentra.com
MENTRAOS_API_KEY=your_mentra_api_key_here

# Port for the Mentra app (default: 3000)
PORT=3000

# Package name for your Mentra application
# This should match your app registration in Mentra OS
PACKAGE_NAME=com.yourcompany.oneshotcopilot

# RTMP stream URL (optional)
# URL to the RTMP server for streaming video frames
# Must match the RTSP_STREAM_URL configuration in the main app
RTMP_URL=rtmp://192.168.1.100:1935/live/oneshot
```

- `MENTRAOS_API_KEY` (Required): Your Mentra OS API key obtained from the Mentra developer portal
- `PORT` (Optional): The port on which the Mentra app will run (default: 3000)
- `PACKAGE_NAME` (Required): Your application's package identifier in Mentra OS (e.g., `com.yourcompany.oneshotcopilot`)
- `RTMP_URL` (Optional): The RTMP stream URL for video ingestion; should correspond to the RTSP stream configured in the main application
The Mentra app requires:
- Node.js 18 or higher (up to Node.js 22)
- Bun runtime (recommended) or npm
- Mentra SDK (`@mentra/sdk`)
- Express.js for the web server
- EJS for templating
If using the local Auki VLM Node (recommended for development):
1. Navigate to the VLM Node directory:

   ```bash
   cd vlm-node
   ```

2. Start the VLM service with GPU support:

   ```bash
   make docker-gpu
   ```

3. Verify it's running on port 8080:

   ```bash
   curl http://localhost:8080/health
   ```
Note: If using Moondream AI cloud VLM instead, skip this step and ensure USE_CLOUD_VLM=true in your .env file.
The Mentra app handles user authentication and provides the frontend interface. It must be started before the main server.
1. Setup Mentra following the MentraOS Extended Example tutorial

2. Configure environment: get API keys and configure `.env` in the `MentraApp` directory:

   ```bash
   cd MentraApp
   cp .env.example .env
   # Edit .env with your Mentra credentials
   ```

3. Install dependencies and start:

   ```bash
   bun update
   bun run dev
   ```

4. Expose with Ngrok: follow the tutorial to expose your local port with Ngrok
After Mentra is running, start the main FastAPI server:
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

The server will start at http://localhost:8000.

Make sure to expose the server with Ngrok.

Note: For development with auto-reload, you can use:

```bash
uvicorn app.main:app --reload
```

To run a local MediaMTX server for stream testing:

```bash
docker pull bluenviron/mediamtx
docker run --rm -p 1935:1935 -p 8554:8554 -p 8888:8888 bluenviron/mediamtx
```
Link to the Repo: https://github.com/bluenviron/mediamtx
- **POST /start_procedure** - Start a new procedure for a user
  - Parameters: `username`, `procedure_file` (optional)
  - Returns: `{ok: true, procedure_id: "..."}`
- **GET /status** - Get user's current status
  - Parameters: `username`
  - Returns: State object with current step info
- **POST /pause** - Pause active procedure
  - Parameters: `username`
  - Returns: `{ok: true}`
- **POST /resume** - Resume paused procedure
  - Parameters: `username`
  - Returns: `{ok: true}`
- **POST /abort** - Abort active procedure
  - Parameters: `username`
  - Returns: `{ok: true}`
- **POST /trigger_stream_reconnect** - Trigger immediate RTSP stream reconnection
  - No parameters required
  - Returns: `{ok: true, message: "Stream reconnection triggered"}`
- **POST /ingest** - Upload a frame for analysis
  - Form data: `username`, `frame_id`, `file` (image)
  - Returns: `{ok: true, queued: true}`
- **POST /vlm/callback** - Receive VLM analysis results (called by the VLM service)
  - Query params: `user`, `procedure_id`, `step_id`, `frame_id`, `idem`
  - Body: `{decision: "YES"|"NO"|"UNCERTAIN"|"NOT_APPLICABLE"}`
  - Returns: `{ok: true}`
Disclaimer: multi-user operation has not been tested yet; so far, usernames have been hardcoded.
```bash
curl -X POST "http://localhost:8000/start_procedure?username=john&procedure_file=app/data/procedures/pizza_custom.json"
```

```bash
curl -X POST "http://localhost:8000/ingest" \
  -F "username=john" \
  -F "frame_id=frame_001" \
  -F "file=@image.jpg"
```

```bash
curl "http://localhost:8000/status?username=john"
```

Procedures are defined in JSON files in `app/data/procedures/`:
```json
{
  "id": "pizza_custom@v1",
  "name": "Make a Custom Pizza",
  "version": 1,
  "steps": [
    {
      "id": 1,
      "name": "Show pizza dough",
      "positives": ["A plain pizza base or dough is clearly visible..."],
      "negatives": ["No pizza base or dough is visible..."],
      "timeout_s": 20,
      "debounce": {
        "consecutive_yes": 2
      }
    }
  ]
}
```

- Frame buffering: Up to 10 frames per user (configurable)
- Single in-flight: Only one VLM request active per user at a time
- YES-only progression: Only YES decisions increment the counter
- Debounce: Requires 2 consecutive YES responses by default
- Timeout reset: Timeout resets the YES counter but keeps the same step
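The progression rules above can be condensed into a small sketch. This is illustrative only; the real logic lives in `app/core/statemachine.py` and also handles buffering, pause/resume, and callbacks:

```python
# Minimal sketch of the YES-debounce progression described above.
# Class and method names are illustrative, not the project's actual API.

class StepTracker:
    def __init__(self, total_steps: int, consecutive_yes: int = 2):
        self.total_steps = total_steps
        self.required = consecutive_yes
        self.step = 1          # current step (1-based)
        self.yes_count = 0     # consecutive YES responses so far

    def on_decision(self, decision: str) -> bool:
        """Feed one VLM decision; return True when the procedure finishes."""
        if decision == "YES":
            self.yes_count += 1
            if self.yes_count >= self.required:
                self.yes_count = 0
                if self.step >= self.total_steps:
                    return True        # final step verified
                self.step += 1         # advance to the next step
        else:
            # NO / UNCERTAIN / NOT_APPLICABLE break the consecutive run
            self.yes_count = 0
        return False

    def on_timeout(self) -> None:
        # Timeout resets the YES counter but keeps the same step
        self.yes_count = 0
```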
```
oneshot_copilot/
├── app/
│   ├── main.py                      # FastAPI application
│   ├── config.py                    # Configuration settings
│   ├── models/
│   │   ├── state.py                 # UserState, Decision enums
│   │   └── procedure.py             # ProcedureDef, StepDef
│   ├── core/
│   │   ├── statemachine.py          # State machine logic
│   │   ├── vlm_client.py            # VLM HTTP client
│   │   ├── callbacks.py             # Event callbacks
│   │   └── frame_store.py           # Frame storage
│   ├── services/
│   │   ├── stream_quality_filter.py # RTSP stream ingestion
│   │   └── status_service.py        # Status tracking
│   └── api/
│       ├── procedure.py             # Procedure endpoints
│       ├── ingest.py                # Frame ingestion
│       └── vlm_callback.py          # VLM callback handler
└── app/data/
    └── procedures/
        └── pizza_custom.json        # Example procedure
```
- Mika Haak "@augmentedcamel"
MIT