# EdgeTune

A self-tuning, local-first AI video analytics runtime that adapts to your hardware in real time.
EdgeTune is a privacy-focused video analytics system that runs entirely on your local machine: no cloud, no API keys required. It pairs YOLOv8 object detection with a finite-state autopilot that continuously monitors your GPU/CPU and dynamically adjusts inference parameters (precision, resolution, frame skipping, model variant) to squeeze the best possible performance from your hardware.
An optional LLM analyst (via local Ollama or Google Gemini) explains every autopilot decision in plain language, so you always understand why the system made a specific optimisation.
## Features

| Feature | Description |
|---|---|
| Fully Local | Video never leaves your machine. All inference and analysis run on local hardware. |
| Self-Tuning Autopilot | A 4-state FSM (Stable → Soft → Balanced → Aggressive) monitors GPU utilisation, FPS drops, and VRAM pressure, then auto-tunes inference parameters with hysteresis and cooldown to prevent oscillation. |
| Dual-Brain Architecture | Fast Brain: YOLOv8 for real-time detection. Slow Brain: a local LLM for semantic explanations of system decisions. |
| Real-Time Dashboard | Live GPU, VRAM, FPS, and latency charts streamed over WebSockets. |
| Hot-Reconfigurable | Switch models (YOLOv8n/s/m), change autopilot mode (Speed / Balanced / Accuracy), or upload custom `.pt` models, all without restarting. |
| 3 Models Included | YOLOv8n (nano), YOLOv8s (small), and YOLOv8m (medium) are bundled out of the box; no separate downloads needed. |
| Flexible Input | Webcam feed or uploaded video files with full playback controls (pause, seek, speed). |
| Export & Reporting | Download a CSV report of hardware telemetry, autopilot decisions, and LLM explanations. |
| Hardware-Aware | Auto-detects NVIDIA GPUs via pynvml, reads VRAM and compute capability, and classifies hardware into performance tiers (Low / Mid / High / CPU-only). Falls back gracefully to CPU. |
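To make the hardware-aware behaviour concrete, here is a minimal sketch of GPU detection and tier classification with pynvml. The tier cut-offs and the returned fields are illustrative assumptions, not the actual logic in `hardware_profiler.py`:

```python
# Sketch of GPU detection and tier classification via pynvml.
# Tier thresholds below are hypothetical, not EdgeTune's real cut-offs.
import pynvml

def profile_hardware() -> dict:
    """Classify the machine into a rough performance tier."""
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return {"device": "cpu", "tier": "CPU-only"}   # graceful CPU fallback
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    vram_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).total / 1024**3
    major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
    tier = "High" if vram_gb >= 12 else "Mid" if vram_gb >= 6 else "Low"
    return {
        "device": "cuda:0",
        "name": str(pynvml.nvmlDeviceGetName(handle)),
        "vram_gb": round(vram_gb, 1),
        "compute_capability": f"{major}.{minor}",
        "tier": tier,
    }
```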
## Architecture

```
┌──────────────────────────────────────────────────────────────────────────┐
│ Frontend (Next.js 16 · React 19 · TypeScript · Tailwind CSS)             │
│                                                                          │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐ ┌──────────────┐    │
│ │Video Feed│ │GPU Chart │ │FPS Graph │ │VRAM Chart │ │Autopilot Log │    │
│ └──────────┘ └──────────┘ └──────────┘ └───────────┘ └──────────────┘    │
│ ┌───────────────┐ ┌──────────────┐ ┌────────────┐ ┌────────────────┐     │
│ │Source Selector│ │Model Selector│ │LLM Feed    │ │Analysis Export │     │
│ └───────────────┘ └──────────────┘ └────────────┘ └────────────────┘     │
│                                                                          │
│                        useWebSocket (custom hook)                        │
│                                ▲ WebSocket                               │
└────────────────────────────────┼─────────────────────────────────────────┘
                                 │
┌────────────────────────────────┼─────────────────────────────────────────┐
│ Backend (Python · FastAPI · Uvicorn)                                     │
│                                │                                         │
│ ┌────────────────┐  ┌──────────┴────────┐  ┌───────────────────┐         │
│ │ REST API       │  │ WebSocket Handler │  │ Pipeline          │         │
│ │ /api/health    │  │ /ws               │  │ Orchestrator      │         │
│ │ /api/hardware  │  │ · telemetry       │  │                   │         │
│ │ /api/inference │  │ · decisions       │  │ VideoSource       │         │
│ │ /api/source    │  │ · llm_explanation │  │   │               │         │
│ │ /api/models    │  │ · video_frame     │  │ InferenceEngine   │         │
│ └────────────────┘  │ · source_progress │  │   │               │         │
│                     └───────────────────┘  │ TelemetryMonitor  │         │
│                                            │   │               │         │
│ ┌────────────────────┐                     │ Autopilot FSM     │         │
│ │ Hardware Profiler  │                     │   │               │         │
│ │ GPU/CPU detection  │                     │ LLM Analyst       │         │
│ │ VRAM / Tier / FP16 │                     │ (Ollama/Gemini)   │         │
│ └────────────────────┘                     └───────────────────┘         │
└──────────────────────────────────────────────────────────────────────────┘
```
The backend pipeline wires these components together:

- **VideoSource** captures frames from a webcam or an uploaded file.
- **InferenceEngine** runs YOLOv8 detection with the current parameter set.
- **TelemetryMonitor** samples GPU utilisation, VRAM, FPS, and latency at 500 ms intervals.
- **AutopilotController** evaluates each telemetry snapshot against mode-specific thresholds and transitions the FSM, applying parameter changes (precision, resolution, frame skip, model swap); see the sketch below.
- **LLMAnalyst** (optional) explains each state transition in 1–3 sentences via Ollama or Gemini.
- **WebSocket Handler** broadcasts annotated video frames, telemetry, decisions, and explanations to the dashboard.
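To make the control loop concrete, here is a hypothetical sketch of the escalate/de-escalate step with cooldown and hysteresis. The state names follow the FSM above, but the thresholds and parameter sets are illustrative, not EdgeTune's internals:

```python
# Hypothetical sketch of the autopilot's transition step; the real
# AutopilotController's thresholds and actions differ in detail.
import time

STATES = ["stable", "soft", "balanced", "aggressive"]
ACTIONS = {  # parameter set applied on entering each state (illustrative)
    "stable":     {"precision": "fp32", "imgsz": 640, "frame_skip": 0},
    "soft":       {"precision": "fp16", "imgsz": 640, "frame_skip": 0},
    "balanced":   {"precision": "fp16", "imgsz": 480, "frame_skip": 1},
    "aggressive": {"precision": "fp16", "imgsz": 320, "frame_skip": 2, "model": "yolov8n"},
}

class AutopilotSketch:
    def __init__(self, escalate_gpu=90.0, deescalate_gpu=70.0, cooldown_s=5.0):
        self.state_idx = 0
        self.escalate_gpu = escalate_gpu      # pressure threshold
        self.deescalate_gpu = deescalate_gpu  # lower release threshold
        self.cooldown_s = cooldown_s
        self.last_change = 0.0

    def step(self, gpu_util: float):
        """Return the new parameter set if a transition fires, else None."""
        if time.monotonic() - self.last_change < self.cooldown_s:
            return None  # cooldown: let the last change settle first
        if gpu_util >= self.escalate_gpu and self.state_idx < len(STATES) - 1:
            self.state_idx += 1   # escalate: trade quality for headroom
        elif gpu_util <= self.deescalate_gpu and self.state_idx > 0:
            self.state_idx -= 1   # recover: restore quality
        else:
            return None  # inside the hysteresis band: hold state
        self.last_change = time.monotonic()
        return ACTIONS[STATES[self.state_idx]]
```

Keeping the de-escalation threshold below the escalation threshold creates the hysteresis band that, together with the cooldown, prevents the oscillation mentioned in the feature table.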
## Project Structure

```
EdgeTune/
├── backend/
│   ├── main.py                       # FastAPI entrypoint & pipeline orchestrator
│   ├── requirements.txt
│   ├── config/
│   │   └── settings.yaml             # All runtime configuration
│   ├── core/
│   │   ├── inference_engine.py       # YOLO wrapper with hot-reconfiguration
│   │   ├── autopilot_controller.py   # 4-state FSM optimisation engine
│   │   ├── telemetry_monitor.py      # GPU/CPU/FPS sampling
│   │   ├── hardware_profiler.py      # GPU detection & tier classification
│   │   └── video_source.py           # Camera & file input with playback
│   ├── llm/
│   │   ├── analyst.py                # LLM decision explainer (Ollama/Gemini)
│   │   └── discovery.py              # Auto-detect available Ollama models
│   └── api/
│       ├── routes.py                 # REST endpoints
│       └── websocket.py              # WebSocket manager & broadcast helpers
│
├── frontend/
│   ├── package.json
│   └── src/
│       ├── app/
│       │   ├── layout.tsx
│       │   └── page.tsx              # Main dashboard page
│       ├── components/
│       │   ├── video-feed.tsx        # Live annotated video stream
│       │   ├── gpu-chart.tsx         # GPU utilisation chart
│       │   ├── vram-chart.tsx        # VRAM usage chart
│       │   ├── fps-graph.tsx         # FPS over time graph
│       │   ├── autopilot-timeline.tsx # Autopilot decision log
│       │   ├── llm-feed.tsx          # LLM explanation feed
│       │   ├── source-selector.tsx   # Webcam / file input picker
│       │   ├── model-selector.tsx    # YOLO model switcher + upload
│       │   ├── mode-selector.tsx     # Speed / Balanced / Accuracy toggle
│       │   ├── playback-controls.tsx # Video seek, pause, speed
│       │   ├── hardware-info.tsx     # GPU/CPU hardware card
│       │   ├── analysis-export.tsx   # CSV download
│       │   └── connection-status.tsx # WebSocket status indicator
│       ├── hooks/
│       │   └── useWebSocket.ts       # WebSocket client with auto-reconnect
│       └── lib/
│           ├── api.ts                # REST API client
│           └── types.ts              # Shared TypeScript interfaces
│
├── .gitignore
└── README.md
```
## Requirements

| Requirement | Notes |
|---|---|
| Python 3.10+ | Backend runtime |
| Node.js 18+ | Frontend build |
| NVIDIA GPU | Recommended; falls back to CPU automatically |
| Ollama | Optional; only needed for local LLM explanations |
## Quick Start

### Backend

```bash
cd backend

# Create and activate a virtual environment
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

# Install dependencies (includes PyTorch with CUDA via Ultralytics)
pip install -r requirements.txt

# Start the API server
python main.py
```

The API will be available at http://localhost:8000; interactive API docs are served at `/docs`.
### Frontend

```bash
cd frontend
npm install
npm run dev
```

Open http://localhost:3000 in your browser.
### Optional: LLM Explanations

If you want AI-powered explanations of autopilot decisions:

```bash
# Install Ollama from https://ollama.com
ollama pull phi3:mini   # lightweight, fast
# or: ollama pull llama3 / mistral
```

EdgeTune auto-discovers available Ollama models at startup. You can also configure Gemini in `settings.yaml` by setting a `GEMINI_API_KEY`.
## Usage

1. Open the dashboard at http://localhost:3000.
2. Pick a video source: webcam or an uploaded video file.
3. (Optional) Select a YOLO model or upload a custom `.pt` file.
4. Choose an autopilot mode: Speed, Balanced, or Accuracy.
5. Click **Start Inference**.
6. Watch the dashboard in real time:
   - **Video Feed**: annotated detections overlaid on the live stream.
   - **Performance Cards**: GPU %, FPS, VRAM, and latency at a glance.
   - **Charts**: GPU utilisation, VRAM, and FPS history graphs.
   - **Autopilot Timeline**: every FSM transition with its reason and applied parameters.
   - **LLM Insights**: plain-language explanations of optimisation decisions.
7. When finished, click **Export Analysis** to download a CSV report.
The main dashboard shows the live video stream with YOLOv8 detection overlays and per-object confidence scores, plus real-time performance metrics in a three-panel layout: autopilot decisions on the left, advisor explanations in the center, and analysis data on the right.

Real-time charts track GPU utilisation, VRAM usage, latency, and FPS, with interactive history graphs and detailed telemetry shown alongside autopilot state transitions and LLM-powered explanations of optimisation decisions.

In Accuracy mode, EdgeTune maximises detection coverage, identifying multiple object classes simultaneously across the full frame while maintaining high precision on complex street scenes.
## Configuration

All settings live in `backend/config/settings.yaml`. Key options:
| Section | Key | Default | Description |
|---|---|---|---|
| `source` | `type` | `camera` | `camera` or `file` |
| `source` | `processing_mode` | `paced` | `paced` (real-time) or `benchmark` (max speed) |
| `inference` | `model_variant` | `yolov8n` | `yolov8n`, `yolov8s`, or `yolov8m` |
| `inference` | `device` | `auto` | `auto`, `cuda:0`, or `cpu` |
| `inference` | `backend` | `pytorch` | `pytorch`, `onnx`, or `tensorrt` |
| `autopilot` | `mode` | `balanced` | `speed`, `balanced`, or `accuracy` |
| `autopilot` | `escalate_gpu_threshold` | `90` | GPU % that triggers escalation |
| `autopilot` | `cooldown_seconds` | `5.0` | Minimum seconds between state changes |
| `llm` | `provider` | `ollama` | `ollama` or `gemini` |
| `llm` | `enabled` | `true` | Toggle LLM explanations on/off |
| `server` | `port` | `8000` | Backend API port |
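For reference, a `settings.yaml` built only from the documented keys might look like this; the exact nesting is an assumption based on the Section/Key pairs above:

```yaml
# Example settings.yaml; nesting inferred from the Section/Key pairs above.
source:
  type: camera              # camera | file
  processing_mode: paced    # paced (real-time) | benchmark (max speed)

inference:
  model_variant: yolov8n    # yolov8n | yolov8s | yolov8m
  device: auto              # auto | cuda:0 | cpu
  backend: pytorch          # pytorch | onnx | tensorrt

autopilot:
  mode: balanced            # speed | balanced | accuracy
  escalate_gpu_threshold: 90
  cooldown_seconds: 5.0

llm:
  provider: ollama          # ollama | gemini
  enabled: true

server:
  port: 8000
```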
## REST API

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/health` | System health check (GPU, inference, LLM status) |
| `GET` | `/api/hardware` | Detected hardware profile |
| `GET` | `/api/telemetry` | Latest telemetry snapshot |
| `GET` | `/api/telemetry/history?n=60` | Rolling telemetry history |
| `GET` | `/api/autopilot` | Current autopilot state and mode |
| `PUT` | `/api/autopilot/mode` | Change autopilot mode |
| `POST` | `/api/inference/start` | Start the inference pipeline |
| `POST` | `/api/inference/stop` | Stop the inference pipeline |
| `POST` | `/api/source/upload` | Upload a video file |
| `GET` | `/api/source/files` | List available source files |
| `GET` | `/api/source/info` | Current source metadata |
| `POST` | `/api/source/playback` | Playback control (pause, seek, speed) |
| `GET` | `/api/models` | List available YOLO models |
| `POST` | `/api/models/upload` | Upload a custom `.pt` model |
| `POST` | `/api/models/switch` | Hot-swap the active model |
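As a quick smoke test, the control plane can be driven from Python. The request payload fields below are assumptions; the authoritative schemas live in the interactive docs at `/docs`:

```python
# Minimal control-plane smoke test (uses the third-party requests package).
import requests  # pip install requests

BASE = "http://localhost:8000"

print(requests.get(f"{BASE}/api/health").json())    # GPU / inference / LLM status
print(requests.get(f"{BASE}/api/hardware").json())  # detected hardware profile

# The field name "mode" is an assumption; check /docs for the real schema.
requests.put(f"{BASE}/api/autopilot/mode", json={"mode": "speed"})
requests.post(f"{BASE}/api/inference/start")

history = requests.get(f"{BASE}/api/telemetry/history", params={"n": 60}).json()
print(f"received {len(history)} telemetry samples")  # assumes a JSON list

requests.post(f"{BASE}/api/inference/stop")
```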
## WebSocket Messages

The single WebSocket connection streams these message types:
| Type | Payload | Direction |
|---|---|---|
| `telemetry` | GPU %, FPS, VRAM, latency, CPU % | Server → Client |
| `autopilot_decision` | State transition, action, reason, params | Server → Client |
| `llm_explanation` | Plain-text explanation of a decision | Server → Client |
| `video_frame` | Base64-encoded JPEG frame | Server → Client |
| `source_progress` | File progress, frame number, paused state | Server → Client |
| `status` | Toast notifications (errors, info) | Server → Client |
| `ping` / `pong` | Keep-alive heartbeat | Both |
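A minimal listener, assuming each message is a JSON object whose `type` field matches the table above (uses the third-party `websockets` package):

```python
# Subscribe to the dashboard stream and print non-frame messages.
import asyncio
import json

import websockets  # pip install websockets

async def listen() -> None:
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        async for raw in ws:
            msg = json.loads(raw)
            kind = msg.get("type")
            if kind == "video_frame":
                continue  # base64 JPEG payloads are large; skip printing
            if kind in ("telemetry", "autopilot_decision", "llm_explanation"):
                print(kind, msg)

asyncio.run(listen())
```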
## Tech Stack

| Layer | Technology |
|---|---|
| Backend | Python 3.10+, FastAPI, Uvicorn, Ultralytics YOLOv8, PyTorch, OpenCV |
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS 4 |
| Communication | WebSocket (real-time) + REST (control plane) |
| GPU Monitoring | pynvml, psutil |
| LLM Integration | Ollama (local) · Google Gemini (optional cloud) |
| Configuration | YAML (settings.yaml) |
## Contributing

Contributions are welcome! Please fork the repository and submit a pull request.
## License

MIT License; see `LICENSE` for details.


