Open
Labels
- area/distributed: Distributed coordination and TiKV
- priority/medium: Medium priority
- size/L: Large, 1-2 weeks
- type/feature: New feature or functionality
- type/infrastructure: Infrastructure, CI/CD, DevOps
Description
Summary
Add a simple web UI for monitoring job progress, viewing job details, and managing jobs (retry, cancel). Provides a dashboard for operators.
Parent Epic
- [Epic] Distributed Roboflow with Alibaba Cloud (OSS + ACK) #9 Distributed Roboflow with TiKV Coordination
Dependencies
- Depends on: [Phase 4.1] Add TiKV client and define distributed schema #40 (TiKV Client), [Phase 10.1] Add CLI for job submission [READY TO START] #50 (CLI for API patterns)
- Optional: [Phase 7.1] Add Prometheus metrics for monitoring #21 (Prometheus metrics for charts)
Design
Architecture
┌─────────────────────────────────────────────────────┐
│                    Web UI (SPA)                     │
│   ┌─────────┐ ┌─────────┐ ┌──────────┐ ┌────────┐   │
│   │Dashboard│ │Job List │ │Job Detail│ │Workers │   │
│   └─────────┘ └─────────┘ └──────────┘ └────────┘   │
└────────────────────────┬────────────────────────────┘
                         │ REST API
┌────────────────────────┴────────────────────────────┐
│                     API Server                      │
│  - Rust (axum/actix-web)                            │
│  - Embedded in roboflow binary                      │
│  - Connects to TiKV                                 │
└────────────────────────┬────────────────────────────┘
                         │
                    ┌────┴────┐
                    │  TiKV   │
                    └─────────┘
Technology Stack
- Backend: Rust with axum (lightweight, async)
- Frontend: Simple SPA (htmx + Alpine.js, or React)
- Embedded: Single binary, no separate deployment
- Alternative: Serve static files, API endpoints
Tasks
10.2.1 Create API Server Module
- Create src/api/mod.rs
- Create src/api/server.rs
- Add axum dependency (feature-gated)
- Define ApiServer struct:
  - TiKV client
  - Config (port, host)
- Start command: roboflow ui --port 8080
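As a starting point for the ApiServer struct above, here is a minimal std-only sketch of the config half. The names ApiConfig and bind_addr are hypothetical, the loopback default host is an assumption, and the TiKV client is left as a generic placeholder because its real type comes from #40.

```rust
use std::net::{AddrParseError, IpAddr, SocketAddr};

/// Hypothetical config for the API server (host and port from the task list).
#[derive(Debug, Clone, PartialEq)]
pub struct ApiConfig {
    pub host: String,
    pub port: u16,
}

impl Default for ApiConfig {
    fn default() -> Self {
        // Assumed defaults: loopback only, port 8080 as in `roboflow ui --port 8080`.
        Self { host: "127.0.0.1".to_string(), port: 8080 }
    }
}

impl ApiConfig {
    /// Combine host and port into a socket address.
    /// Fails if `host` is not an IP literal (no DNS lookup is attempted).
    pub fn bind_addr(&self) -> Result<SocketAddr, AddrParseError> {
        let ip: IpAddr = self.host.parse()?;
        Ok(SocketAddr::new(ip, self.port))
    }
}

/// `C` stands in for the TiKV client type, which this sketch does not define.
pub struct ApiServer<C> {
    pub client: C,
    pub config: ApiConfig,
}
```

Keeping the client generic would also let tests construct an ApiServer with a stub instead of a live TiKV cluster.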
10.2.2 Implement Job API Endpoints
GET /api/jobs # List jobs (with filters)
GET /api/jobs/:id # Get job details
POST /api/jobs # Submit new job
POST /api/jobs/:id/retry # Retry failed job
POST /api/jobs/:id/cancel # Cancel job
DELETE /api/jobs/:id # Delete job
GET /api/stats # Get statistics
GET /api/workers # List active workers
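The endpoint table above can be captured as a typed route enum. This is a framework-free sketch using only std; in the real module axum's router would do the matching, and the names Route and match_route are hypothetical.

```rust
/// Routes from the endpoint list; the `String` carries the `:id` path segment.
#[derive(Debug, PartialEq)]
pub enum Route {
    ListJobs,
    GetJob(String),
    SubmitJob,
    RetryJob(String),
    CancelJob(String),
    DeleteJob(String),
    Stats,
    Workers,
}

/// Match an HTTP method and a path (without query string) to a route.
/// Returns `None` for anything outside the endpoint list.
pub fn match_route(method: &str, path: &str) -> Option<Route> {
    let segments: Vec<&str> = path.trim_matches('/').split('/').collect();
    match (method, segments.as_slice()) {
        ("GET", ["api", "jobs"]) => Some(Route::ListJobs),
        ("POST", ["api", "jobs"]) => Some(Route::SubmitJob),
        ("GET", ["api", "jobs", id]) => Some(Route::GetJob(id.to_string())),
        ("DELETE", ["api", "jobs", id]) => Some(Route::DeleteJob(id.to_string())),
        ("POST", ["api", "jobs", id, "retry"]) => Some(Route::RetryJob(id.to_string())),
        ("POST", ["api", "jobs", id, "cancel"]) => Some(Route::CancelJob(id.to_string())),
        ("GET", ["api", "stats"]) => Some(Route::Stats),
        ("GET", ["api", "workers"]) => Some(Route::Workers),
        _ => None,
    }
}
```

Having the routes in one enum makes it easy to check that every endpoint in the list has a handler.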
10.2.3 Implement Dashboard Endpoint
GET /api/dashboard:
{
  "jobs": {
    "pending": 10,
    "processing": 5,
    "completed": 100,
    "failed": 2,
    "dead": 0
  },
  "workers": {
    "active": 5,
    "total_processed": 1000
  },
  "throughput": {
    "jobs_per_hour": 50,
    "bytes_per_hour": 1073741824
  }
}
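The "jobs" object in the dashboard response could be produced by a simple tally over job statuses. The sketch below assumes the five states named in the response keys; JobStatus, JobCounts, and count_jobs are hypothetical names, and the hand-written to_json is only there to keep the example std-only (a real handler would likely use a JSON serializer).

```rust
/// Job states taken from the keys of the dashboard response.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JobStatus {
    Pending,
    Processing,
    Completed,
    Failed,
    Dead,
}

/// Per-status counts, mirroring the "jobs" object.
#[derive(Debug, Default, PartialEq)]
pub struct JobCounts {
    pub pending: u64,
    pub processing: u64,
    pub completed: u64,
    pub failed: u64,
    pub dead: u64,
}

/// Tally how many jobs are in each state.
pub fn count_jobs(statuses: &[JobStatus]) -> JobCounts {
    let mut counts = JobCounts::default();
    for status in statuses {
        match status {
            JobStatus::Pending => counts.pending += 1,
            JobStatus::Processing => counts.processing += 1,
            JobStatus::Completed => counts.completed += 1,
            JobStatus::Failed => counts.failed += 1,
            JobStatus::Dead => counts.dead += 1,
        }
    }
    counts
}

impl JobCounts {
    /// Render the counts as a compact JSON object.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"pending":{},"processing":{},"completed":{},"failed":{},"dead":{}}}"#,
            self.pending, self.processing, self.completed, self.failed, self.dead
        )
    }
}
```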
10.2.4 Create Frontend - Dashboard Page
- Create src/api/static/index.html
- Dashboard view:
- Job status counts (cards/gauges)
- Recent jobs table
- Active workers count
- Throughput chart (if metrics available)
- Auto-refresh every 5 seconds
10.2.5 Create Frontend - Job List Page
- Create job list view:
- Table with: ID, Source, Status, Progress, Duration
- Filter by status (tabs or dropdown)
- Pagination
- Search by source path
- Actions:
- Click row → Job detail
- Bulk select → Retry/Cancel/Delete
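The status filter, source search, and pagination described above could be backed by one server-side query function. This is a sketch over an in-memory slice; JobSummary, JobQuery, and list_jobs are hypothetical names, and the real implementation would scan TiKV rather than a Vec.

```rust
/// Hypothetical row shape for the job list table.
#[derive(Debug, Clone, PartialEq)]
pub struct JobSummary {
    pub id: String,
    pub source: String,
    pub status: String,
}

/// Query parameters for the list view; `page` is zero-based.
pub struct JobQuery<'a> {
    pub status: Option<&'a str>,
    pub search: Option<&'a str>,
    pub page: usize,
    pub per_page: usize,
}

/// Apply the status filter and source-path search, then return one page.
pub fn list_jobs<'j>(jobs: &'j [JobSummary], query: &JobQuery) -> Vec<&'j JobSummary> {
    jobs.iter()
        // No status filter means every status matches.
        .filter(|job| query.status.map_or(true, |s| job.status == s))
        // Search is a plain substring match on the source path.
        .filter(|job| query.search.map_or(true, |needle| job.source.contains(needle)))
        .skip(query.page.saturating_mul(query.per_page))
        .take(query.per_page)
        .collect()
}
```

Filtering before paginating keeps page numbers stable for a given filter, which is what the status tabs in the UI would expect.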
10.2.6 Create Frontend - Job Detail Page
- Create job detail view:
- Full job info
- Timeline (created → started → completed)
- Error message (if failed)
- Checkpoint info (frame progress)
- Output file links
- Actions:
- Retry button (if failed)
- Cancel button (if processing)
- Delete button
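The button rules above (Retry if failed, Cancel if processing, Delete always) can be encoded once and shared by the API and the UI. A minimal sketch, with hypothetical names JobStatus, JobAction, and allowed_actions; the status set is assumed from the dashboard response.

```rust
/// Job states, assumed from the dashboard response keys.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JobStatus {
    Pending,
    Processing,
    Completed,
    Failed,
    Dead,
}

/// Actions offered on the job detail page.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JobAction {
    Retry,
    Cancel,
    Delete,
}

/// Which buttons to show for a job in the given state.
pub fn allowed_actions(status: JobStatus) -> Vec<JobAction> {
    let mut actions = Vec::new();
    if status == JobStatus::Failed {
        actions.push(JobAction::Retry);
    }
    if status == JobStatus::Processing {
        actions.push(JobAction::Cancel);
    }
    // The task list shows Delete without a condition, so it is always offered here.
    actions.push(JobAction::Delete);
    actions
}
```

The same function could guard the POST and DELETE handlers so that the server rejects an action the UI would not have offered.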
10.2.7 Create Frontend - Workers Page
- Create workers view:
- Table: Pod ID, Status, Current Job, Last Seen
- Stale workers highlighted
- Health indicators
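Highlighting stale workers needs a rule for "stale". One option, sketched below, is to compare each worker's Last Seen timestamp against a threshold. WorkerInfo, is_stale, and stale_pod_ids are hypothetical names, and the threshold value is left to the caller because the issue does not specify one.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical row shape for the workers table.
pub struct WorkerInfo {
    pub pod_id: String,
    pub current_job: Option<String>,
    pub last_seen: SystemTime,
}

/// A worker is stale when its last heartbeat is older than `threshold`.
pub fn is_stale(worker: &WorkerInfo, now: SystemTime, threshold: Duration) -> bool {
    match now.duration_since(worker.last_seen) {
        Ok(age) => age > threshold,
        // `last_seen` is ahead of `now` (clock skew): treat the worker as fresh.
        Err(_) => false,
    }
}

/// Pod IDs of all stale workers, in input order.
pub fn stale_pod_ids(workers: &[WorkerInfo], now: SystemTime, threshold: Duration) -> Vec<&str> {
    workers
        .iter()
        .filter(|w| is_stale(w, now, threshold))
        .map(|w| w.pod_id.as_str())
        .collect()
}
```

Passing `now` in as a parameter, rather than calling SystemTime::now() inside, keeps the check deterministic in tests.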
10.2.8 Add Submit Job Form
- Form fields:
- Source URL (text input)
- Output URL (text input)
- Config file (dropdown or upload)
- Submit button
- Validation feedback
- Success/error notification
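Validation feedback for the form could come from a server-side check that returns one message per problem. The rules in this sketch are assumptions, not requirements from the issue: both URLs are required and must have a scheme://path shape. SubmitJobForm and validate are hypothetical names.

```rust
/// Hypothetical payload for the submit job form.
pub struct SubmitJobForm {
    pub source_url: String,
    pub output_url: String,
    pub config: Option<String>,
}

/// Check one URL field and append a message if it is missing or malformed.
fn check_url(field: &str, value: &str, errors: &mut Vec<String>) {
    let value = value.trim();
    if value.is_empty() {
        errors.push(format!("{field} is required"));
        return;
    }
    match value.split_once("://") {
        Some((scheme, rest)) if !scheme.is_empty() && !rest.is_empty() => {}
        _ => errors.push(format!("{field} must look like scheme://path")),
    }
}

/// Return all validation errors; an empty list means the form is acceptable.
pub fn validate(form: &SubmitJobForm) -> Vec<String> {
    let mut errors = Vec::new();
    check_url("Source URL", &form.source_url, &mut errors);
    check_url("Output URL", &form.output_url, &mut errors);
    errors
}
```

Collecting every error instead of stopping at the first lets the form mark all invalid fields in one round trip.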
10.2.9 Add Bulk Operations
- Select multiple jobs
- Bulk actions:
- Retry all selected
- Cancel all selected
- Delete all selected
- Confirmation dialog
- Progress indicator
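For bulk actions, one failing job should probably not abort the rest of the batch. The sketch below applies an action to each selected ID and records per-job outcomes, which also gives the progress indicator something to report. BulkResult and apply_bulk are hypothetical names; the action closure stands in for the real retry, cancel, or delete call.

```rust
/// Outcome of a bulk operation: IDs that succeeded, and (ID, error) pairs that failed.
#[derive(Debug, Default, PartialEq)]
pub struct BulkResult {
    pub succeeded: Vec<String>,
    pub failed: Vec<(String, String)>,
}

/// Run `action` on every ID, continuing past individual failures.
pub fn apply_bulk<F>(ids: &[String], mut action: F) -> BulkResult
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut result = BulkResult::default();
    for id in ids {
        match action(id) {
            Ok(()) => result.succeeded.push(id.clone()),
            Err(message) => result.failed.push((id.clone(), message)),
        }
    }
    result
}
```

Returning both lists lets the UI show a summary such as "2 retried, 1 failed" and keep the failed rows selected.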
10.2.10 Add Authentication (Optional)
- Simple auth options:
- Basic auth via env var
- API key header
- No auth (internal network)
- Configuration flag to enable
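Two of the options above, no auth and an API key header, fit into a small mode enum. This sketch leaves out basic auth because decoding it needs base64, which is not in std. AuthMode and is_authorized are hypothetical names, and the header that carries the key (for example X-API-Key) is an assumption.

```rust
/// Authentication modes; `None` matches the "internal network" option.
pub enum AuthMode {
    None,
    ApiKey(String),
}

/// Compare two byte strings without returning early at the first mismatch,
/// so the comparison time does not reveal how many leading bytes matched.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Decide whether a request may proceed.
/// `presented` is the key taken from the request header, if any.
pub fn is_authorized(mode: &AuthMode, presented: Option<&str>) -> bool {
    match (mode, presented) {
        (AuthMode::None, _) => true,
        (AuthMode::ApiKey(expected), Some(key)) => {
            constant_time_eq(expected.as_bytes(), key.as_bytes())
        }
        (AuthMode::ApiKey(_), None) => false,
    }
}
```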
10.2.11 Embed Static Files
- Use rust-embed or include_bytes!
- Serve from memory (no external files)
- Single binary deployment
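Serving from memory comes down to a lookup from request path to bytes and a content type. In the sketch below the asset bodies are inline placeholder literals so the example compiles on its own; in the real module each entry would presumably be an include_bytes! call on a file under src/api/static/, or be generated by rust-embed. ASSETS, content_type, and lookup are hypothetical names.

```rust
/// Embedded assets as (name, bytes) pairs. Placeholder contents only.
const ASSETS: &[(&str, &[u8])] = &[
    ("index.html", b"<!doctype html><title>Roboflow</title>"),
    ("app.js", b"console.log('roboflow');"),
    ("style.css", b"body { margin: 0; }"),
];

/// Pick a Content-Type header value from the file extension.
pub fn content_type(path: &str) -> &'static str {
    match path.rsplit('.').next() {
        Some("html") => "text/html; charset=utf-8",
        Some("js") => "text/javascript",
        Some("css") => "text/css",
        _ => "application/octet-stream",
    }
}

/// Resolve a request path to (body, content type); "/" maps to index.html.
pub fn lookup(request_path: &str) -> Option<(&'static [u8], &'static str)> {
    let name = match request_path.trim_start_matches('/') {
        "" => "index.html",
        other => other,
    };
    ASSETS
        .iter()
        .find(|(asset_name, _)| *asset_name == name)
        .map(|(asset_name, bytes)| (*bytes, content_type(asset_name)))
}
```

Because the table only contains known names, a request for an unknown or traversal-style path simply returns None rather than touching the filesystem.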
UI Mockup
┌─────────────────────────────────────────────────────────────┐
│ Roboflow Dashboard                             [Submit Job] │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│  │ Pending  │  │Processing│  │Completed │  │  Failed  │     │
│  │    10    │  │    5     │  │   100    │  │    2     │     │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │
│                                                             │
│  Recent Jobs                            [Filter: All ▼]     │
│  ┌────────┬──────────────────┬───────────┬────────────┐     │
│  │ ID     │ Source           │ Status    │ Progress   │     │
│  ├────────┼──────────────────┼───────────┼────────────┤     │
│  │ abc123 │ raw/file1.mcap   │ Processing│ 45%        │     │
│  │ def456 │ raw/file2.mcap   │ Completed │ 100%       │     │
│  │ ghi789 │ raw/file3.mcap   │ Failed    │ 32% [Retry]│     │
│  └────────┴──────────────────┴───────────┴────────────┘     │
│                                                             │
│  Active Workers: 5  │  Throughput: 50 jobs/hr               │
└─────────────────────────────────────────────────────────────┘
Acceptance Criteria
- API server starts with roboflow ui
- All REST endpoints work
- Dashboard shows job counts
- Job list with filtering works
- Job detail page shows full info
- Retry/Cancel/Delete actions work
- Submit job form works
- Workers page shows active workers
- Auto-refresh updates data
- Single binary (embedded static files)
- Responsive design (works on mobile)
Files to Create
- src/api/mod.rs
- src/api/server.rs
- src/api/routes/mod.rs
- src/api/routes/jobs.rs
- src/api/routes/workers.rs
- src/api/routes/stats.rs
- src/api/static/index.html
- src/api/static/app.js
- src/api/static/style.css
Files to Modify
- Cargo.toml (add axum, tower, rust-embed)
- src/bin/roboflow.rs (add ui subcommand)
- src/lib.rs (add api module)
Feature Flag
[features]
web-ui = ["axum", "tower", "rust-embed"]