3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,3 @@
{
"typescript.tsdk": "node_modules\\typescript\\lib"
}
76 changes: 76 additions & 0 deletions DOCUMENTATION.md
@@ -0,0 +1,76 @@
# Kine-Sight Documentation

## Overview
**Kine-Sight** is an interactive, browser-based AI fitness coaching application. It uses on-device AI to track the user's movements in real time, count repetitions, verify posture through visual pose estimation, and deliver motivating and corrective feedback via a local Large Language Model (LLM).

Because it relies on WebAssembly (WASM) and browser-based AI inference, it is highly private and fast—processing happens entirely on the user's device without needing external API calls to remote servers.

## Tech Stack & Dependencies
- **Frontend Framework**: React 19 + TypeScript
- **Bundler**: Vite
- **Computer Vision (Pose Detection)**: `@mediapipe/tasks-vision` runs MediaPipe's lightweight pose-landmarker model in the browser.
- **On-Device LLM (Voice/Text Coach)**: `@mlc-ai/web-llm` enables running local LLMs (e.g., Llama-3-8B-Instruct) natively in the browser using WebGPU and WebAssembly.
- **Audio Routing**: Web Audio API manages immediate correct/incorrect rep sound indications.

---

## Core Architecture and Data Flow

### 1. `src/App.tsx` & `src/main.tsx`
These act as the entry points of the application. `App.tsx` serves as a simple shell that mounts the primary view: `FitnessTab`.

### 2. `src/components/FitnessTab.tsx`
This is the core view and primary state-machine controller for the application.

**Key responsibilities:**
- **UI Architecture**: Displays the pre-workout "Dashboard/Bento" view with exercise choices. During a workout, it renders the video feed, the canvas overlay (for the skeleton), and the AI Coach's rolling text feedback.
- **Lifecycle Management**:
- Activates the camera (`getUserMedia`).
- Initializes the MediaPipe Pose model.
- Initializes the WebLLM coach in the background.
- Coordinates a pre-workout countdown.
- **Detection Loop (`startDetectionLoop`)**:
- Runs recursively via `requestAnimationFrame` to sample frames from the active `<video>` element.
- Passes frames to `poseEngine.ts` to get a pose analysis.
- Extracts the skeleton coordinates, mirrors them so the user sees a mirror image of themselves, and draws the skeletal lines and angle labels on a `<canvas>` element overlaid on the video feed.
- **Rep counting and Debounce logic**:
- Maintains state tracking for `up` / `down` / `middle` positions.
- Requires 5 consecutive identical form indications (`DEBOUNCE_FRAMES = 5`) to prevent flickering.
- Emits local audio sounds (dings/buzzes) upon rep completion and counts total correct vs. incorrect reps.
- **AI Coach Triggering**:
- When a rep completes, it generates an event outlining the user's performance (e.g. "User completed a perfect rep" or "User completed a rep with poor form") and dispatches it to the local LLM.
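The rep-counting debounce described above can be sketched as a small pure state machine. Only the `DEBOUNCE_FRAMES` constant and the `up`/`down`/`middle` states come from the text; the class and method names here are illustrative assumptions, not the actual code:

```typescript
// Illustrative sketch of the rep-counting debounce. Only DEBOUNCE_FRAMES and
// the position states are taken from the documentation; the rest is a
// hypothetical reconstruction.
const DEBOUNCE_FRAMES = 5;

type Position = "up" | "down" | "middle";

class RepCounter {
  private stable: Position = "middle";    // last accepted (debounced) position
  private candidate: Position = "middle"; // position currently being confirmed
  private streak = 0;                     // consecutive frames agreeing on candidate
  reps = 0;

  // Feed one frame's raw position; a rep is counted on a debounced
  // down -> up transition.
  push(raw: Position): void {
    if (raw === this.candidate) {
      this.streak++;
    } else {
      this.candidate = raw; // position changed: restart the streak
      this.streak = 1;
    }
    if (this.streak >= DEBOUNCE_FRAMES && this.candidate !== this.stable) {
      if (this.stable === "down" && this.candidate === "up") this.reps++;
      this.stable = this.candidate;
    }
  }
}
```

A single glitched frame resets the streak but never flips the accepted state, which is exactly the flicker-protection the loop needs.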

### 3. `src/fitness/poseEngine.ts`
This file encapsulates all spatial math and exercise movement profiles.

**Key components:**
- **MediaPipe Wrapper (`getPoseLandmarker`)**: Lazily downloads and loads the MediaPipe WASM model.
- **Angle Calculation**: Provides math helpers like `calcAngle` to find interior angles between 3 distinct joints (e.g. Shoulder → Elbow → Wrist) to deduce joint extension.
- **Exercise Definitions (`EXERCISES`)**: A declarative list defining each workout (Squats, Bicep Curls, Push-ups, Lunges, Shoulder Press, Plank). Each definition contains an `analyze()` function that is called per-frame. It accepts raw 3D landmarks and evaluates:
- `position`: The current posture state (e.g. going `down` in a squat, or coming `up`).
- `form`: Evaluates if the exercise is functionally sound (e.g. `bad` form if the torso leans too far forward in lunges).
- `confident`: Determines if all the required joints for the exercise are fully within the camera's view.
- **Drawing Overlays**: Includes `drawSkeleton` and `drawAngleBadge` to construct the helpful UI augmentations on top of the camera feed.
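The interior-angle math can be sketched as follows. The name `calcAngle` appears in the source, but this particular signature and the 2D simplification are assumptions:

```typescript
// Hypothetical sketch of the interior-angle helper described above.
// A landmark is a normalized point; only x and y are used here.
interface Landmark { x: number; y: number; }

// Returns the interior angle (in degrees) at joint b, formed by the
// segments b->a and b->c, e.g. the elbow angle for shoulder/elbow/wrist.
function calcAngle(a: Landmark, b: Landmark, c: Landmark): number {
  const v1 = { x: a.x - b.x, y: a.y - b.y };
  const v2 = { x: c.x - b.x, y: c.y - b.y };
  const dot = v1.x * v2.x + v1.y * v2.y;
  const mag = Math.hypot(v1.x, v1.y) * Math.hypot(v2.x, v2.y);
  if (mag === 0) return 0; // degenerate: coincident joints
  const cos = Math.min(1, Math.max(-1, dot / mag)); // clamp for float safety
  return (Math.acos(cos) * 180) / Math.PI;
}
```

A fully extended joint reads close to 180°, a tightly bent one close to 0°, which is what the exercise thresholds key off.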

### 4. `src/llm/llmEngine.ts`
This handles the initialization and prompting for the on-device AI coach.

- **Engine Initialization**: Connects to `@mlc-ai/web-llm` to download and load a local LLM in chunks (uses Llama-3-8B-Instruct by default).
- **Feedback Generation (`generateFeedback`)**: Constructs concise prompts from basic stats (reps completed, current form quality) and streams back a quick one- to two-sentence response. The streaming callback is pumped to the React UI in `FitnessTab`.
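The prompt construction might look roughly like this. The exact wording, field names, and function name below are illustrative assumptions, not the real code; only the stats it consumes (rep counts and form quality) come from the description above:

```typescript
// Hypothetical sketch of the prompt that generateFeedback builds before
// handing it to the on-device LLM. All names here are assumptions.
interface RepStats {
  exercise: string;
  goodReps: number;
  badReps: number;
  lastForm: "good" | "bad";
}

function buildCoachPrompt(s: RepStats): string {
  return [
    `You are a concise fitness coach. Reply in 1-2 sentences.`,
    `Exercise: ${s.exercise}.`,
    `Reps so far: ${s.goodReps} good, ${s.badReps} with poor form.`,
    s.lastForm === "good"
      ? `The last rep had perfect form. Encourage the user.`
      : `The last rep had poor form. Give one short correction.`,
  ].join(" ");
}
// The resulting string would then be streamed through the web-llm engine,
// with each token forwarded to the React UI via the streaming callback.
```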

### 5. `src/components/DemoVideo.tsx`
Handles displaying short looped demonstration `.mp4` references for each exercise prior to starting. It has error boundaries in case files are missing.

---

## Technical Highlights & Optimizations
- **On-Device Execution**: Both the pose pipeline and the language-model pipeline run entirely within the browser.
- **Adaptive Debouncing**: Rep counting is shielded against single-frame tracking glitches by requiring several consecutive frames to agree before a position change is accepted.
- **Audio Context Priming**: The Web Audio API is initialized during the initial user gesture (the "Start Workout" click), which bypasses browser autoplay restrictions and keeps rep-feedback audio low-latency.
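The priming trick can be sketched as below. Only the idea (create/resume the `AudioContext` inside the user gesture) comes from the text; the function names and tone parameters are illustrative assumptions:

```typescript
// Sketch of Web Audio priming: the AudioContext is created (and resumed)
// inside the user's click handler, so later playback is never blocked by
// autoplay policies. Names and tone values are illustrative assumptions.
let audioCtx: AudioContext | null = null;

// Pure mapping from rep outcome to tone parameters, kept separate so the
// audio path itself stays trivial.
function repTone(correct: boolean): { freq: number; durationMs: number } {
  return correct
    ? { freq: 880, durationMs: 120 }  // short high "ding"
    : { freq: 220, durationMs: 250 }; // longer low "buzz"
}

function primeAudio(): void {
  // Called from the "Start Workout" click handler (a user gesture).
  if (!audioCtx) audioCtx = new AudioContext();
  if (audioCtx.state === "suspended") void audioCtx.resume();
}

function playRepSound(correct: boolean): void {
  if (!audioCtx) return; // not primed yet: silently skip
  const { freq, durationMs } = repTone(correct);
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  osc.frequency.value = freq;
  osc.connect(gain).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + durationMs / 1000);
}
```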

## Scaling the Product
To add a new exercise:
1. Identify the 3 joints that create the defining movement angle (e.g. Hip-Knee-Ankle).
2. Add a new configuration block inside `EXERCISES` in `poseEngine.ts`.
3. Provide the angle bounds defining an "up" state vs. a "down" state.
4. Specify limits defining "good" vs. "bad" form.
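Following those steps, a hypothetical new entry might look like this. The field names are guessed from the description above, not copied from `poseEngine.ts`; the landmark indices 23/25/27 are MediaPipe's left hip/knee/ankle:

```typescript
// Hypothetical EXERCISES entry sketched from the steps above; the real
// field names in poseEngine.ts may differ.
interface Landmark { x: number; y: number; visibility?: number; }

// Step 1: the three joints that create the defining movement angle.
const HIP = 23, KNEE = 25, ANKLE = 27; // MediaPipe pose landmark indices

function calcAngle(a: Landmark, b: Landmark, c: Landmark): number {
  const dot = (a.x - b.x) * (c.x - b.x) + (a.y - b.y) * (c.y - b.y);
  const mag = Math.hypot(a.x - b.x, a.y - b.y) * Math.hypot(c.x - b.x, c.y - b.y);
  if (mag === 0) return 0;
  return (Math.acos(Math.min(1, Math.max(-1, dot / mag))) * 180) / Math.PI;
}

// Step 2: a new configuration block in the EXERCISES style.
const sideKick = {
  name: "Side Kick",
  analyze(lm: Landmark[]) {
    const hip = lm[HIP], knee = lm[KNEE], ankle = lm[ANKLE];
    const angle = calcAngle(hip, knee, ankle);
    return {
      // Step 3: angle bounds for the "up" vs "down" states.
      position: angle > 160 ? "up" : angle < 100 ? "down" : "middle",
      // Step 4: crude form limit — a hyper-bent knee counts as bad form.
      form: angle < 60 ? "bad" : "good",
      // All required joints must be visible to trust the reading.
      confident: [hip, knee, ankle].every((j) => (j.visibility ?? 1) > 0.5),
    };
  },
};
```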
162 changes: 51 additions & 111 deletions README.md
@@ -1,130 +1,70 @@
# RunAnywhere Web Starter App
# 🏋️‍♂️ Kine-Sight: Your AI-Powered Digital Fitness Trainer

A minimal React + TypeScript starter app demonstrating **on-device AI in the browser** using the [`@runanywhere/web`](https://www.npmjs.com/package/@runanywhere/web) SDK. All inference runs locally via WebAssembly — no server, no API key, 100% private.
[![Live Demo](https://img.shields.io/badge/Live_Demo-kine--sight.vercel.app-blue?style=for-the-badge)](https://kine-sight.vercel.app)
[![YouTube Demo](https://img.shields.io/badge/YouTube-Watch_Demo-FF0000?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/watch?v=egs9YfF6UwM)

## Features

| Tab | What it does |
|-----|-------------|
| **Chat** | Stream text from an on-device LLM (LFM2 350M) |
| **Vision** | Point your camera and describe what the VLM sees (LFM2-VL 450M) |
| **Voice** | Speak naturally — VAD detects speech, STT transcribes, LLM responds, TTS speaks back |
**Kine-Sight** is a cutting-edge digital fitness trainer with an "eye." By leveraging on-device Vision-Language Models (VLMs), it watches your form, tracks your progress, catches your mistakes, and provides real-time, actionable feedback to help you perfect your fitness journey.

## Quick Start
Everything runs **100% locally in your browser** via WebAssembly—meaning zero server costs, zero API keys, and complete privacy for your camera feed.

```bash
npm install
npm run dev
```
## ✨ Features

Open [http://localhost:5173](http://localhost:5173). Models are downloaded on first use and cached in the browser's Origin Private File System (OPFS).
- **👀 Real-Time Vision Tracking:** Uses your device's camera to analyze your posture, reps, and movements in real-time.
- **🗣️ Interactive Feedback:** Get instant corrections and motivational feedback when your form breaks down.
- **🔒 100% Private & Secure:** Powered by `@runanywhere/web`, all AI inference happens locally on your device. Your camera feed never leaves your browser.
- **⚡ Blazing Fast On-Device AI:** Utilizes optimized WASM engines to run Vision (LFM2-VL) and Text (LFM2) models directly in the web browser.
- **🎙️ Voice Integration:** Speak to your AI trainer naturally, and it will respond via text-to-speech.

## How It Works

```
@runanywhere/web (npm package)
├── WASM engine (llama.cpp, whisper.cpp, sherpa-onnx)
├── Model management (download, OPFS cache, load/unload)
└── TypeScript API (TextGeneration, STT, TTS, VAD, VLM, VoicePipeline)
```
## 🛠️ Tech Stack

The app imports everything from `@runanywhere/web`:
- **Frontend:** React, TypeScript, Vite
- **Styling:** CSS / HTML5
- **Local AI Engine:** `@runanywhere/web` (llama.cpp, whisper.cpp, sherpa-onnx)
- **Deployment:** Vercel

```typescript
import { RunAnywhere, SDKEnvironment } from '@runanywhere/web';
import { TextGeneration, VLMWorkerBridge } from '@runanywhere/web-llamacpp';
## 🚀 Quick Start

await RunAnywhere.initialize({ environment: SDKEnvironment.Development });
### Prerequisites
- Node.js (v18 or higher)
- npm or yarn
- A modern web browser (Chrome 120+ or Edge 120+ recommended)

// Stream LLM text
const { stream } = await TextGeneration.generateStream('Hello!', { maxTokens: 200 });
for await (const token of stream) { console.log(token); }
### Installation

// VLM: describe an image
const result = await VLMWorkerBridge.shared.process(rgbPixels, width, height, 'Describe this.');
```
1. **Clone the repository:**
   ```bash
   git clone https://github.com/himanshuranjan2552/Kine-Sight.git
   cd Kine-Sight
   ```
2. **Install dependencies:**
   ```bash
   npm install
   ```
3. **Start the development server:**
   ```bash
   npm run dev
   ```
4. **Open in Browser:**
   Navigate to http://localhost:5173.

## Project Structure
(Note: AI models are downloaded on first use and cached locally in your browser's Origin Private File System.)

```
src/
├── main.tsx # React root
├── App.tsx # Tab navigation (Chat | Vision | Voice)
├── runanywhere.ts # SDK init + model catalog + VLM worker
├── workers/
│ └── vlm-worker.ts # VLM Web Worker entry (2 lines)
├── hooks/
│ └── useModelLoader.ts # Shared model download/load hook
├── components/
│ ├── ChatTab.tsx # LLM streaming chat
│ ├── VisionTab.tsx # Camera + VLM inference
│ ├── VoiceTab.tsx # Full voice pipeline
│ └── ModelBanner.tsx # Download progress UI
└── styles/
└── index.css # Dark theme CSS
```

## Adding Your Own Models
## 🧠 How It Works Under the Hood

Edit the `MODELS` array in `src/runanywhere.ts`:
Kine-Sight is built on top of the **RunAnywhere SDK** and combines several local models:
- **VLM (Vision-Language Model):** captures frames from your webcam and analyzes your body positioning.
- **LLM (Large Language Model):** turns the vision data into encouraging or corrective instructions.
- **Voice Pipeline (VAD, STT, TTS):** lets you ask the trainer questions hands-free while working out.

```typescript
{
id: 'my-custom-model',
name: 'My Model',
repo: 'username/repo-name', // HuggingFace repo
files: ['model.Q4_K_M.gguf'], // Files to download
framework: LLMFramework.LlamaCpp,
modality: ModelCategory.Language, // or Multimodal, SpeechRecognition, etc.
memoryRequirement: 500_000_000, // Bytes
}
```
## 🤝 Contributing
Contributions, issues, and feature requests are welcome! Feel free to check the issues page if you want to contribute.
1. Fork the project
2. Create your feature branch: `git checkout -b feature/AmazingFeature`
3. Commit your changes: `git commit -m 'Add some AmazingFeature'`
4. Push to the branch: `git push origin feature/AmazingFeature`
5. Open a pull request

Any GGUF model compatible with llama.cpp works for LLM/VLM. STT/TTS/VAD use sherpa-onnx models.

## Deployment

### Vercel

```bash
npm run build
npx vercel --prod
```

The included `vercel.json` sets the required Cross-Origin-Isolation headers.

### Netlify

Add a `_headers` file:

```
/*
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: credentialless
```

### Any static host

Serve the `dist/` folder with these HTTP headers on all responses:

```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: credentialless
```

## Browser Requirements

- Chrome 96+ or Edge 96+ (recommended: 120+)
- WebAssembly (required)
- SharedArrayBuffer (requires Cross-Origin Isolation headers)
- OPFS (for persistent model cache)

## Documentation

- [SDK API Reference](https://docs.runanywhere.ai)
- [npm package](https://www.npmjs.com/package/@runanywhere/web)
- [GitHub](https://github.com/RunanywhereAI/runanywhere-sdks)

## License

MIT
## 📄 License
This project is licensed under the MIT License.
---
Built with ❤️ to make fitness smarter, safer, and more accessible.
102 changes: 100 additions & 2 deletions index.html
@@ -5,8 +5,106 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="theme-color" content="#0F172A" />
<meta name="mobile-web-app-capable" content="yes" />
<title>RunAnywhere AI Starter</title>
<link rel="icon" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'><text y='.9em' font-size='90'>🤖</text></svg>" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="description" content="AI Fitness Coach powered by MediaPipe" />
<title>Kine-Sight AI Fitness Trainer</title>
<link rel="icon" type="image/svg+xml" href="/icon.svg" />
<link rel="apple-touch-icon" href="/icon.svg" />

<!-- TAILWIND & GOOGLE FONTS -->
<script src="https://cdn.tailwindcss.com?plugins=forms,container-queries"></script>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700;800;900&display=swap" rel="stylesheet"/>
<link href="https://fonts.googleapis.com/css2?family=Material+Symbols+Outlined:wght,FILL@100..700,0..1&display=swap" rel="stylesheet"/>

<script id="tailwind-config">
tailwind.config = {
darkMode: "class",
theme: {
extend: {
colors: {
"on-background": "#101d25",
"inverse-on-surface": "#e5f2fe",
"inverse-primary": "#ffb59d",
"outline-variant": "#e5beb2",
"surface-container-highest": "#d7e4f0",
"surface": "#f5faff",
"inverse-surface": "#25323b",
"primary-container": "#d14300",
"surface-bright": "#f5faff",
"background": "#f5faff",
"on-secondary-container": "#636262",
"tertiary-container": "#697680",
"on-primary-fixed-variant": "#832700",
"surface-dim": "#cedce7",
"error": "#ba1a1a",
"on-surface": "#101d25",
"outline": "#907065",
"surface-variant": "#d7e4f0",
"surface-tint": "#ab3500",
"on-error-container": "#93000a",
"on-secondary": "#ffffff",
"on-primary-fixed": "#390c00",
"secondary": "#5f5e5e",
"on-tertiary": "#ffffff",
"on-tertiary-container": "#fcfcff",
"on-primary-container": "#fffbff",
"secondary-fixed": "#e5e2e1",
"primary": "#a73400",
"primary-fixed-dim": "#ffb59d",
"surface-container-high": "#dceaf5",
"surface-container-lowest": "#ffffff",
"primary-fixed": "#ffdbd0",
"secondary-fixed-dim": "#c8c6c5",
"on-secondary-fixed-variant": "#474746",
"error-container": "#ffdad6",
"tertiary": "#515e67",
"on-primary": "#ffffff",
"on-tertiary-fixed-variant": "#3c4852",
"surface-container-low": "#eaf5ff",
"surface-container": "#e2f0fb",
"tertiary-fixed": "#d7e4f0",
"on-tertiary-fixed": "#101d25",
"secondary-container": "#e2dfde",
"on-surface-variant": "#5c4037",
"on-error": "#ffffff",
"on-secondary-fixed": "#1b1b1c",
"tertiary-fixed-dim": "#bbc8d3"
},
fontFamily: {
"headline": ["Inter"],
"body": ["Inter"],
"label": ["Inter"]
},
borderRadius: {"DEFAULT": "0.125rem", "lg": "0.25rem", "xl": "0.5rem", "full": "0.75rem"},
},
},
}
</script>
<style>
.material-symbols-outlined { font-variation-settings: 'FILL' 0, 'wght' 400, 'GRAD' 0, 'opsz' 24; }
.hide-scrollbar::-webkit-scrollbar { display: none; }
.hide-scrollbar { -ms-overflow-style: none; scrollbar-width: none; }

/* Dark mode overrides for semantic Tailwind colors */
.dark .bg-surface { background-color: #0F172A !important; }
.dark .bg-surface-container-lowest { background-color: #1E293B !important; }
.dark .bg-surface-container-low { background-color: #1a2332 !important; }
.dark .bg-surface-container { background-color: #1E293B !important; }
.dark .bg-surface-container-high { background-color: #253345 !important; }
.dark .bg-surface-container-highest { background-color: #334155 !important; }
.dark .text-on-surface { color: #e2e8f0 !important; }
.dark .text-on-background { color: #f1f5f9 !important; }
.dark .text-secondary { color: #94A3B8 !important; }
.dark .bg-inverse-surface { background-color: #e2e8f0 !important; }
.dark .text-inverse-on-surface { color: #1E293B !important; }
.dark .border-outline-variant\/20 { border-color: rgba(71, 85, 105, 0.3) !important; }

/* Smooth theme transition */
body, .bg-surface, .bg-surface-container-lowest, header {
transition: background-color 0.3s ease, color 0.3s ease;
}
</style>
</head>
<body>
<div id="root"></div>