Skip to content

farmountain/SmartGlass-AI-Agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

713 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿ‘“๐Ÿค– SmartGlass AI Agent

A multimodal AI assistant for smart glasses, integrating:

  • ๐ŸŽ™๏ธ Whisper (speech-to-text)
  • ๐Ÿ‘๏ธ CLIP / DeepSeek-Vision (vision-language understanding)
  • ๐Ÿง  student: Llama-3.2-3B / Qwen-2.5-3B (Week 10/11 plan) for natural language generation (legacy GPT-2 path deprecated)

Built for the Meta Ray-Ban Wayfarer and similar wearable devices. Includes an 18-week learning program with step-by-step Google Colab workshops, and a fully functional modular Python agent (SmartGlassAgent) for real-world deployment. The SmartGlassAgent and primary SDK classes are considered stable as of v1.0, so downstream apps can rely on their public methods without churn.

๐Ÿ“„ Latest weekly doc: Week 4 Report.


๐ŸŒŸ Features

  • ๐ŸŽค Speech Recognition: Real-time transcription with OpenAI Whisper
  • ๐Ÿ‘๏ธ Visual Understanding: Scene and object analysis using CLIP or DeepSeek-Vision
  • ๐Ÿ’ฌ Language Generation: Responses via the student Llama-3.2-3B / Qwen-2.5-3B interim models (GPT-2 deprecated)
  • ๐Ÿ”„ Multimodal Integration: Voice + Vision โ†’ LLM-powered interaction
  • ๐Ÿงช Google Colab Ready: Modular 18-week training + live testing
  • ๐Ÿ”ง Modular Agent SDK: SmartGlassAgent class with clean APIs and stable SDK entry points as of v1.0

๐Ÿš€ Quick Start

๐ŸŒ Contribute Through Pull Requests

  1. Propose your change on the web. Navigate to the file you want to update in GitHub and choose Edit this file. GitHub will automatically fork the repository if needed and open the web editor.
  2. Describe your intent. After editing, provide a concise summary of the change, add any relevant testing notes, and click Propose changes to start a new pull request.
  3. Open the pull request. Review the diff, confirm the base branch, and submit the PR. No local cloning is required.
  4. Let CI validate the update. Wait for the automated checks to finish. Inspect the linked logs and artifacts to verify linting, tests, and documentation previews.
  5. Address feedback. Use the web editor to push follow-up commits, respond to reviewer comments, and request re-runs of any failed checks.

๐Ÿ•ถ๏ธ Meta Ray-Ban Integration with Device Access Toolkit (DAT)

SmartGlass-AI-Agent now includes comprehensive documentation for integrating with the Meta Wearables Device Access Toolkit (DAT), enabling AI-powered experiences on Ray-Ban Meta and Ray-Ban Display glasses.

๐Ÿ“š Integration & Testing Guides

  • Hardware Testing Guide - โญ NEW Complete hardware testing manual

    • Meta Ray-Ban + OPPO Reno 12 setup guide
    • 8-part comprehensive testing workflow
    • 4 end-to-end test scenarios
    • Performance benchmarks & troubleshooting
    • Deployment checklist
  • Performance Optimization Guide - โญ NEW Optimization strategies

    • Current vs target benchmarks
    • Frame compression & SNN quantization
    • Profiling tools setup & load testing
    • Battery optimization techniques
  • Meta DAT Integration Guide - Complete setup and integration guide

    • Platform setup (Android & iOS)
    • Core concepts and runtime flow
    • Privacy and compliance guidelines
    • Troubleshooting common issues
  • Hello SmartGlass Quickstart - 30-minute hands-on tutorial

    • Step-by-step mobile app creation
    • Backend setup and testing
    • End-to-end implementation examples
    • Mock Device testing without hardware
  • Implementation Progress - โญ NEW Current project status

    • Week 1-6: Complete (PR #278-#287)
    • Week 7-8: In progress (75% overall)
    • Testing scenarios & known issues
    • Technical roadmap

๐ŸŽฏ Quick Navigation

Getting Started:

  1. Read the Hello SmartGlass Quickstart for a hands-on introduction
  2. Review the Meta DAT Integration Guide for comprehensive documentation
  3. Follow the Hardware Testing Guide for Meta Ray-Ban + OPPO Reno 12 setup
  4. Check the Implementation Progress for current project status (75% complete)

Key Features:

  • โœ… Camera frame streaming from Ray-Ban Meta glasses
  • โœ… Microphone audio capture and processing
  • โœ… AI-powered scene analysis and response generation
  • โœ… Mock Device support for development without hardware
  • โœ… Privacy-first design with user controls
  • โœ… Cross-platform (Android & iOS)

๐Ÿ—๏ธ Architecture Overview

Ray-Ban Meta Glasses โ†’ Mobile App (Edge Sensor Hub) โ†’ SmartGlass AI Backend
     (Camera/Mic)         (DAT SDK + Processing)        (Whisper/CLIP/SNN)

The mobile app acts as an edge sensor hub, streaming multimodal data from the glasses to the SmartGlassAgent backend for AI processing, then displaying responses and executing actions.


๐Ÿ“ฆ Legacy Local Installation (Optional)

If you still need to run the project locally, you can follow the classic setup:

git clone https://github.com/farmountain/SmartGlass-AI-Agent.git
cd SmartGlass-AI-Agent
pip install -r requirements.txt

Microphone + TTS prerequisites

The Meta provider streams audio via either sounddevice (default) or PyAudio, and uses pyttsx3 for offline text-to-speech. Both capture backends depend on PortAudio:

  • Linux (Debian/Ubuntu): sudo apt-get install portaudio19-dev then pip install sounddevice (or pip install pyaudio).
  • macOS (Homebrew): brew install portaudio then pip install sounddevice (or pip install pyaudio).
  • Windows: install the matching PortAudio binary or use pip install pipwin && pipwin install pyaudio. If sounddevice installation fails, prefer the pyaudio fallback.

If you only need playback, pyttsx3 has no external audio driver requirement, but microphone capture will need at least one of the PortAudio-backed libraries above.

๐Ÿ“Š Render documentation KPIs

Generate a Markdown-formatted table from the latest KPI CSVs under docs/artifacts:

python scripts/doc_kpi_table.py --artifacts docs/artifacts > /tmp/doc_kpis.md

The CI summary step automatically runs this helper and posts the newest table alongside the other benchmark outputs.


๐Ÿ“ฑ Android HTTP integration

Building a mobile client? Follow the step-by-step endpoint guide in docs/android_integration.md for payloads, session handling, and a local server quickstart.


๐Ÿ”’ Edge runtime privacy controls

The edge runtime defaults to not retaining raw audio, frames, or transcripts in memory to reduce the risk of accidental data leakage. Opt-in persistence is available through environment variables when you need debugging traces:

Environment variable Default Effect
STORE_RAW_AUDIO false Keep per-session audio buffers in memory for replay and policy enforcement.
STORE_RAW_FRAMES false Preserve recent video frames so subsequent queries can reuse the latest view.
STORE_TRANSCRIPTS false Retain transcripts generated from audio ingestion and text queries.

See PRIVACY.md for detailed threat-modeling notes and guidance on when to enable each option.


๐Ÿง‘โ€๐Ÿซโžก๏ธ๐Ÿง  Teacherโ€“Student SNN Pipeline (Concise)

  • Pipeline: scripts/train_snn_student.py distills transformer teachers (from tiny GPT-2 to Llama-3.2-3B/Qwen-2.5-3B) into spiking-friendly students with configurable SNN hyperparameters, LR scheduling, and temperature-scaled KD. Designed for both quick Colab demos and production training.

  • Configuration: Supports SNN timesteps, surrogate gradients (sigmoid, fast_sigmoid, triangular, arctan), spike thresholds, LR schedulers (constant, cosine, linear), and comprehensive metadata tracking with git commit hashes.

  • Artifacts: Training writes student.pt and metadata.json under artifacts/snn_student by default (override with --output-dir).

  • Documentation:

  • Launch training (demo):

    python scripts/train_snn_student.py \
      --teacher-model sshleifer/tiny-gpt2 \
      --dataset synthetic \
      --num-steps 50 \
      --batch-size 4 \
      --output-dir artifacts/snn_student_demo
  • Launch training (production with Llama-3.2-3B):

    python scripts/train_snn_student.py \
      --teacher-model meta-llama/Llama-3.2-3B \
      --dataset wikitext-2 \
      --num-steps 10000 \
      --batch-size 4 \
      --grad-accum-steps 8 \
      --max-length 512 \
      --lr 3e-4 \
      --scheduler cosine \
      --warmup-steps 500 \
      --snn-timesteps 8 \
      --snn-surrogate fast_sigmoid \
      --output-dir artifacts/snn_student_llama
  • SNN inference demo: Load the saved student (or fall back to the stubbed path) via the SNNLLMBackend demo module and generate a quick response:

    python - <<'PY'
    from src.llm_snn_backend import SNNLLMBackend
    
    backend = SNNLLMBackend(model_path="artifacts/snn_student/student.pt")
    print(backend.generate("Hello from the glasses", max_tokens=24))
    PY

    The backend will automatically reuse the saved artifact when available and degrade gracefully to a stubbed tokenizer/model when the files are missing, keeping the demo runnable on any machine.ใ€F:src/llm_snn_backend.pyโ€ L21-L104ใ€‘ใ€F:src/llm_snn_backend.pyโ€ L143-L181ใ€‘


๐Ÿ”จ Use the Agent in Python

The agent now consumes any backend implementing src.llm_backend_base.BaseLLMBackend so you can swap language generators without touching the agent logic. The built-in SNNLLMBackend exposes the same interface for on-device, spiking-friendly generation, while the ANN/GPT-2 adapter remains available for comparison.

Action-aware multimodal query with the SNN backend (on-device capable):

import os

from src.llm_snn_backend import SNNLLMBackend
from src.smartglass_agent import SmartGlassAgent

# Runs fully on-device when the spiking student checkpoint is present
snn_backend = SNNLLMBackend(model_path="artifacts/snn_student/student.pt")

agent = SmartGlassAgent(
    whisper_model="base",
    clip_model="openai/clip-vit-base-patch32",
    llm_backend=snn_backend,
    provider=os.getenv("PROVIDER", "mock"),  # honors PROVIDER env var
)

result = agent.process_multimodal_query(
    text_query="Describe the scene and propose next steps",
    image_input="A person standing next to a bicycle",
)

print("response:", result["response"])
print("actions:")
for action in result["actions"]:
    print(" -", action.get("type"), action.get("payload"))

See Action schema and RaySkillKit mapping for the structured envelope, sample payloads, and how to bind each action entry to a concrete skill implementation. The same process_multimodal_query shape applies to any BaseLLMBackend, so swapping in cloud backends or the ANN GPT-2 adapter continues to return action-aware responses while SNNLLMBackend keeps generation entirely on-device when the checkpoint is available.

CLI demo

To try the same pipeline from the terminal, run the examples/cli_smartglass.py demo from the repository root. It loads an image, walks through the agent pipeline, and streams the generated response (optionally using the SNN backend):

python -m examples.cli_smartglass --image images/scene.jpg --backend snn

Omit --backend snn to use the default backend.

Provider selection

drivers.providers.get_provider constructs the driver layer for you. When you omit the name argument, it reads the PROVIDER environment variable (default: "mock") so scripts and tests can share a single default selection. Supported provider names are mock, meta, vuzix, xreal, openxr, and visionos; unknown values fall back to the deterministic mock provider. SmartGlassAgent mirrors this behaviorโ€”if you skip the provider argument, it calls get_provider() under the hood to honor the environment variable:

export PROVIDER=mock  # default, optional

Passing a string uses the same resolver explicitly:

from src.smartglass_agent import SmartGlassAgent

agent = SmartGlassAgent(provider="meta")

You can also create the provider yourself and pass it into the agent for explicit control:

from drivers.providers import get_provider
from src.smartglass_agent import SmartGlassAgent

provider = get_provider("meta", api_key="YOUR_META_APP_KEY")
agent = SmartGlassAgent(provider=provider)

PROVIDER=meta now selects a Meta Ray-Ban SDK wrapper that automatically falls back to deterministic mocks whenever the metarayban SDK package is not installed. The wrapper accepts three key configuration fields when you construct it directly in Python:

  • api_key โ€“ optional API token to pass through to SDK calls that require auth.
  • device_id โ€“ Ray-Ban device identifier to stamp on camera/mic/audio/haptics payloads (defaults to RAYBAN-MOCK-DEVICE).
  • transport โ€“ SDK transport hint such as ble or wifi (defaults to mock).

Example:

from drivers.providers.meta import MetaRayBanProvider

provider = MetaRayBanProvider(
    api_key="YOUR_META_APP_KEY",
    device_id="RAYBAN-1234",
    transport="ble",
    prefer_sdk=True,  # only flips on if the metarayban SDK is importable
)

When prefer_sdk=True and the metarayban dependency is importable, the provider now routes camera, microphone, audio, overlay, and haptics calls into the official SDK while threading your api_key, device_id, and transport through every request. CI and default local runs keep using the mock data because prefer_sdk defaults to False and the SDK is not present in the test environment. If the SDK import fails or a runtime SDK call raises, the provider logs the failure and immediately falls back to deterministic fixtures so you can still exercise the flows without hardware.

Deterministic vendor-specific mocks are also available so you can stub integrations for different runtimes:

export PROVIDER=vuzix    # 640x480 RGB frames + waveguide overlay metadata
export PROVIDER=xreal    # 1080p Beam-style captures + Nebula overlay stubs
export PROVIDER=openxr   # Square eye-buffers with host-delegated overlays
export PROVIDER=visionos # 1440x1440 persona frames + shared-space overlays

Each of these providers exposes deterministic camera/mic fixtures tuned to the vendor's expected resolutions, vendor-tagged audio/permission responses, and a has_display() helper that the SDK uses to reflect true overlay availability.

Meta SDK requirements and fallback behavior

The mock-first Meta Ray-Ban wrapper lives in drivers/providers/meta.py and now calls into the metarayban package whenever it is available. Install it locally with pip install metarayban and provide a valid api_key plus the device_id/transport for your Ray-Ban device to stream live camera and microphone data, trigger SDK-managed audio output, render overlays, and request haptics. All SDK calls are guarded with mock fallbacks, so CI remains green even when the SDK is absent.

๐Ÿ•ถ๏ธ Meta Ray-Ban Integration

The Android sample still ships with a MetaRayBanManager faรงade that mirrors the expected Meta Ray-Ban SDK shape. It continues to emit deterministic placeholder behavior for UI wiring while the Python provider now talks to the official SDK when present:

  • connect(deviceId, transport) logs a connection attempt and simulates setup while threading the provided device_id and transport so they line up with the Python providerโ€™s SDK-backed calls.ใ€F:sdk-android/src/main/kotlin/com/smartglass/sdk/rayban/MetaRayBanManager.ktโ€ L14-L38ใ€‘
  • capturePhoto() returns a packaged placeholder bitmap until the Android SDK surfaces camera streaming, and startAudioStreaming() emits a short flow of labeled fake audio chunks.ใ€F:sdk-android/src/main/kotlin/com/smartglass/sdk/rayban/MetaRayBanManager.ktโ€ L40-L73ใ€‘
  • TODOs remain in place to swap these mocks for the official Android interfaces; the Python side already wires the same fields through to the metarayban package and will continue to fall back to deterministic fixtures when the SDK is unavailable or raises.ใ€F:sdk-android/src/main/kotlin/com/smartglass/sdk/rayban/MetaRayBanManager.ktโ€ L20-L52ใ€‘ใ€F:sdk-android/src/main/kotlin/com/smartglass/sdk/rayban/MetaRayBanManager.ktโ€ L69-L76ใ€‘

๐Ÿ“š For complete Meta DAT integration with official Device Access Toolkit SDKs, see:


๐Ÿงฉ RaySkillKit Skill Catalogue

RaySkillKit now ships with a compact catalogue of twelve skills that blend legacy validation fixtures with the travel and retail packs produced by the raycli workflow:

  • skill_001 โ€“ Spatial Navigation Assistant (navigation/routing baseline)
  • skill_002 โ€“ Vision Detection Baseline (vision detection baseline)
  • skill_003 โ€“ Speech Transcription Baseline (audio speech baseline)
  • travel_fastlane โ€“ Airport FastLane Wait Estimator (travel operations regression)
  • travel_safebubble โ€“ Air Travel SafeBubble Risk Assessor (travel safety regression)
  • travel_bargaincoach โ€“ BargainCoach Fare Forecaster (travel commerce forecasting)
  • retail_wtp_radar โ€“ Retail WTP Radar (retail pricing regression)
  • retail_capsule_gaps โ€“ Capsule Gap Forecaster (retail supply forecasting)
  • retail_minute_meal โ€“ Minute Meal Throughput (retail operations regression)
  • rt_wtp_radar โ€“ Runtime Retail WTP Radar (retail pricing regression)
  • rt_capsule_gaps โ€“ Runtime Capsule Gap Forecaster (retail supply forecasting)
  • rt_minute_meal โ€“ Runtime Minute Meal Throughput (retail operations regression)

Model and stats artifacts generated by raycli train_travel_pack are published under rayskillkit/skills/{models,stats}/travel, so downstream tooling and release scripts can resolve the new paths without additional configuration. The travel model binaries are produced on demand by CI and distributed with release bundles rather than being committed directly to the repository. Running raycli train_retail_pack --output-root rayskillkit/skills --manifest-path rayskillkit/skills.json will train the retail fixtures and emit quantized INT8 ONNX exports alongside stats under rayskillkit/skills/{models,stats}/retail; these artifacts are version-controlled to keep the SDK regression suite deterministic.

๐Ÿ“ฆ Pilot drop packaging

RaySkillKit binaries and stats are distributed as self-contained pilot drops. Each drop includes:

  1. skills_bundle.zip โ€“ a compressed copy of rayskillkit/skills/{models,stats}.
  2. release_manifest.json โ€“ a manifest produced by cicd.make_manifest describing every file and its SHA256 digest.
  3. release_manifest.sig โ€“ an Ed25519 signature of the manifest emitted by cicd.sign_manifest.

Use cicd/package_release.py to assemble these artifacts locally. Provide an Ed25519 seed through either --key (path to raw bytes) or --key-env (name of an environment variable containing the seed in hex/base64) so the manifest can be signed:

export PILOT_SIGNING_KEY=$(openssl rand -hex 32)  # replace with your long-term key material
python cicd/package_release.py \
  --staging-dir dist/local_pilot_drop \
  --bundle-name skills_bundle.zip \
  --key-env PILOT_SIGNING_KEY

ls dist/local_pilot_drop
# skills/  skills_bundle.zip  release_manifest.json  release_manifest.sig

Tagged pushes (e.g., v1.2.3) automatically run the same script via the Release Packaging GitHub Actions workflow. The workflow expects a MANIFEST_SIGNING_KEY repository secret containing the Ed25519 seed; it uploads the bundle, manifest, and signature as workflow artifacts and attaches them to a draft GitHub Release so the files are ready for distribution without manual steps.


๐Ÿ“ˆ Benchmarks

Audio latency bench

Run the synthetic audio benchmark to profile EnergyVAD frame counts and ASRStream stability without relying on any external recordings:

python bench/audio_bench.py --out artifacts/audio_latency.csv

The script procedurally generates deterministic tone, silence, and speech-like signals, replays scripted MockASR partials, and writes latency/frame/reversal metrics to both artifacts/audio_latency.csv and the telemetry metrics artifacts for CI consumption.

Image keyframe + OCR bench

Profile the select_keyframes and VQEncoder pipeline alongside the synthetic OCR mock:

python bench/image_bench.py

The script renders deterministic clips (static, gradient, motion), records selection/encoding timings into artifacts/image_latency.csv, and evaluates MockOCR precision on fabricated panels with results stored in artifacts/ocr_results.csv. See the Week 3 Report for design notes, invariances, and interpretation tips.

๐Ÿ“Š Runtime metrics

The edge runtime emits per-stage latency histograms for the VAD, ASR, Vision, LLM, and Skill phases. Each stage wraps its critical section in a record_latency(<stage>) context manager so the /metrics endpoint aggregates counts, totals, averages, and min/max timings for individual stages plus an all roll-up across them. Alongside latencies, the endpoint surfaces lifecycle counters (sessions.created, sessions.active) and total query volume so operators can track load and concurrency without inspecting logs. The response also reports a boolean display_available flag inferred from the active session agents (via SmartGlassAgent.has_display/display/overlay attributes) and, if no sessions exist, from the configured provider hint (display|glass|hud) to indicate whether the deployment can render overlays.

๐Ÿ” CI Audio Validation

Automated checks exercise both the VAD and ASR stacks entirely with synthetic fixtures so contributors can run the suite without credentials or cloud audio services. Unit tests such as tests/test_vad_thresholds.py and tests/test_vad_framing.py validate the energy math in EnergyVAD, while tests/test_asr_interface_contract.py and tests/test_asr_delta_gate.py assert that ฮด-gated streaming transcripts remain stable under injected noise. The bench/audio_bench.py workflow (added in PR #25) is wired into CI to publish the audio_latency.csv artifact summarising reversal counts and latency distributions. By default ASRStream instantiates the deterministic MockASR unless SMARTGLASS_USE_WHISPER=1, keeping the end-to-end validation loop entirely offline-friendly.


๐Ÿงญ 18-Week Learning Journey (Google Colab Curriculum)

Week Module
1 Multimodal Basics: Whisper + CLIP + GPT
2 Wake Words with Whisper
3 Scene Description with Vision-Language Models
4 Intent Detection + Prompt Engineering
5 Meta Ray-Ban SDK Simulation
6 Real-Time Audio Streaming
7 Visual OCR and Translation
8 Domain-Specific Voice Commands
9 On-Device Tiny Models (CPU)
10 Caching & Optimization
11 Mobile UI/UX Considerations
12 Meta Ray-Ban Deployment Flow
13 Use Case: Healthcare
14 SNN Student Distillation & Edge Eval
15 Android Integration & Client Walkthrough
16 Advanced SmartGlass Agent Orchestration
17 Action Mapping with RaySkillKit
18 Pitch Deck + Commercial Demo

See roadmap.md for full breakdown.


๐Ÿง  Use Case Highlights

  • ๐Ÿช Retail: "Hey Athena, price check"
  • ๐Ÿงณ Travel: "Translate this sign"
  • ๐Ÿฅ Healthcare: "Show patient vitals"
  • ๐Ÿ‘ฎ Security: "Alert mode on"
  • ๐ŸŽ“ Education: "Explain this object"

๐Ÿ““ Try It on Colab (Week 1 Session)

Open In Colab


๐Ÿงฑ Project Structure

SmartGlass-AI-Agent/
โ”œโ”€โ”€ colab_notebooks/    # 18-week training notebooks and end-to-end labs
โ”œโ”€โ”€ docs/               # Architecture notes, integration guides, privacy docs
โ”œโ”€โ”€ drivers/            # Provider implementations (see drivers/providers)
โ”œโ”€โ”€ examples/           # Domain-specific demos and CLIs
โ”œโ”€โ”€ rayskillkit/        # Skill registry and artifacts
โ”œโ”€โ”€ scripts/            # Training, packaging, and utility scripts
โ”œโ”€โ”€ sdk-android/        # Kotlin/Android SDK (stable APIs as of v1.0)
โ”œโ”€โ”€ sdk_python/         # Python SDK facade (mirrors src/ agent APIs)
โ”œโ”€โ”€ src/                # Core SmartGlassAgent implementation and backends
โ”œโ”€โ”€ tests/              # Automated regression suite
โ””โ”€โ”€ README.md

๐Ÿ“‹ Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • transformers, torchaudio, whisper, soundfile, Pillow, numpy
  • GPU Recommended for Colab (for Whisper + CLIP)

๐Ÿงช Testing on Meta Ray-Ban

  1. Use Ray-Ban app to capture photo/audio
  2. Transfer to your device or notebook
  3. Load into the SmartGlassAgent
  4. Use vision + audio inputs to trigger LLM responses

๐Ÿข Commercial Licensing

This project is available under the Apache 2.0 License for open learning and research use.

For commercial deployment, OEM integration, or enterprise modules:

๐Ÿ“ฉ farmountain@gmail.com

Commercial license includes:

  • Priority support
  • Proprietary components (e.g. RAG, EHR, NLU)
  • Integration with Meta SDK or smartglasses hardware

See NOTICE for details.


๐Ÿ‘จโ€๐Ÿ’ป Author

Liew Keong Han (@farmountain) AI Architect | AI Researcher ๐Ÿ”— GitHub


๐Ÿ“„ License

Licensed under the Apache License 2.0. See LICENSE for terms.


Built with โค๏ธ for the future of wearable AI

About

Build a multimodal AI assistant for smart glasses using Whisper, vision-language models, and LLM hand on code with Google Colab, tested with Meta Ray-Ban Wayfarer.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors