mwlaboratories/stt-anywhere

stt-anywhere

Kyutai STT-powered speech-to-text service. Streams words in real time from any device on your network: Wayland desktops via push-to-talk, AR glasses via a WebSocket relay. Runs entirely on your own hardware.

speak → words appear live ✦ no cloud, no subscription, no latency

How it works

┌─────────────────────────────────────────────────┐
│  GPU workstation (always on)                    │
│                                                 │
│  stt-anywhere.py (one systemd service)          │
│  ├── push-to-talk    keyboard → wtype (local)   │
│  ├── :8099           WebSocket relay (remote)   │
│  │                                              │
│  │         both send audio to:                  │
│  │                ▼                             │
│  │   moshi-server :8098 (CUDA)                  │
│  │   audio → text (~500ms latency)              │
│  │   model in VRAM (~2.4 GB)                    │
│  └──────────────────────────────────────────────│
└────────────────────────┬────────────────────────┘
                         │
          Tailscale / LAN (your network)
                         │
       ┌─────────────────┼─────────────────┐
       │                 │                 │
       ▼                 ▼                 ▼
┌─────────────┐  ┌──────────────┐  ┌─────────────┐
│   Desktop   │  │  AR Glasses  │  │   Laptop    │
│   (Wayland) │  │ (Even G2)    │  │  (Wayland)  │
│             │  │              │  │             │
│  Mod+Space  │  │  deltaclaw   │  │  Mod+Space  │
│  to record  │  │  web app     │  │  to record  │
│  wtype to   │  │  tap to talk │  │  wtype to   │
│  cursor     │  │  voice msgs  │  │  cursor     │
└─────────────┘  └──────────────┘  └─────────────┘

Wayland push-to-talk

Press a key, speak, release. Words stream into whatever text field has focus via wtype. Works on niri, Hyprland, Sway, COSMIC, and any other compositor that implements the virtual-keyboard protocol wtype relies on.
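The press-speak-release loop is a signal-driven toggle. A minimal sketch, assuming the service flips a recording flag on SIGUSR1 (the signal the niri keybind in Quick start sends) — the actual mic capture and wtype plumbing are elided:

```python
import signal

class PushToTalk:
    """Hypothetical sketch: idle until SIGUSR1, then toggle recording."""

    def __init__(self):
        self.recording = False
        signal.signal(signal.SIGUSR1, self._toggle)

    def _toggle(self, signum, frame):
        self.recording = not self.recording
        # Real service: start mic capture on the rising edge, or stop it
        # and pipe the transcript to `wtype` on the falling edge.

ptt = PushToTalk()
signal.raise_signal(signal.SIGUSR1)   # simulate the keybind firing once
print(ptt.recording)  # True
```

A second SIGUSR1 flips the flag back, which is why a single keybind suffices for both start and stop.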

AR glasses relay

The relay server accepts WebSocket audio from remote clients (e.g. deltaclaw on EvenRealities G2 glasses) and returns transcribed words as JSON.
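The exact wire format isn't documented here; assuming each JSON message carries a `text` fragment plus a `final` flag marking the end of an utterance, a client-side handler might look like this (function and field names are illustrative):

```python
import json

def handle_messages(raw_messages):
    """Assemble streamed word fragments into finished utterances."""
    utterances, current = [], []
    for raw in raw_messages:
        msg = json.loads(raw)
        current.append(msg["text"])
        if msg.get("final"):           # assumed end-of-utterance marker
            utterances.append(" ".join(current))
            current = []
    return utterances

stream = [
    '{"text": "hello"}',
    '{"text": "world", "final": true}',
]
print(handle_messages(stream))  # ['hello world']
```

In practice the fragments would arrive over the WebSocket on :8099 rather than from a list; check the relay source for the authoritative schema.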

Why

  • Real-time: words appear while you’re still talking
  • Private: audio never leaves your machine or network
  • Multi-device: one GPU serves desktops, laptops, glasses, anything
  • Set and forget: systemd services start with your session

Quick start

Add the flake input and import the Home Manager module:

# flake.nix
{
  inputs.stt-anywhere.url = "github:mwlaboratories/stt-anywhere";

  # in your home-manager config:
  imports = [ inputs.stt-anywhere.homeManagerModules.default ];

  services.stt-anywhere = {
    enable = true;
    cudaCapability = "8.6";  # RTX 3060-3090
  };
}

Rebuild, bind a key to toggle recording, done:

// niri
binds {
    Mod+Space { spawn "systemctl" "--user" "kill" "-s" "USR1" "stt-anywhere.service"; } // toggle recording
    Mod+T     { spawn "sh" "-c" "if systemctl --user is-active --quiet stt-anywhere.service; then systemctl --user stop stt-anywhere.service && notify-send 'stt-anywhere' 'Stopped'; else systemctl --user start stt-anywhere.service && notify-send 'stt-anywhere' 'Started'; fi"; } // start/stop service
}

The model (~2.4 GB) downloads from HuggingFace on first run.

Configuration examples

Single machine (GPU + client together)

services.stt-anywhere = {
  enable = true;
  cudaCapability = "8.6";
};

GPU server (shared on your network)

services.stt-anywhere = {
  enable = true;
  cudaCapability = "8.6";
  serverAddr = "0.0.0.0";  # accept moshi connections over Tailscale
  relayPort = 8099;        # WebSocket relay for remote clients
};

Client (laptop, no GPU)

services.stt-anywhere = {
  enable = true;
  enableServer = false;
  serverUrl = "ws://workstation:8098";
};

Requirements

Server (runs inference)

  • NVIDIA GPU with CUDA
  • NixOS (flake-based)

Client (records + types)

  • Wayland compositor + PipeWire
  • NixOS (flake-based)
  • No GPU needed when connecting to a remote server

CUDA capability

GPU              Capability
RTX 3060-3090    "8.6"
RTX 4060-4090    "8.9"
RTX 5070-5090    "10.0"
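The table reduces to a prefix lookup; a hedged sketch mirroring it (`CAPABILITY_BY_SERIES` and `cuda_capability` are illustrative names, not part of the module):

```python
# Mirrors the capability table above; extend for other GPU generations.
CAPABILITY_BY_SERIES = {
    "RTX 30": "8.6",
    "RTX 40": "8.9",
    "RTX 50": "10.0",
}

def cuda_capability(gpu_name: str) -> str:
    """Return the cudaCapability string for a name like 'RTX 4090'."""
    for series, cap in CAPABILITY_BY_SERIES.items():
        if gpu_name.startswith(series):
            return cap
    raise ValueError(f"unknown GPU: {gpu_name}")

print(cuda_capability("RTX 4090"))  # 8.9
```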
