A self-hosted speech-to-text service powered by Kyutai STT. Streams words in real time from any device on your network: Wayland desktops via push-to-talk, AR glasses via a WebSocket relay. Runs entirely on your hardware.
speak → words appear live ✦ no cloud, no subscription, minimal latency
```
┌─────────────────────────────────────────────────┐
│  GPU workstation (always on)                    │
│                                                 │
│  stt-anywhere.py (one systemd service)          │
│  ├── push-to-talk keyboard → wtype (local)      │
│  ├── :8099 WebSocket relay (remote)             │
│  │                                              │
│  │   both send audio to:                        │
│  │        ▼                                     │
│  │   moshi-server :8098 (CUDA)                  │
│  │   audio → text (~500ms latency)              │
│  │   model in VRAM (~2.4 GB)                    │
│  └──────────────────────────────────────────────│
└────────────────────────┬────────────────────────┘
                         │
          Tailscale / LAN (your network)
                         │
       ┌─────────────────┼────────────────┐
       │                 │                │
       ▼                 ▼                ▼
┌─────────────┐  ┌──────────────┐  ┌─────────────┐
│   Desktop   │  │  AR Glasses  │  │   Laptop    │
│  (Wayland)  │  │  (Even G2)   │  │  (Wayland)  │
│             │  │              │  │             │
│  Mod+Space  │  │  deltaclaw   │  │  Mod+Space  │
│  to record  │  │  web app     │  │  to record  │
│  wtype to   │  │  tap to talk │  │  wtype to   │
│  cursor     │  │  voice msgs  │  │  cursor     │
└─────────────┘  └──────────────┘  └─────────────┘
```
Press a key, speak, release. Words stream into whatever text field has focus via wtype. Works on niri, Hyprland, Sway, COSMIC, and any other compositor that supports the virtual-keyboard protocol wtype relies on.
The relay server accepts WebSocket audio from remote clients (e.g. deltaclaw on Even Realities G2 glasses) and returns transcribed words as JSON.
- Real-time: words appear while you’re still talking
- Private: audio never leaves your machine or network
- Multi-device: one GPU serves desktops, laptops, glasses, anything
- Set and forget: systemd services start with your session
Add the flake input and import the Home Manager module:
```nix
# flake.nix
{
  inputs.stt-anywhere.url = "github:mwlaboratories/stt-anywhere";

  # in your home-manager config:
  imports = [ inputs.stt-anywhere.homeManagerModules.default ];

  services.stt-anywhere = {
    enable = true;
    cudaCapability = "8.6"; # RTX 3060-3090
  };
}
```

Rebuild, bind a key to toggle recording, done:
```kdl
// niri
binds {
    // toggle recording
    Mod+Space { spawn "systemctl" "--user" "kill" "-s" "USR1" "stt-anywhere.service"; }

    // start/stop the service
    Mod+T { spawn "sh" "-c" "if systemctl --user is-active --quiet stt-anywhere.service; then systemctl --user stop stt-anywhere.service && notify-send 'stt-anywhere' 'Stopped'; else systemctl --user start stt-anywhere.service && notify-send 'stt-anywhere' 'Started'; fi"; }
}
```

The model (~2.4 GB) downloads from HuggingFace on first run.
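The Mod+Space binding works because the service toggles recording when it receives SIGUSR1, which is what `systemctl --user kill -s USR1` delivers. A minimal sketch of that signal-toggle pattern, not the service's actual implementation:

```python
import os
import signal

recording = False

def toggle(signum, frame):
    # Flip the recording state whenever SIGUSR1 arrives.
    global recording
    recording = not recording

# Install the handler, as the service would at startup.
signal.signal(signal.SIGUSR1, toggle)

os.kill(os.getpid(), signal.SIGUSR1)  # simulate the keybinding firing once
print(recording)  # True
```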
Minimal local setup (push-to-talk only):

```nix
services.stt-anywhere = {
  enable = true;
  cudaCapability = "8.6";
};
```

Also serve remote clients over Tailscale or your LAN:

```nix
services.stt-anywhere = {
  enable = true;
  cudaCapability = "8.6";
  serverAddr = "0.0.0.0"; # accept moshi connections over Tailscale
  relayPort = 8099;       # WebSocket relay for remote clients
};
```

Client-only mode, connecting to a workstation that runs the server:

```nix
services.stt-anywhere = {
  enable = true;
  enableServer = false;
  serverUrl = "ws://workstation:8098";
};
```

Server requirements:

- NVIDIA GPU with CUDA
- NixOS (flake-based)
- Wayland compositor + PipeWire

Client requirements:

- NixOS (flake-based)
- No GPU needed when connecting to a remote server
| GPU | Capability |
|---|---|
| RTX 3060-3090 | "8.6" |
| RTX 4060-4090 | "8.9" |
| RTX 5070-5090 | "12.0" |