(edited title to indicate a native script is included in the referenced repo.)
First off, thank you for building this. OpenWakeWord is the reason I have a working custom wake word system running on my machine right now instead of being stuck with a cloud service or someone else's trigger phrase. The project is genuinely useful and I'm grateful it exists.
That said, I spent the better part of a week getting the training pipeline to work on a modern Linux system (Ubuntu-based, Python 3.12 default, CUDA 12.x). The dependency stack has aged out of compatibility with current distros and package versions, and there's no documentation for what breaks or how to fix it. I wanted to share what I found in case it helps others (or you, if you ever revisit this project).
Every issue below is in the upstream dependency stack, not in your code:
- `torch==1.13.1` - No wheels exist for Python 3.12+. Requires Python 3.10 or 3.11 (via the deadsnakes PPA on Ubuntu).
- `pyarrow` - Newer versions broke the `datasets` library API (`PyExtensionType` removed). Fix: pin `pyarrow<15.0.0`.
- `fsspec` - Newer versions broke `datasets` glob patterns. Fix: pin `fsspec<2024.1.0`.
- `webrtcvad` - Requires C compilation, but `build-essential` and `python3.X-dev` aren't listed as dependencies.
- `python3.10-venv` - The package name is version-specific on Ubuntu; scripts that install `python3-venv` get the wrong one.
- HuggingFace `.cache` directories - `hf_hub_download` leaves `.cache/` directories alongside downloaded files. The training code tries to load them as audio and crashes.
- MIT RIR `16khz/` subdirectory - Downloaded RIR files are nested in a subdirectory the training code doesn't expect.
- MIT RIR sample rate - The original files from MIT are 32 kHz; training expects 16 kHz. Requires conversion with ffmpeg.
- Docker shared memory - PyTorch's DataLoader needs `--shm-size=32g`, or workers get killed with "No space left on device."
- HuggingFace rate limiting - Downloading training data as individual files (tens of thousands of requests) triggers rate limits. Solved by packaging the data as a single tarball.
- Training segfault on cleanup - `train.py` segfaults after saving the model. The model file is fine; the crash happens during cleanup. Harmless but scary.
- Python output buffering in Docker - Progress output doesn't appear without `PYTHONUNBUFFERED=1`.
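The filesystem-level fixes (the stray `.cache/` directories, the unexpected `16khz/` nesting, and the 32 kHz RIR files) can be sketched in a few shell commands. The `DATA_DIR` layout here is a made-up fixture for illustration only; point it at wherever your downloads actually land:

```shell
# Hypothetical layout for illustration; set DATA_DIR to your real download dir.
DATA_DIR=./training_data
mkdir -p "$DATA_DIR/rirs/16khz" "$DATA_DIR/clips/.cache"   # demo fixture only

# 1. Delete the .cache/ directories hf_hub_download leaves behind,
#    so the training loader doesn't try to read them as audio.
find "$DATA_DIR" -type d -name '.cache' -prune -exec rm -rf {} +

# 2. Flatten the 16khz/ subdirectory the MIT RIR download arrives in.
if [ -d "$DATA_DIR/rirs/16khz" ]; then
  find "$DATA_DIR/rirs/16khz" -type f -exec mv {} "$DATA_DIR/rirs/" \;
  rmdir "$DATA_DIR/rirs/16khz"
fi

# 3. Resample any 32 kHz RIR WAVs down to the 16 kHz training expects.
for f in "$DATA_DIR"/rirs/*.wav; do
  [ -e "$f" ] || continue          # skip when the glob matches nothing
  ffmpeg -y -loglevel error -i "$f" -ar 16000 "${f%.wav}.16k.wav"
done
```

The `-prune` on the `find` keeps it from descending into a directory it has just scheduled for deletion.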
What I built
Rather than just documenting the fixes, I containerized the entire training pipeline in Docker (and also put together a native Linux install path), so the fragile dependency stack is frozen and isolated:
- Dockerfile with every package pinned to known working versions (torch 1.13.1+cu117, tensorflow-cpu 2.8.1, datasets 2.14.4, etc.)
- Interactive wrapper script that walks users through wake word selection, training settings, and launches the container
- Custom wake word support - any phrase, not hardcoded
- Training data hosted on HuggingFace as a single ~20GB tarball (avoids rate limiting)
- Empirical testing data from multiple training runs comparing sample counts, augmentation rounds, neuron depth, and single vs. two-word phrases
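For reference, a container launch that folds in the shared-memory and output-buffering fixes might look like the following. The image name and volume path are placeholders, not the actual names from my repo; the command is only echoed here as a dry run:

```shell
# Placeholder image/volume names; substitute the real ones from the repo.
# --shm-size=32g keeps DataLoader workers from dying with "No space left on device";
# PYTHONUNBUFFERED=1 makes training progress show up live in `docker logs`.
CMD="docker run --rm --gpus all \
  --shm-size=32g \
  -e PYTHONUNBUFFERED=1 \
  -v $PWD/output:/models \
  wakeword-train:latest"
echo "$CMD"   # dry run - drop the echo wrapper to actually launch training
```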
The repo is here: https://github.com/briankelley/atlas-voice-training/
The Dockerfile pins your repo to commit 368c037 and piper-sample-generator to commit f1988a4.
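For anyone reproducing the native install outside Docker, the pin set described above can be captured as a requirements-file fragment. Treat this as a starting point assembled from the versions listed in this post, not a tested manifest:

```
# Python 3.10 or 3.11 required (deadsnakes PPA on Ubuntu).
# System packages needed first: build-essential, python3.10-dev, python3.10-venv
--extra-index-url https://download.pytorch.org/whl/cu117
torch==1.13.1+cu117
tensorflow-cpu==2.8.1
datasets==2.14.4
pyarrow<15.0.0
fsspec<2024.1.0
webrtcvad
```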
Empirical findings (probably not useful for you, since the default options already produced the highest-quality models - I'm sure that wasn't an accident)
I ran 5+ training configurations on both an RTX 3060 laptop and an RTX 4090 desktop:
| Wake Word | Samples | Config | Accuracy | Recall | FP/hr |
|---|---|---|---|---|---|
| "Hey Atlas" | 50k | 2 aug, 32n | 81.10% | 62.48% | 2.12 |
| "Hey Atlas" | 100k | 2 aug, 32n | 77.47% | 55.08% | 0.62 |
| "Atlas" | 50k | 3 aug, 32n | 71.64% | 43.54% | 2.57 |
| "Atlas" | 50k | 2 aug, 64n | 71.94% | 44.04% | 2.48 |
| "Globe Master" | 50k | 2 aug, 32n | 81.07% | 62.20% | 1.24 |
Not asking for anything specific
I understand projects age and maintainers move on. I'm not requesting changes - just sharing what I ran into and what I did about it in case it saves someone else the same week of debugging. If any of this is useful to you or the project, happy to help however I can.
Thanks again for OpenWakeWord. It's a great piece of work that got me started on the implementation I'm using now.