(edited title to indicate a native script is included in the referenced repo.)
First off, thank you for building this. OpenWakeWord is the reason I have a working custom wake word system running on my machine right now instead of being stuck with a cloud service or someone else's trigger phrase. The project is genuinely useful and I'm grateful it exists.
That said, I spent the better part of a week getting the training pipeline to work on a modern Linux system (Ubuntu-based, Python 3.12 default, CUDA 12.x). The dependency stack has aged out of compatibility with current distros and package versions, and there's no documentation for what breaks or how to fix it. I wanted to share what I found in case it helps others (or you, if you ever revisit this project).
Every issue below is in the upstream dependency stack, not in your code:
- `torch==1.13.1` - No wheels exist for Python 3.12+. Requires Python 3.10 or 3.11 (via the deadsnakes PPA on Ubuntu).
- `pyarrow` - Newer versions broke the `datasets` library API (`PyExtensionType` removed). Fix: pin `pyarrow<15.0.0`.
- `fsspec` - Newer versions broke `datasets` glob patterns. Fix: pin `fsspec<2024.1.0`.
- `webrtcvad` - Requires C compilation, but `build-essential` and `python3.X-dev` aren't listed as dependencies.
- `python3.10-venv` - The package name is version-specific on Ubuntu; scripts that install `python3-venv` get the wrong one.
- HuggingFace `.cache` directories - `hf_hub_download` leaves `.cache/` directories alongside downloaded files. The training code tries to load them as audio and crashes.
- MIT RIR `16khz/` subdirectory - Downloaded RIR files are nested in a subdirectory the training code doesn't expect.
- MIT RIR sample rate - The original files from MIT are 32 kHz; training expects 16 kHz. Requires conversion with ffmpeg.
- Docker shared memory - PyTorch's DataLoader needs `--shm-size=32g`, or workers get killed with "No space left on device."
- HuggingFace rate limiting - Downloading training data as individual files (tens of thousands of requests) triggers rate limits. Solved by packaging the data as a single tarball.
- Training segfault on cleanup - `train.py` segfaults after saving the model. The model file is fine; the crash happens during cleanup. Harmless but scary.
- Python output buffering in Docker - Progress output doesn't appear without `PYTHONUNBUFFERED=1`.
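The filesystem-level fixes (the stray `.cache/` directories, the unexpected `16khz/` nesting, and the 32 kHz RIR files) can be sketched in a few shell commands. The `DATA_DIR` layout here is a made-up fixture for illustration only; point it at wherever your downloads actually land:

```shell
# Hypothetical layout for illustration; set DATA_DIR to your real download dir.
DATA_DIR=./training_data
mkdir -p "$DATA_DIR/rirs/16khz" "$DATA_DIR/clips/.cache"   # demo fixture only

# 1. Delete the .cache/ directories hf_hub_download leaves behind,
#    so the training loader doesn't try to read them as audio.
find "$DATA_DIR" -type d -name '.cache' -prune -exec rm -rf {} +

# 2. Flatten the 16khz/ subdirectory the MIT RIR download arrives in.
if [ -d "$DATA_DIR/rirs/16khz" ]; then
  find "$DATA_DIR/rirs/16khz" -type f -exec mv {} "$DATA_DIR/rirs/" \;
  rmdir "$DATA_DIR/rirs/16khz"
fi

# 3. Resample any 32 kHz RIR WAVs down to the 16 kHz training expects.
for f in "$DATA_DIR"/rirs/*.wav; do
  [ -e "$f" ] || continue          # skip when the glob matches nothing
  ffmpeg -y -loglevel error -i "$f" -ar 16000 "${f%.wav}.16k.wav"
done
```

The `-prune` on the `find` keeps it from descending into a directory it has just scheduled for deletion.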
What I built
Rather than just documenting the fixes, I containerized the entire training pipeline in Docker (and also put together a native Linux install path), so the fragile dependency stack is frozen and isolated:
- Dockerfile with every package pinned to known working versions (torch 1.13.1+cu117, tensorflow-cpu 2.8.1, datasets 2.14.4, etc.)
- Interactive wrapper script that walks users through wake word selection, training settings, and launches the container
- Custom wake word support - any phrase, not hardcoded
- Training data hosted on HuggingFace as a single ~20GB tarball (avoids rate limiting)
- Empirical testing data from multiple training runs comparing sample counts, augmentation rounds, neuron depth, and single vs. two-word phrases
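For reference, a container launch that folds in the shared-memory and output-buffering fixes might look like the following. The image name and volume path are placeholders, not the actual names from my repo; the command is only echoed here as a dry run:

```shell
# Placeholder image/volume names; substitute the real ones from the repo.
# --shm-size=32g keeps DataLoader workers from dying with "No space left on device";
# PYTHONUNBUFFERED=1 makes training progress show up live in `docker logs`.
CMD="docker run --rm --gpus all \
  --shm-size=32g \
  -e PYTHONUNBUFFERED=1 \
  -v $PWD/output:/models \
  wakeword-train:latest"
echo "$CMD"   # dry run - drop the echo wrapper to actually launch training
```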
The repo is here: https://github.com/briankelley/atlas-voice-training/
The Dockerfile pins your repo to commit 368c037 and piper-sample-generator to commit f1988a4.
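For anyone reproducing the native install outside Docker, the pin set described above can be captured as a requirements-file fragment. Treat this as a starting point assembled from the versions listed in this post, not a tested manifest:

```
# Python 3.10 or 3.11 required (deadsnakes PPA on Ubuntu).
# System packages needed first: build-essential, python3.10-dev, python3.10-venv
--extra-index-url https://download.pytorch.org/whl/cu117
torch==1.13.1+cu117
tensorflow-cpu==2.8.1
datasets==2.14.4
pyarrow<15.0.0
fsspec<2024.1.0
webrtcvad
```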
Empirical findings (probably not useful for you, since the default options already produced the highest-quality models - I'm sure that wasn't an accident)
I ran 5+ training configurations on both an RTX 3060 laptop and an RTX 4090 desktop:
| Wake Word | Samples | Config | Accuracy | Recall | FP/hr |
|---|---|---|---|---|---|
| "Hey Atlas" | 50k | 2 aug, 32n | 81.10% | 62.48% | 2.12 |
| "Hey Atlas" | 100k | 2 aug, 32n | 77.47% | 55.08% | 0.62 |
| "Atlas" | 50k | 3 aug, 32n | 71.64% | 43.54% | 2.57 |
| "Atlas" | 50k | 2 aug, 64n | 71.94% | 44.04% | 2.48 |
| "Globe Master" | 50k | 2 aug, 32n | 81.07% | 62.20% | 1.24 |
Not asking for anything specific
I understand projects age and maintainers move on. I'm not requesting changes - just sharing what I ran into and what I did about it in case it saves someone else the same week of debugging. If any of this is useful to you or the project, happy to help however I can.
Thanks again for OpenWakeWord. It's a great piece of work that got me started on the implementation I'm using now.