
Comments

Wake-Word #618

Draft
tachyonicbytes wants to merge 3 commits into cjpais:main from tachyonicbytes:feat/wakeword

Conversation

@tachyonicbytes

Before Submitting This PR

Please confirm you have done the following:

If this is a feature or change that was previously closed/rejected:

  • I have explained in the description below why this should be reconsidered
  • I have gathered community feedback (link to discussion below)

Human Written Description

Hi! This is a draft PR for discussing the wakeword implementation. I chose a draft PR because the code is not production-ready, and most likely won't be until we sort out all the details.

I started from the openWakeWord project, which provides tooling for training wakeword models as well as an ONNX-based inference engine to run them.

In order to understand how openWakeWord works, I stripped the core logic and one example down into a single, easy-to-understand file, provided in this GitHub gist here.

Basically, there are three models: the melspectrogram model, the embedding model from Google, and the actual wakeword model. All that's needed is chopping the audio into suitable pieces (resampling if necessary) and feeding the output of each model into the next, up to the wakeword model, which gives us the prediction. The "Hey Mycroft" model seemed to have the best activation of all the included models.
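To make the "chopping up the audio" step concrete, here is a minimal, stdlib-only Python sketch of the framing stage that sits in front of the three-model chain. The frame size of 1280 samples (80 ms at 16 kHz) matches openWakeWord's default streaming frame, but treat it as an assumption; the buffering class itself is purely illustrative.

```python
# Sketch of the framing step: buffer incoming 16 kHz mono samples and
# emit fixed-size frames. Each emitted frame would then go to the
# melspectrogram model, its output into the embedding model, and a
# window of embeddings into the wakeword model.

FRAME_SAMPLES = 1280  # 80 ms at 16 kHz (openWakeWord's assumed default)

class Chunker:
    def __init__(self, frame_samples: int = FRAME_SAMPLES):
        self.frame_samples = frame_samples
        self._buf: list[int] = []  # leftover samples between calls

    def feed(self, samples: list[int]) -> list[list[int]]:
        """Buffer incoming samples and return any complete frames."""
        self._buf.extend(samples)
        frames = []
        while len(self._buf) >= self.frame_samples:
            frames.append(self._buf[:self.frame_samples])
            del self._buf[:self.frame_samples]
        return frames
```

Because leftovers are carried between calls, the chunker works regardless of the audio callback's buffer size, which is typically not a multiple of the model's frame size.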

Secondly, I translated the code to Rust as a standalone example, provided in this GitHub gist here. It contains a few more goodies than the simple example. A project that ports a form of openWakeWord to Rust already exists, but I couldn't get it to work.

Using the working code from this example, I implemented a very rudimentary wakeword function for Handy, built on the existing "Always-On Microphone" feature, which is a prerequisite for the wakeword to function.

I posit that the models are small enough, and their license unproblematic enough, to distribute with the app so the user doesn't have to download them separately, but the openWakeWord repo should be inspected more closely before doing so. Otherwise, we can just download them the way we do the transcription models.

There is the additional question of stopping the recording after the wakeword has been uttered. For the simple example I used a 5-second delay, but we should find something more permanent here.
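One candidate for "something more permanent" is to keep the hard deadline but also stop on trailing silence. A hedged sketch, assuming frame-level energy is a good-enough voice-activity proxy; the thresholds and window sizes are made-up placeholders, not tuned values:

```python
# Sketch of a stop condition for the post-wakeword recording: stop when
# either a hard deadline has passed (the current 5 s behavior) or the
# last few frames all fell below an energy threshold (trailing silence).

from collections import deque

class StopDetector:
    def __init__(self, max_seconds=5.0, silence_frames=10, energy_threshold=0.01):
        self.max_seconds = max_seconds
        self.silence_frames = silence_frames
        self.energy_threshold = energy_threshold
        self.energies = deque(maxlen=silence_frames)  # rolling energy window

    def frame_energy(self, frame):
        # Mean squared amplitude of one audio frame.
        return sum(s * s for s in frame) / max(len(frame), 1)

    def should_stop(self, frame, elapsed_seconds):
        if elapsed_seconds >= self.max_seconds:
            return True  # hard deadline, the existing 5 s fallback
        self.energies.append(self.frame_energy(frame))
        return (len(self.energies) == self.silence_frames
                and all(e < self.energy_threshold for e in self.energies))
```

A real version would probably reuse whatever VAD the transcription path already has instead of raw energy, but the deadline-plus-silence shape stays the same.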

Ideally, a "Hey Handy" model should also be trained. The code in the oww repo seems to provide everything we need; we just need the computing capacity and the data. It'd be an interesting exercise for the Handy community, if they want their voice / pronunciation to be part of the model :)

You have to manually download these three models from openWakeWord and place them so that they form this directory tree:

https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/hey_mycroft_v0.1.onnx
https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/embedding_model.onnx
https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/melspectrogram.onnx

src-tauri/resources/models/embedding_model.onnx
src-tauri/resources/models/hey_mycroft_v0.1.onnx
src-tauri/resources/models/melspectrogram.onnx
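The manual download-and-place steps above can be sketched as a small script. The URLs are the three listed in this PR and the destination directory is the one from the tree above; the function and variable names are illustrative, and the network fetch is skipped for files already in place.

```python
# Sketch of fetching the three openWakeWord models and placing them
# under src-tauri/resources/models/ as described above.

import os
import urllib.request

MODEL_URLS = [
    "https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/hey_mycroft_v0.1.onnx",
    "https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/embedding_model.onnx",
    "https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/melspectrogram.onnx",
]

def target_path(base_dir: str, url: str) -> str:
    """Map a release URL to its destination inside the resources tree."""
    return os.path.join(base_dir, url.rsplit("/", 1)[-1])

def fetch_models(base_dir: str = "src-tauri/resources/models") -> None:
    os.makedirs(base_dir, exist_ok=True)
    for url in MODEL_URLS:
        dest = target_path(base_dir, url)
        if not os.path.exists(dest):  # don't re-download existing models
            urllib.request.urlretrieve(url, dest)

if __name__ == "__main__":
    fetch_models()
```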

Related Issues/Discussions

Fixes #
Discussion:

Community Feedback

I announced the feature to the community on Discord, with a video. @cjpais seemed interested in the code itself, saying the feature had been requested before, but I couldn't find exactly where.

Testing

I tested the code manually, since only e2e tests could fully exercise such a vertical feature, and there didn't seem to be any e2e tests when I started.

I also added a test for the activation itself. It requires another external resource: a single-channel 16 kHz .wav file of the wakeword being spoken, which I haven't included until we sort out what files are acceptable in a PR.
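Since the fixture's format matters (mono, 16 kHz, presumably 16-bit PCM), here is a hedged stdlib-only sketch of how the activation test could load and validate such a file before feeding it to the pipeline. The 16-bit assumption and the function name are mine, not from the PR's test.

```python
# Sketch of loading/validating the .wav fixture the activation test
# needs: single channel, 16 kHz, 16-bit PCM, decoded into a list of
# ints the model pipeline could consume.

import struct
import wave

def load_fixture(path: str) -> list[int]:
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1, "fixture must be mono"
        assert w.getframerate() == 16000, "fixture must be 16 kHz"
        assert w.getsampwidth() == 2, "fixture must be 16-bit PCM (assumed)"
        raw = w.readframes(w.getnframes())
    # '<h' = little-endian signed 16-bit, the standard WAV sample layout
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))
```

Validating up front gives a clear failure message when someone drops in a stereo or 44.1 kHz file, instead of a silent mis-prediction downstream.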

Screenshots/Videos (if applicable)

The original video

wakeword.mp4

AI Assistance

  • No AI was used in this PR
  • AI was used (please describe below)

If AI was used:

  • Tools used: VSCode + Agent mode, with either GPT-5, GPT-4o, or Claude Sonnet 4.
  • How extensively: much of the editing and making sense of the code. It needed a lot of steering and hand-holding. I originally tried a direct port from openWakeWord, but it failed spectacularly; it couldn't even produce the first gist in this PR description, which was written manually. It did translate that code to Rust (the second gist), though bugfixes were needed even then.

@cjpais
Owner

cjpais commented Jan 19, 2026

I just want to say thank you for this; at the very least, other people can use it in their forks now that you've put it here. Right now I am overwhelmed with issues and trying to stabilize the app as well as get the new keyboard implementation in, so I would guess it will take months for me to review and get this in. Just a heads up.

@tachyonicbytes
Author

tachyonicbytes commented Jan 19, 2026

Yeah, no sweat; the functionality works for me, so I don't have to wait for anything to use it. Take care of making the software better.

@pchalasani
Contributor

new keyboard implementation

@cjpais curious what this is about?

@cjpais
Copy link
Owner

cjpais commented Jan 30, 2026

@pchalasani #580

