Draft
Implements wake-word functionality, based on the models from the openWakeWord project.
Owner
I just want to say thank you for this. At the very least, other people can use it in their forks now that you've put it here. Right now I'm overwhelmed with issues and trying to stabilize the app, as well as get the new keyboard implementation in, so I'd guess it will take months for me to review and merge this, just as a heads up.
Author
Yeah, no sweat. The functionality works for me, so I don't have to wait for anything to use it. Take care of making the software better.
Contributor
@cjpais curious what this is about?
Before Submitting This PR
Please confirm you have done the following:
If this is a feature or change that was previously closed/rejected:
Human Written Description
Hi! This is a draft PR for discussing the wake-word implementation. I opened it as a draft because the code is not production-ready, and most likely won't be until we sort out all the details.
I started from the openWakeWord project, which appears to provide tooling for training wake-word models as well as an ONNX-based inference engine to run them.
To understand how openWakeWord works, I stripped the core logic and one example down into a single, easy-to-follow file, which I've shared in this GitHub gist here.
Basically, there are three models: the melspectrogram model, the embedding model from Google, and the actual wake-word model. All that's needed is to chop the audio into suitably sized chunks (resampling if needed) and feed each model's output into the next, up to the wake-word model, which produces the prediction. Of the included models, "Hey Mycroft" seemed to activate most reliably.
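The chaining described above can be sketched in Rust as a streaming pipeline that buffers each stage's output until the next stage has a full window. The window sizes below (32 mel bins, 76 mel frames per embedding with a step of 8, 96-dimensional embeddings, 16 embeddings per prediction) are my understanding of openWakeWord's defaults, not something stated in this PR, and should be checked against the gist and the downloaded models. The `Stages` trait and `Pipeline` type are hypothetical: a real implementation would wrap the three ONNX sessions, and this sketch assumes the audio has already been resampled to 16 kHz mono.

```rust
use std::collections::VecDeque;

// Assumed openWakeWord defaults; verify against the actual model shapes.
pub const MEL_BINS: usize = 32;
pub const EMBEDDING_WINDOW: usize = 76; // mel frames consumed per embedding
pub const EMBEDDING_STEP: usize = 8; // new mel frames between embeddings
pub const EMBEDDING_DIM: usize = 96;
pub const WAKEWORD_WINDOW: usize = 16; // embeddings consumed per prediction

/// Hypothetical abstraction over the three models.
pub trait Stages {
    /// Audio chunk (16 kHz mono) -> zero or more mel frames.
    fn melspectrogram(&mut self, audio: &[f32]) -> Vec<[f32; MEL_BINS]>;
    /// EMBEDDING_WINDOW mel frames -> one embedding vector.
    fn embed(&mut self, mels: &[[f32; MEL_BINS]]) -> [f32; EMBEDDING_DIM];
    /// WAKEWORD_WINDOW embeddings -> wake-word score.
    fn predict(&mut self, embeddings: &[[f32; EMBEDDING_DIM]]) -> f32;
}

pub struct Pipeline<S: Stages> {
    stages: S,
    mels: VecDeque<[f32; MEL_BINS]>, // sliding window of recent mel frames
    new_mels: usize,                 // mel frames since the last embedding
    embeddings: VecDeque<[f32; EMBEDDING_DIM]>, // sliding window of embeddings
}

impl<S: Stages> Pipeline<S> {
    pub fn new(stages: S) -> Self {
        Self {
            stages,
            mels: VecDeque::with_capacity(EMBEDDING_WINDOW),
            new_mels: 0,
            embeddings: VecDeque::with_capacity(WAKEWORD_WINDOW),
        }
    }

    /// Feeds one audio chunk and returns every score that became available.
    pub fn push_audio(&mut self, chunk: &[f32]) -> Vec<f32> {
        let mut scores = Vec::new();
        for frame in self.stages.melspectrogram(chunk) {
            // Keep only the most recent EMBEDDING_WINDOW mel frames.
            self.mels.push_back(frame);
            if self.mels.len() > EMBEDDING_WINDOW {
                self.mels.pop_front();
            }
            self.new_mels += 1;

            // Once the window is full, embed every EMBEDDING_STEP new frames.
            if self.mels.len() == EMBEDDING_WINDOW && self.new_mels >= EMBEDDING_STEP {
                self.new_mels = 0;
                let mel_window: Vec<[f32; MEL_BINS]> = self.mels.iter().copied().collect();
                let embedding = self.stages.embed(&mel_window);

                // Keep only the most recent WAKEWORD_WINDOW embeddings.
                self.embeddings.push_back(embedding);
                if self.embeddings.len() > WAKEWORD_WINDOW {
                    self.embeddings.pop_front();
                }

                // Score as soon as enough embeddings have accumulated.
                if self.embeddings.len() == WAKEWORD_WINDOW {
                    let emb_window: Vec<[f32; EMBEDDING_DIM]> =
                        self.embeddings.iter().copied().collect();
                    scores.push(self.stages.predict(&emb_window));
                }
            }
        }
        scores
    }
}
```

If the melspectrogram stage yields about 8 frames per 1280-sample chunk (again an assumption about openWakeWord's hop size), the step of 8 means that, after warm-up, each chunk produces one new embedding and one score.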
Next, I translated the code to Rust as a standalone example, available in this GitHub gist here. It includes a few extras beyond the simple example. A project that ports a form of openWakeWord to Rust already exists, but I couldn't get it to work.
Using the working code from this example, I implemented a very rudimentary wake-word feature for Handy on top of the existing "Always-On Microphone" setting, which must be enabled for the wake word to work.
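Turning the per-chunk score into a single activation needs some debouncing; otherwise several consecutive chunks above the threshold could start several recordings. A minimal sketch, assuming a hypothetical `WakeTrigger` type, a placeholder threshold, and a cooldown counted in predictions rather than seconds:

```rust
/// Hypothetical debounced trigger: fires once when the score reaches the
/// threshold, then ignores the next `cooldown` predictions.
pub struct WakeTrigger {
    threshold: f32,
    cooldown: usize,
    remaining: usize, // predictions left to ignore
}

impl WakeTrigger {
    pub fn new(threshold: f32, cooldown: usize) -> Self {
        Self { threshold, cooldown, remaining: 0 }
    }

    /// Returns true exactly when a recording should start.
    pub fn update(&mut self, score: f32) -> bool {
        if self.remaining > 0 {
            // Still cooling down from the previous activation.
            self.remaining -= 1;
            return false;
        }
        if score >= self.threshold {
            self.remaining = self.cooldown;
            return true;
        }
        false
    }
}
```

A threshold such as 0.5 is only a starting point, not a tuned value; it would need adjusting per model and microphone.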
I posit that the models are small enough, and their license unproblematic enough, to ship them with the app so users don't have to download them separately, but the openWakeWord repo should be inspected more closely before we do so. Otherwise, we can download them the same way we do for the transcription models.
There is also the open question of when to stop recording after the wake word is spoken. For this simple example, I used a fixed 5-second delay, but we should find a more permanent solution.
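One possible replacement for the fixed delay is to stop once the speaker has gone quiet for a while, with a hard cap on total length. The sketch below uses a simple RMS energy threshold as a stand-in for a proper voice activity detector; `EndOfSpeech`, its parameters, and the threshold value are all hypothetical and would need tuning against real microphone input.

```rust
/// Root-mean-square level of one audio frame.
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let energy: f32 = frame.iter().map(|s| s * s).sum();
    (energy / frame.len() as f32).sqrt()
}

/// Hypothetical end-of-utterance detector based on trailing silence.
pub struct EndOfSpeech {
    silence_threshold: f32,       // RMS below this counts as silence
    silence_frames_needed: usize, // consecutive silent frames that end speech
    max_frames: usize,            // hard cap on recording length
    heard_speech: bool,
    silent_run: usize,
    frames_seen: usize,
}

impl EndOfSpeech {
    pub fn new(silence_threshold: f32, silence_frames_needed: usize, max_frames: usize) -> Self {
        Self {
            silence_threshold,
            silence_frames_needed,
            max_frames,
            heard_speech: false,
            silent_run: 0,
            frames_seen: 0,
        }
    }

    /// Feeds one frame; returns true once recording should stop.
    pub fn push_frame(&mut self, frame: &[f32]) -> bool {
        self.frames_seen += 1;
        if rms(frame) >= self.silence_threshold {
            self.heard_speech = true;
            self.silent_run = 0;
        } else if self.heard_speech {
            // Only count silence that follows some speech.
            self.silent_run += 1;
        }
        (self.heard_speech && self.silent_run >= self.silence_frames_needed)
            || self.frames_seen >= self.max_frames
    }
}
```

With 1280-sample (80 ms) frames at 16 kHz, `EndOfSpeech::new(0.01, 12, 375)` would stop after roughly one second of silence or 30 seconds in total; the 0.01 level is an arbitrary placeholder.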
Ideally, we would also train a "Hey Handy" model. The code in the openWakeWord (oww) repo seems to provide everything we need; we just need the compute and the data. It'd be an interesting exercise for the Handy community, if people want their voice and pronunciation to be part of the model :)
You have to download these three models from openWakeWord manually and place them so that they form this directory tree:
https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/hey_mycroft_v0.1.onnx
https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/embedding_model.onnx
https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/melspectrogram.onnx
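Until the models are bundled or downloaded automatically, a startup check could report which files are missing instead of failing inside the inference code. A minimal sketch, assuming a hypothetical `missing_models` helper that takes the model directory as a parameter (the exact directory layout is not reproduced here); the file names are the ones from the release URLs above.

```rust
use std::path::{Path, PathBuf};

/// File names from the openWakeWord v0.5.1 release URLs listed above.
pub const REQUIRED_MODELS: [&str; 3] = [
    "melspectrogram.onnx",
    "embedding_model.onnx",
    "hey_mycroft_v0.1.onnx",
];

/// Hypothetical helper: returns the paths of required models not present in `dir`.
pub fn missing_models(dir: &Path) -> Vec<PathBuf> {
    REQUIRED_MODELS
        .iter()
        .map(|name| dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}
```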
Related Issues/Discussions
Fixes #
Discussion:
Community Feedback
I announced the feature to the community on Discord with a video. @cjpais seemed interested in the code itself and mentioned that the feature had been requested before, though I couldn't find exactly where.
Testing
I tested the code manually, because only end-to-end (e2e) tests could fully exercise a vertical feature like this, and there didn't seem to be any e2e tests when I started.
I also added a test for the activation itself. It requires one more external resource, a single-channel 16 kHz .wav file of the wake word being spoken, which I haven't included until we sort out which files are acceptable in a PR.
Screenshots/Videos (if applicable)
The original video
wakeword.mp4
AI Assistance
If AI was used: