@jianfch would be great if you can review this :) Thanks.
jianfch
left a comment
whispercpp_model.align = MethodType(align, whispercpp_model)
whispercpp_model.align_words = MethodType(align_words, whispercpp_model)
whispercpp_model.refine = MethodType(refine, whispercpp_model)
Alignment requires additional logic in get_whisper_alignment_func(), and refinement requires it in get_whisper_refinement_func().
Let's use model_type == 'cpp'. For alignment, the model_type is handled in align() and align_words(), and for refinement in refine().
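For anyone following along, here is a minimal sketch of what the MethodType binding above does. It is not the stable-ts implementation: `FakeCppModel`, its `model_type` attribute, and this toy `align` are hypothetical stand-ins, and only `types.MethodType` is the real standard-library call.

```python
from types import MethodType


class FakeCppModel:
    """Hypothetical stand-in for the object returned by the whisper.cpp loader."""
    model_type = "cpp"  # assumption: where the model type lives is illustrative only


def align(self, audio, text):
    # Once bound with MethodType, `self` is the model instance,
    # so the function can branch on the model type it was attached to.
    if self.model_type == "cpp":
        return f"align via cpp backend: {text}"
    return f"align via default backend: {text}"


model = FakeCppModel()
# Bind the plain function to this one instance, as in the snippet quoted above.
model.align = MethodType(align, model)

result = model.align("audio.wav", "hello")
```

Binding per instance leaves the underlying class untouched, which is presumably why the loader attaches align, align_words and refine this way.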
get_whisper_alignment_func() simply returns a function with this signature. The returned function produces a list of dictionaries, each containing word, start and end. The same applies to get_whisper_refinement_func(), except that its returned function produces a tensor of confidence scores for the input tokens.
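To make the expected return shape concrete, here is a rough sketch. The input signature is not shown in this thread, so `fake_alignment_func`, its parameters, and `check_word_timings` are all hypothetical. Only the output shape (a list of dictionaries with word, start and end) comes from the description above.

```python
from typing import Dict, List, Union

WordTiming = Dict[str, Union[str, float]]


def fake_alignment_func(words: List[str], seconds_per_word: float = 0.5) -> List[WordTiming]:
    """Hypothetical placeholder: evenly spaced timings, not real alignment."""
    return [
        {"word": w, "start": i * seconds_per_word, "end": (i + 1) * seconds_per_word}
        for i, w in enumerate(words)
    ]


def check_word_timings(timings: List[WordTiming]) -> bool:
    """Check that every entry has exactly the keys word, start and end, in time order."""
    for t in timings:
        if set(t) != {"word", "start", "end"}:
            return False
        if t["start"] > t["end"]:
            return False
    return True


timings = fake_alignment_func(["hello", "world"])
```

A check like `check_word_timings` could be a cheap way to confirm that whatever the pywhispercpp bindings return has been converted into the expected structure.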
If pywhispercpp does not have the low level bindings that return the necessary data for either case then we can just move forward first without implementing alignment/refinement.
Otherwise, it looks great. We just need to add "cpp" to the extras_require in setup.py:
"mlx": [
    "mlx-whisper"
],
"cpp": [
    "pywhispercpp"
]
And include a test in test.yml below the mlx-test. Something like this:
cpp-test:
  runs-on: macos-latest
  steps:
    - uses: actions/checkout@v4
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.12'
    - name: Install FFmpeg
      run: brew install ffmpeg
    - name: Install Package with pywhispercpp dependencies
      run: pip3 install .["cpp"]
    - name: Run CPP transcribe tests
      run: python test/test_transcribe.py load_whispercpp
    - name: Run CPP align tests
      run: python test/test_align.py load_whispercpp
    - name: Run CPP refine tests
      run: python test/test_refine.py load_whispercpp
whisper.cpp is the gold standard for transcription on Macs.
It is fast, and more accurate than mlx-whisper (which does not support beam search).
While I personally do not have a Mac - and am happy to stay unmac-d - some of the users of our non-profit (ivrit.ai) use it.
We'd like them to enjoy the wonderful benefits of accurate timing, so we want to add whisper.cpp support to stable-ts.
If you can review, merge and release a new version with this code, that would be wonderful.
Many thanks!