feat: Add transcription of audio file by olejsc · Pull Request #371 · cjpais/Handy

olejsc · 2025-11-22T18:05:16Z

This adds basic support for uploading audio files and having them transcribed (#299).

Disclaimer: AI was used to assist with this.

Feel free to correct, adjust and tweak as much as you like.

A couple of notes:

- Several design liberties were taken, e.g. how to handle history entries, but also where the upload functionality should reside. Feel free to adjust or correct it; this is really just a proof of concept.
- I tried my best to avoid new dependencies. Rodio, which wraps symphonia, is used for decoding. The only new dependency is the dialog file picker, which was needed to open the folders.
- NOT tested on macOS (I don't own one).
- The file size limit is set to 64 MB for the time being.
- A copy of the file is NOT made for the time being, so it is not copied to the recordings folder; instead, the database stores the path to the file being transcribed.
  - If the file is missing from its path, it cannot be played back (the playback button is disabled). The transcribed text remains in history, though.
  - Audio recordings now have a microphone icon on their history entry, while manual file uploads have a document icon (plus a tooltip).
- I'm unsure how it would behave if you started another transcription (with the microphone) while a file is being processed. I suspect there are plenty of edge cases. 🤔
- I think I managed to keep post-processing working, as it simply reuses the existing post-processing logic.
- The existing logic for what happens with the transcription when it's done should remain identical (copy/paste + storing in history).
- AI helped me quite a bit with this (7 to 9 chats).

Core Features:

- Users can now upload audio files (MP3, WAV, M4A, FLAC, OGG, AAC) through a new UploadAudioButton component
- Backend transcribe_file command handles the full transcription pipeline: decode → transcribe → post-process → save → paste
- New decode_audio_file function converts various audio formats to 16 kHz mono PCM samples using the rodio decoder
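The `transcribe_file` flow above is a straight chain of stages. Below is a minimal Rust sketch of that ordering, not the PR's code: each stage is injected as a closure, and the names `transcribe_file_pipeline` and `PipelineError` are hypothetical.

```rust
use std::path::Path;

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum PipelineError {
    Decode(String),
    Transcribe(String),
}

/// Runs decode → transcribe → post-process → save → paste in order.
/// Each stage is injected as a closure; in the app these would be the audio
/// decoder, the speech model, the post-processor, history, and the paster.
fn transcribe_file_pipeline(
    path: &Path,
    decode: impl Fn(&Path) -> Result<Vec<f32>, PipelineError>,
    transcribe: impl Fn(&[f32]) -> Result<String, PipelineError>,
    post_process: impl Fn(String) -> String,
    mut save: impl FnMut(&str, &Path),
    mut paste: impl FnMut(&str),
) -> Result<String, PipelineError> {
    let samples = decode(path)?; // 16 kHz mono samples
    let raw_text = transcribe(&samples)?;
    let text = post_process(raw_text); // e.g. LLM cleanup, Chinese variant conversion
    save(&text, path); // history stores the source path instead of a WAV copy
    paste(&text);
    Ok(text)
}
```

Because a decode or transcription failure returns early via `?`, nothing is saved to history or pasted for a file that could not be processed.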

History System Changes:

- Added source_file_path column to distinguish between uploaded files and mic recordings
- Uploaded files reference the original source file instead of creating WAV copies
- File existence checks prevent playback errors for missing uploaded files
- UI shows icons (FileText vs Mic) and warnings for missing source files
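A minimal sketch of how an optional `source_file_path` could drive both the icon choice and the playback check. `HistoryEntry`, `Source`, and `playback_path` are hypothetical names; only the column name comes from the PR description.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical history row: `source_file_path` is `None` for mic recordings
/// and `Some(path)` for uploaded files (which get no WAV copy).
struct HistoryEntry {
    file_name: String, // WAV in the recordings folder (mic recordings only)
    source_file_path: Option<PathBuf>,
}

/// Decides which icon the UI shows (Mic vs FileText).
#[derive(Debug, PartialEq)]
enum Source {
    Microphone,
    UploadedFile,
}

impl HistoryEntry {
    fn source(&self) -> Source {
        if self.source_file_path.is_some() {
            Source::UploadedFile
        } else {
            Source::Microphone
        }
    }

    /// Path to play back, or `None` if an uploaded file has been moved or
    /// deleted; the UI would then disable playback but keep the text.
    fn playback_path(&self, recordings_dir: &Path) -> Option<PathBuf> {
        match &self.source_file_path {
            Some(p) if p.exists() => Some(p.clone()),
            Some(_) => None,
            None => Some(recordings_dir.join(&self.file_name)),
        }
    }
}
```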

User Interface:

- Upload button integrated into History Settings with loading states and error handling
- Real-time event system for transcription status (file-transcription-started, completed, failed)
- Audio player component supports disabled state for missing files
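The three status events could be modeled as a single Rust enum, sketched below. The full names assume the `completed` and `failed` events share the `file-transcription-` prefix, and the payload fields are hypothetical. In a Tauri app, the backend would emit the name plus a serializable payload for the frontend listener.

```rust
/// Status events for an uploaded-file transcription. Payload fields are
/// hypothetical; only the event names come from the PR description.
#[derive(Debug, Clone, PartialEq)]
enum FileTranscriptionEvent {
    Started { path: String },
    Completed { path: String, text: String },
    Failed { path: String, error: String },
}

impl FileTranscriptionEvent {
    /// Event name the frontend would listen for (assumed full names).
    fn name(&self) -> &'static str {
        match self {
            Self::Started { .. } => "file-transcription-started",
            Self::Completed { .. } => "file-transcription-completed",
            Self::Failed { .. } => "file-transcription-failed",
        }
    }

    /// The upload button shows its loading state only after "started"
    /// and until either terminal event arrives.
    fn is_busy(&self) -> bool {
        matches!(self, Self::Started { .. })
    }

    /// The file the event refers to, whatever its variant.
    fn path(&self) -> &str {
        match self {
            Self::Started { path } | Self::Completed { path, .. } | Self::Failed { path, .. } => {
                path.as_str()
            }
        }
    }
}
```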

Technical Details:

- Uses tauri-plugin-dialog for file picker integration
- File validation: 64 MB size limit, supported format checking
- Full post-processing pipeline support (LLM, Chinese variant conversion)
- Database migration to v4 for new schema
The feature maintains parity with regular recordings - uploaded files receive the same post-processing and are saved to history, but skip WAV duplication to avoid unnecessary storage.
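The file validation step could look roughly like this sketch. It assumes "64MB" means 64 × 1024 × 1024 bytes and that the format check is done by file extension (case-insensitive); the PR may check differently. In the app, the size would typically come from `std::fs::metadata(path)?.len()`; it is passed in here to keep the sketch testable.

```rust
use std::path::Path;

/// Assumed to mean 64 MiB; the PR only says "64MB".
const MAX_FILE_SIZE_BYTES: u64 = 64 * 1024 * 1024;
/// Formats listed in the PR, matched by extension (an assumption).
const SUPPORTED_EXTENSIONS: [&str; 6] = ["mp3", "wav", "m4a", "flac", "ogg", "aac"];

#[derive(Debug, PartialEq)]
enum ValidationError {
    UnsupportedFormat(String),
    TooLarge { size_bytes: u64 },
}

fn validate_audio_file(path: &Path, size_bytes: u64) -> Result<(), ValidationError> {
    // Lower-case the extension so "Song.MP3" is accepted; a missing or
    // non-UTF-8 extension becomes "" and is rejected below.
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();
    if !SUPPORTED_EXTENSIONS.contains(&ext.as_str()) {
        return Err(ValidationError::UnsupportedFormat(ext));
    }
    if size_bytes > MAX_FILE_SIZE_BYTES {
        return Err(ValidationError::TooLarge { size_bytes });
    }
    Ok(())
}
```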

Gif demo:

- Added `UploadAudioButton` component for selecting and uploading audio files.
- Implemented `transcribe_file` command to handle audio transcription requests.
- Introduced `decode_audio_file` function for decoding various audio formats to PCM samples.
- Enhanced `HistoryManager` to support saving transcriptions with optional source file paths.
- Updated `HistorySettings` to include audio file upload functionality and display transcription status.
- Added audio file existence checks and improved error handling during transcription.
- Integrated audio playback with `AudioPlayer` component, including disabled state for missing files.
- Updated Tauri plugins and capabilities to support dialog and file system operations.

olejsc · 2025-11-22T18:18:02Z

One more thing: I'm not sure how it fares with different audio formats. I had to make some deliberate technical choices regarding audio processing:

Decode an audio file (MP3, WAV, FLAC, etc.) to mono PCM samples at 16 kHz.
I just went with the AI's recommendation on this. I have no technical idea whether the quality is good. When I tested it, transcription worked at normal quality levels, but it would be neat if someone with audio-processing knowledge could weigh in on best practices here. As for mono vs. stereo, the channels are simply blended into one.
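One common way to get 16 kHz mono from a decoder's output is to average the interleaved channels and then resample. The sketch below assumes "blended" means averaging and starts from interleaved `f32` samples plus a channel count and sample rate (rodio's `Source` trait exposes `channels()` and `sample_rate()`). It is not the PR's `decode_audio_file`, and `to_16k_mono` and its helpers are hypothetical names. The resampler is plain linear interpolation with no anti-aliasing filter, so downsampling from 44.1 or 48 kHz can alias; a dedicated resampling crate such as `rubato` is the usual higher-quality choice.

```rust
/// Average each interleaved frame into one mono sample ("blending" channels).
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks(channels)
        .map(|frame| frame.iter().sum::<f32>() / frame.len() as f32)
        .collect()
}

/// Naive linear-interpolation resampler (no low-pass filter).
fn resample_linear(input: &[f32], from_rate: u32, to_rate: u32) -> Vec<f32> {
    if input.is_empty() || from_rate == to_rate {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_rate as u64 / from_rate as u64) as usize;
    let step = from_rate as f64 / to_rate as f64; // input samples per output sample
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = *input.get(idx + 1).unwrap_or(&a); // hold the last sample at the edge
            a + (b - a) * frac
        })
        .collect()
}

/// Sample rate the speech model expects.
const TARGET_RATE: u32 = 16_000;

fn to_16k_mono(interleaved: &[f32], channels: usize, sample_rate: u32) -> Vec<f32> {
    let mono = downmix_to_mono(interleaved, channels);
    resample_linear(&mono, sample_rate, TARGET_RATE)
}
```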

olejsc · 2025-11-24T11:58:22Z

I suspect this could also be used to support reprocessing transcriptions (#125).

cjpais · 2025-11-27T10:41:44Z

I'm not sure I'm ready to pull this feature in yet. I'm not sure what the best UI for it is; this is okay, but I suspect there's something a bit nicer. Not sure yet.

cjpais · 2025-11-28T00:49:04Z

closing in favor of #381

Commit c3c5399: cleanup

olejsc changed the title from "feat: Add audio file upload and transcription functionality" to "feat: Add transcription of audio file" on Nov 22, 2025.

Signal46 mentioned this pull request on Nov 25, 2025 in #381, "Added feature of transcription of local files (WAV, MP3 and M4A) along with progressbar" (open).

cjpais closed this Nov 28, 2025

olejsc wants to merge 2 commits into cjpais:main from olejsc:feature/transcribe-audio-file
2 participants