feat: Add transcription of audio file #371

Closed
olejsc wants to merge 2 commits into cjpais:main from olejsc:feature/transcribe-audio-file

Conversation

@olejsc
Contributor

olejsc commented Nov 22, 2025

This adds basic support for uploading audio files and having them transcribed (#299).

Disclaimer: AI was used to assist with this.

Feel free to correct, adjust and tweak as much as you like.

A couple of notes:

  • Several design liberties were taken, for example in how history entries are handled and in where the upload functionality should live. Feel free to adjust or correct them; this is really just a proof of concept.
  • I tried as best I could to avoid new dependencies. Rodio, which wraps Symphonia, is used for decoding. The only new dependency is the dialog file picker, which was needed to browse for files.
  • NOT tested on macOS (I don't own a Mac).
  • The file size limit is set to 64 MB for the time being.
  • A copy of the file is NOT made for the time being, so it is not copied to the recordings folder. Instead, the database stores the path to the file being transcribed.
    • If the file is missing from its path, it cannot be played back (the playback button is disabled). The transcribed text remains in history though.
    • Audio recordings now have a microphone icon on their history entry, while manual file uploads have a document icon (plus a tooltip).
  • I'm unsure how it behaves if you start another transcription (with the microphone) while it is processing a file. I suspect there are plenty of edge cases. 🤔
  • I think I managed to keep post-processing working with it, as it just reuses the existing post-processing logic.
  • Existing logic for what to do with the transcription when it's done should remain identical (copy/paste + storing in history).
  • AI helped me quite a fair bit with this: 7-9 chats with it.

Core Features:

  • Users can now upload audio files (MP3, WAV, M4A, FLAC, OGG, AAC) through a new UploadAudioButton component
  • Backend transcribe_file command handles the full transcription pipeline: decode → transcribe → post-process → save → paste
  • New decode_audio_file function converts various audio formats to 16kHz mono PCM samples using the rodio decoder

History System Changes:

  • Added source_file_path column to distinguish between uploaded files and mic recordings
  • Uploaded files reference the original source file instead of creating WAV copies
  • File existence checks prevent playback errors for missing uploaded files
  • UI shows icons (FileText vs Mic) and warnings for missing source files
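
To make the history behaviour above concrete, here is a minimal std-only sketch of how an entry with an optional source path could decide its icon and whether playback is available. The `HistoryEntry` struct, the `EntrySource` enum, and the method names are hypothetical illustrations, not the types from this PR; only the `source_file_path` column name comes from the description.

```rust
use std::path::{Path, PathBuf};

/// Where an entry's audio came from (hypothetical type, used to pick an icon).
#[derive(Debug, PartialEq)]
enum EntrySource {
    Microphone,
    UploadedFile,
}

/// Hypothetical, simplified history row.
struct HistoryEntry {
    /// WAV copy in the recordings folder (assumed to exist only for mic recordings).
    recording_path: Option<PathBuf>,
    /// Original file location for uploads; mirrors the `source_file_path` column.
    source_file_path: Option<PathBuf>,
    /// Transcribed text, kept even when the audio is gone.
    transcription: String,
}

impl HistoryEntry {
    /// An entry with a source path is treated as an upload, otherwise as a mic recording.
    fn source(&self) -> EntrySource {
        if self.source_file_path.is_some() {
            EntrySource::UploadedFile
        } else {
            EntrySource::Microphone
        }
    }

    /// The path playback would read from: the original upload, else the WAV copy.
    fn playback_path(&self) -> Option<&Path> {
        self.source_file_path
            .as_deref()
            .or(self.recording_path.as_deref())
    }

    /// Playback is only offered when the audio file is still on disk.
    fn can_play(&self) -> bool {
        self.playback_path().map_or(false, |p| p.is_file())
    }
}
```

With a shape like this, the UI can disable the play button whenever `can_play()` returns false while still showing `transcription`.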

User Interface:

  • Upload button integrated into History Settings with loading states and error handling
  • Real-time event system for transcription status (file-transcription-started, completed, failed)
  • Audio player component supports disabled state for missing files

Technical Details:

  • Uses tauri-plugin-dialog for file picker integration
  • File validation: 64MB size limit, supported format checking
  • Full post-processing pipeline support (LLM, Chinese variant conversion)
  • Database migration to v4 for new schema
  • The feature maintains parity with regular recordings - uploaded files receive the same post-processing and are saved to history, but skip WAV duplication to avoid unnecessary storage.
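
As an illustration of the file validation bullet, the check could look roughly like the sketch below. It is std-only and assumes that "64MB" means 64 × 1024 × 1024 bytes and that the format check is done on the file extension; `validate_upload`, `ValidationError`, and the constants are hypothetical names, not the ones used in this PR.

```rust
use std::path::Path;

/// Assumed interpretation of the 64 MB limit (64 MiB).
const MAX_FILE_BYTES: u64 = 64 * 1024 * 1024;

/// Formats listed in the PR description, as lowercase extensions.
const SUPPORTED_EXTENSIONS: [&str; 6] = ["mp3", "wav", "m4a", "flac", "ogg", "aac"];

#[derive(Debug, PartialEq)]
enum ValidationError {
    TooLarge { size_bytes: u64 },
    UnsupportedFormat(String),
}

/// Check extension and size before any decoding work starts.
/// The size is passed in so the check itself needs no file I/O.
fn validate_upload(path: &Path, size_bytes: u64) -> Result<(), ValidationError> {
    // Missing or non-UTF-8 extensions become an empty string and are rejected below.
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase())
        .unwrap_or_default();

    if !SUPPORTED_EXTENSIONS.contains(&ext.as_str()) {
        return Err(ValidationError::UnsupportedFormat(ext));
    }
    if size_bytes > MAX_FILE_BYTES {
        return Err(ValidationError::TooLarge { size_bytes });
    }
    Ok(())
}
```

A caller would typically get `size_bytes` from `std::fs::metadata(path)?.len()` and surface the error in the upload button's error state.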

Gif demo:
demo-file-upload-audio-file-transcribe

- Added `UploadAudioButton` component for selecting and uploading audio files.
- Implemented `transcribe_file` command to handle audio transcription requests.
- Introduced `decode_audio_file` function for decoding various audio formats to PCM samples.
- Enhanced `HistoryManager` to support saving transcriptions with optional source file paths.
- Updated `HistorySettings` to include audio file upload functionality and display transcription status.
- Added audio file existence checks and improved error handling during transcription.
- Integrated audio playback with `AudioPlayer` component, including disabled state for missing files.
- Updated Tauri plugins and capabilities to support dialog and file system operations.
@olejsc
Contributor Author

olejsc commented Nov 22, 2025

One more thing: I'm not sure how it fares with different audio formats. I had to make some technical choices in terms of audio processing:

  • Decode an audio file (MP3, WAV, FLAC, etc.) to mono PCM samples at 16 kHz.
    I just went with the AI recommendation on this topic. I have no idea whether the resulting quality is good. When I tested it, transcription worked at normal levels, but it would be neat if someone with knowledge of audio processing could give their opinion on best practices here. Regarding mono/stereo, the channels just get blended together.
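
For readers wondering what "blended" and "16 kHz" mean in practice, here is a minimal std-only sketch of the two steps on already-decoded samples: averaging interleaved channels into mono, then resampling with linear interpolation. This is not the code from this PR (which relies on rodio for decoding); `downmix_to_mono` and `resample_linear` are hypothetical helpers. Linear interpolation is the simplest possible resampler and applies no low-pass filter, so it can introduce aliasing when downsampling; a dedicated resampler would be the more careful choice.

```rust
/// Average each interleaved frame (e.g. [L, R, L, R, ...]) into one mono sample.
/// Trailing samples that do not form a full frame are dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample mono audio by linear interpolation between neighbouring samples.
/// No anti-aliasing filter is applied (see the caveat in the text above).
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    assert!(from_hz > 0 && to_hz > 0, "sample rates must be non-zero");
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64; // input samples per output sample
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

For example, a 48 kHz stereo buffer would go through `downmix_to_mono(&samples, 2)` and then `resample_linear(&mono, 48_000, 16_000)` before being handed to the transcription model.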

olejsc changed the title from "feat: Add audio file upload and transcription functionality" to "feat: Add transcription of audio file" Nov 22, 2025
@olejsc
Contributor Author

olejsc commented Nov 24, 2025

I suspect this could also be used to support the issue about reprocessing transcriptions: #125

@cjpais
Owner

cjpais commented Nov 27, 2025

i'm not sure i'm ready to pull this feature in yet. not sure what the best ui for it is. this is okay, but i suspect there's something a bit nicer. not sure yet

@cjpais
Owner

cjpais commented Nov 28, 2025

closing in favor of #381

@cjpais cjpais closed this Nov 28, 2025