feat: add audio transcription functionality #2268

Open
viniciusventura29 wants to merge 16 commits into main from feat/chat-transcription


@viniciusventura29 viniciusventura29 commented Jan 22, 2026

What is this contribution about?

Adds voice input capabilities to the chat interface with automatic speech-to-text transcription.

Changes

New API endpoint

  • POST /:org/transcribe - Accepts audio files or URLs and returns transcribed text

New bindings

  • TRANSCRIPTION_BINDING - For speech-to-text providers
  • OBJECT_STORAGE_BINDING - For temporary audio file storage

Frontend

  • useAudioRecorder hook for browser microphone capture
  • Chat input now supports voice recording

How it works

1 - User records audio in the browser
2 - Audio is uploaded to Object Storage (temp file)
3 - Transcription service processes the audio URL
4 - Temp file is cleaned up automatically
5 - Transcribed text is returned to the chat

Requirements

  • Connections implementing:
    • TRANSCRIPTION_BINDING (e.g., OpenAI Whisper, Deepgram)
    • OBJECT_STORAGE_BINDING (e.g., S3, R2, GCS) - only needed for file uploads

Screenshots/Demonstration

https://www.loom.com/share/2299fc0160364c6ead724c8d8925d04d

Review Checklist

  • PR title is clear and descriptive
  • Changes are tested and working
  • Documentation is updated (if needed)
  • No breaking changes

Summary by cubic

Adds voice input to chat with an audio recorder and a new transcription API. Audio is sent as a blob or public URL, transcribed via a TRANSCRIPTION binding, and the text is inserted into the chat input.

  • New Features

    • New POST /api/:org/transcribe route that accepts an audio file or URL, validates up to 25MB and blocks localhost/private IPs, and calls TRANSCRIBE_AUDIO via the TRANSCRIPTION binding.
    • Uploaded blobs are converted to base64 server-side for direct transcription (no object storage needed).
    • Chat input gets a mic button with recording/transcribing states; sends recorded audio to the API and appends the transcribed text. Button is enabled only when a TRANSCRIPTION provider is available.
    • New useAudioRecorder hook and transcription binding types/schemas (supported formats: webm, mp3/mpeg, mp4/m4a, wav, ogg, flac, video/webm).
  • Migration

    • Connect at least one provider implementing TRANSCRIPTION; the mic button appears only when available.
    • Ensure browser mic permissions; recordings up to ~3 minutes are supported.
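
The upload limits described in the summary above can be illustrated with a small check. This is a minimal sketch, not the PR's code: `checkAudioUpload`, `MAX_AUDIO_BYTES`, and `SUPPORTED_AUDIO_TYPES` are hypothetical names, and the exact MIME spellings are assumptions derived from the listed formats. Only the 25 MB limit and the format list come from the summary.

```typescript
// Hypothetical constants; only the 25 MB figure comes from the PR summary.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024;

// Assumed MIME spellings for the formats listed in the summary.
const SUPPORTED_AUDIO_TYPES = new Set<string>([
  "audio/webm", "audio/mp3", "audio/mpeg", "audio/mp4", "audio/m4a",
  "audio/wav", "audio/ogg", "audio/flac", "video/webm",
]);

type UploadCheck = { ok: true } | { ok: false; reason: string };

function checkAudioUpload(sizeBytes: number, mimeType: string): UploadCheck {
  // Browsers often append codec parameters, e.g. "audio/webm;codecs=opus",
  // so compare only the part before the first ";".
  const baseType = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  if (!SUPPORTED_AUDIO_TYPES.has(baseType)) {
    return { ok: false, reason: `unsupported type: ${baseType}` };
  }
  if (sizeBytes <= 0) {
    return { ok: false, reason: "empty file" };
  }
  if (sizeBytes > MAX_AUDIO_BYTES) {
    return { ok: false, reason: "file exceeds 25 MB" };
  }
  return { ok: true };
}
```

A route handler would typically run a check like this before reading the upload into memory, so oversized or unsupported files are rejected early.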

Written for commit 046cb4b. Summary will update on new commits.

- Introduced a new transcription API route to handle audio-to-text conversion.
- Implemented audio recording capabilities in the chat input component, allowing users to record and transcribe audio messages.
- Added hooks for audio recording management and binding detection for transcription and object storage.
- Updated the chat context to include binding availability for transcription services.
- Enhanced the UI to show recording options based on available bindings.

@github-actions

🧪 Benchmark

Should we run the MCP Gateway benchmark for this PR?

React with 👍 to run the benchmark.

| Reaction | Action |
| --- | --- |
| 👍 | Run quick benchmark (10 & 128 tools) |

Benchmark will run on the next push after you react.

@github-actions

Release Options

Should a new version be published when this PR is merged?

React with an emoji to vote on the release type:

| Reaction | Type | Next Version |
| --- | --- | --- |
| 👍 | Prerelease | 2.28.1-alpha.1 |
| 🎉 | Patch | 2.28.1 |
| ❤️ | Minor | 2.29.0 |
| 🚀 | Major | 3.0.0 |

Current version: 2.28.0

Deployment

  • Deploy to production (triggers ArgoCD sync after Docker image is published)

@cubic-dev-ai cubic-dev-ai bot left a comment

4 issues found across 8 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="apps/mesh/src/web/hooks/use-audio-recorder.ts">

<violation number="1" location="apps/mesh/src/web/hooks/use-audio-recorder.ts:245">
P2: stopRecording gates on React state, which can be stale right after startRecording. This can cause stopRecording to return null and leave the MediaRecorder running. Check the recorder's state instead of `isRecording`.</violation>
</file>
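
One way to act on this finding is to gate on the recorder object itself rather than on React state. The sketch below assumes a structural `RecorderLike` type (hypothetical, so it runs without a browser) that mirrors the `state` values of the standard `MediaRecorder` API. `stopIfActive` is an illustrative name, not the hook's actual function.

```typescript
// Minimal structural stand-in for MediaRecorder (hypothetical type).
// The three state values match the MediaRecorder API.
interface RecorderLike {
  state: "inactive" | "recording" | "paused";
  stop(): void;
}

// Returns true if a stop was actually issued.
// The decision depends only on the recorder's own state, so a stale
// `isRecording` flag from a previous render cannot cause a missed stop.
function stopIfActive(recorder: RecorderLike | null): boolean {
  if (!recorder || recorder.state === "inactive") {
    return false;
  }
  recorder.stop();
  return true;
}
```

In a hook, the recorder would usually live in a ref, so the check always sees the current instance regardless of render timing.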

<file name="packages/bindings/src/well-known/transcription.ts">

<violation number="1" location="packages/bindings/src/well-known/transcription.ts:37">
P2: TranscriptionInputSchema allows an empty object (both `audio` and `audioUrl` are optional), so callers can submit no audio source. Enforce that at least one of `audio` or `audioUrl` is provided to prevent invalid requests from passing validation.</violation>
</file>
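
The rule requested here can be expressed as a plain predicate. This is a library-free sketch under assumptions: `TranscriptionInput` below is a simplified stand-in for the real schema's shape, and `hasAudioSource` is a hypothetical helper. Whatever schema library the binding uses, the same predicate could be attached as a refinement.

```typescript
// Simplified stand-in for the binding's input shape (assumption).
interface TranscriptionInput {
  audio?: string;    // base64-encoded audio
  audioUrl?: string; // fetchable URL
}

// True when at least one non-empty audio source is present.
function hasAudioSource(input: TranscriptionInput): boolean {
  const hasInline = typeof input.audio === "string" && input.audio.length > 0;
  const hasUrl = typeof input.audioUrl === "string" && input.audioUrl.length > 0;
  return hasInline || hasUrl;
}
```

Treating empty strings as missing avoids a case where `{ audio: "" }` passes validation but still carries no audio.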

<file name="apps/mesh/src/api/routes/transcribe.ts">

<violation number="1" location="apps/mesh/src/api/routes/transcribe.ts:317">
P1: Validate `audioUrl` (scheme/host) before passing it to the transcription service to avoid SSRF or non-HTTP URLs being processed.</violation>
</file>
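
A first-pass check of scheme and host might look like the sketch below. It is not the PR's implementation: `isAllowedAudioUrl` and `isPrivateIPv4Literal` are hypothetical names, and the blocked ranges are a common baseline rather than an exhaustive list. It relies on the standard `URL` parser, which normalizes IPv4 hosts and wraps IPv6 hosts in brackets.

```typescript
// Matches dotted IPv4 literals in loopback, private, and link-local ranges.
function isPrivateIPv4Literal(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||                          // 0.0.0.0/8
    a === 10 ||                         // 10.0.0.0/8
    a === 127 ||                        // loopback
    (a === 169 && b === 254) ||         // link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)            // 192.168.0.0/16
  );
}

function isAllowedAudioUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Only plain web schemes; rejects file:, ftp:, data:, and so on.
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host === "[::1]" || host === "[::]") return false; // IPv6 loopback/unspecified
  if (isPrivateIPv4Literal(host)) return false;
  return true;
}
```

A string-level check like this only covers literal hosts. A public hostname can still resolve to an internal address, so it is best treated as one layer rather than a complete SSRF defense.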

<file name="apps/mesh/src/web/components/chat/input.tsx">

<violation number="1" location="apps/mesh/src/web/components/chat/input.tsx:286">
P2: The stop flow leaves the button enabled until after stopRecording resolves. A second click during this window can call stopRecording twice and overwrite the stored resolver, leaving the first await unresolved. Set the “transcribing/stopping” state before awaiting stopRecording (or disable the button while stopping) to prevent duplicate stop calls.</violation>
</file>
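
The race described here can be closed by flipping to a "stopping" phase before awaiting. The following is a framework-free sketch: `createStopGuard` and the `Phase` type are hypothetical, and in the real component the phase would be React state that also disables the button.

```typescript
type Phase = "idle" | "recording" | "stopping";

// Wraps an async stop function so that only the first call takes effect.
function createStopGuard(stopRecording: () => Promise<string | null>) {
  let phase: Phase = "recording";
  return {
    get phase(): Phase {
      return phase;
    },
    async stop(): Promise<string | null> {
      // A second click while stopping (or when idle) is ignored.
      if (phase !== "recording") return null;
      // Flip the phase synchronously, before the first await, so a
      // repeat click in the same tick already sees "stopping".
      phase = "stopping";
      try {
        return await stopRecording();
      } finally {
        phase = "idle";
      }
    },
  };
}
```

Because the phase changes before any `await`, the underlying `stopRecording` runs at most once per recording, so its stored resolver cannot be overwritten by a duplicate call.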

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

…essages

- Renamed and consolidated functions for finding connections with specific bindings to enhance code clarity and reusability.
- Updated error messages in the ChatInput component to provide clearer feedback to users regarding audio recording and transcription failures.
- Improved UI text for better user experience during audio recording and transcription processes.
- Added a new function to validate audio URLs, ensuring only HTTP/HTTPS URLs with public hosts are accepted.
- Updated the transcription API route to validate the audioUrl parameter before processing.
- Enhanced the TranscriptionInputSchema to enforce the requirement of either 'audio' or 'audioUrl' for transcription requests.
- Improved the audio recorder hook to check the actual state of the media recorder before stopping it.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="apps/mesh/src/api/routes/transcribe.ts">

<violation number="1" location="apps/mesh/src/api/routes/transcribe.ts:50">
P1: SSRF validation only checks hostname strings and misses DNS rebinding/private IP resolution. A public hostname that resolves to a private/link-local IP will pass validation and still allow SSRF.</violation>
</file>
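
To address resolution-time bypasses, the check has to look at the addresses a hostname resolves to, not only at the hostname string. The sketch below injects the resolver and the private-range predicate so it stays runnable offline; `resolvesOnlyToPublic` and `Resolver` are hypothetical names. In Node, the resolver could plausibly wrap `dns.promises.lookup(hostname, { all: true })`.

```typescript
// Injected so the sketch can run without network access (hypothetical type).
type Resolver = (hostname: string) => Promise<string[]>;

async function resolvesOnlyToPublic(
  hostname: string,
  resolve: Resolver,
  isPrivate: (ip: string) => boolean,
): Promise<boolean> {
  const addresses = await resolve(hostname);
  // Fail closed: a name that resolves to nothing is not accepted.
  if (addresses.length === 0) return false;
  // Every resolved address must be public; one private record is enough to reject.
  return addresses.every((ip) => !isPrivate(ip));
}
```

Note that validating and then fetching in two separate steps still leaves a window in which the DNS answer can change. Closing it fully generally means connecting to the already-vetted address rather than resolving the name a second time.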

- Added a function to check if an IP address is private, improving the validation of audio URLs.
- Updated the validateAudioUrl function to resolve DNS and ensure that URLs do not resolve to private or internal IP addresses.
- Modified the transcription API route to await the validation of audioUrl, ensuring proper error handling for invalid URLs.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="apps/mesh/src/api/routes/transcribe.ts">

<violation number="1" location="apps/mesh/src/api/routes/transcribe.ts:51">
P1: Extend the IPv6 checks to reject IPv4‑mapped IPv6 addresses (e.g., `::ffff:127.0.0.1`). Otherwise an attacker can bypass the SSRF filter by using IPv4‑mapped private IPs.</violation>
</file>
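
Handling IPv4-mapped IPv6 addresses amounts to unwrapping the embedded IPv4 address and running the normal IPv4 check on it. The sketch below is illustrative only: `isPrivateAddress` and `isPrivateIPv4` are hypothetical names, not the PR's `isPrivateIp`, and the sketch does not validate octet ranges or cover every reserved block.

```typescript
function isPrivateIPv4(ip: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m) return false; // not dotted IPv4; a stricter version might fail closed
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

function isPrivateAddress(ip: string): boolean {
  const lower = ip.toLowerCase();

  // IPv4-mapped IPv6, dotted form: ::ffff:127.0.0.1
  const dotted = /^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/.exec(lower);
  if (dotted) return isPrivateIPv4(dotted[1] ?? "");

  // IPv4-mapped IPv6, hex form: ::ffff:7f00:1 (same address as above)
  const hex = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(lower);
  if (hex) {
    const hi = parseInt(hex[1] ?? "0", 16);
    const lo = parseInt(hex[2] ?? "0", 16);
    return isPrivateIPv4(`${hi >> 8}.${hi & 0xff}.${lo >> 8}.${lo & 0xff}`);
  }

  if (lower.includes(":")) {
    // Loopback, unspecified, unique-local (fc00::/7), link-local (fe80::/10).
    return (
      lower === "::1" || lower === "::" ||
      /^f[cd]/.test(lower) || /^fe[89ab]/.test(lower)
    );
  }
  return isPrivateIPv4(lower);
}
```

Covering both the dotted and hex spellings matters because the same mapped address can be written either way, and a filter that knows only one form is bypassed by the other.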

viniciusventura29 and others added 8 commits January 22, 2026 12:05
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
- Updated the isPrivateIp function to safely handle undefined values when checking IPv4-mapped addresses, ensuring robust validation of IP addresses.
…L, removing object storage dependency. Update chat context to eliminate object storage binding checks.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="apps/mesh/src/api/routes/transcribe.ts">

<violation number="1" location="apps/mesh/src/api/routes/transcribe.ts:249">
P2: `finalAudioUrl` is being set to a base64 data URL and then passed as `audioUrl`. The binding defines `audio` for base64 data and `audioUrl` for fetchable URLs, so this risks provider incompatibility for file uploads. Pass base64 via the `audio` field instead of `audioUrl` when using inline data.</violation>
</file>
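
The distinction drawn in this finding can be sketched as a small routing helper that puts inline data in `audio` and fetchable URLs in `audioUrl`. The names `TranscriptionPayload` and `toTranscriptionPayload` are hypothetical. Only the two field names and their meanings come from the review comment.

```typescript
// Simplified stand-in for the binding's input (assumption).
interface TranscriptionPayload {
  audio?: string;    // raw base64, without a "data:" prefix
  audioUrl?: string; // fetchable URL
}

const BASE64_MARKER = ";base64,";

function toTranscriptionPayload(source: string): TranscriptionPayload {
  const markerIndex = source.indexOf(BASE64_MARKER);
  if (source.startsWith("data:") && markerIndex !== -1) {
    // Strip "data:<mime>[;params];base64," and send only the payload.
    return { audio: source.slice(markerIndex + BASE64_MARKER.length) };
  }
  // Anything else is treated as a URL and should still go through
  // the usual URL validation before being fetched.
  return { audioUrl: source };
}
```

Keeping the two fields separate means a provider that only fetches `audioUrl` over HTTP never receives a `data:` URL it cannot handle.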

…create new document if empty. Improved handling of last paragraph content for seamless integration of transcriptions.
…string instead of data URL. Update related comments and error handling for improved clarity.