feat: add audio transcription functionality #2268
viniciusventura29 wants to merge 16 commits into main
- Introduced a new transcription API route to handle audio-to-text conversion.
- Implemented audio recording capabilities in the chat input component, allowing users to record and transcribe audio messages.
- Added hooks for audio recording management and binding detection for transcription and object storage.
- Updated the chat context to include binding availability for transcription services.
- Enhanced the UI to show recording options based on available bindings.
…t-transcription
4 issues found across 8 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="apps/mesh/src/web/hooks/use-audio-recorder.ts">
<violation number="1" location="apps/mesh/src/web/hooks/use-audio-recorder.ts:245">
P2: stopRecording gates on React state, which can be stale right after startRecording. This can cause stopRecording to return null and leave the MediaRecorder running. Check the recorder's state instead of `isRecording`.</violation>
</file>
<file name="packages/bindings/src/well-known/transcription.ts">
<violation number="1" location="packages/bindings/src/well-known/transcription.ts:37">
P2: TranscriptionInputSchema allows an empty object (both `audio` and `audioUrl` are optional), so callers can submit no audio source. Enforce that at least one of `audio` or `audioUrl` is provided to prevent invalid requests from passing validation.</violation>
</file>
<file name="apps/mesh/src/api/routes/transcribe.ts">
<violation number="1" location="apps/mesh/src/api/routes/transcribe.ts:317">
P1: Validate `audioUrl` (scheme/host) before passing it to the transcription service to avoid SSRF or non-HTTP URLs being processed.</violation>
</file>
<file name="apps/mesh/src/web/components/chat/input.tsx">
<violation number="1" location="apps/mesh/src/web/components/chat/input.tsx:286">
P2: The stop flow leaves the button enabled until after stopRecording resolves. A second click during this window can call stopRecording twice and overwrite the stored resolver, leaving the first await unresolved. Set the “transcribing/stopping” state before awaiting stopRecording (or disable the button while stopping) to prevent duplicate stop calls.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
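The two recorder findings above (stale React state in `stopRecording`, and a second click overwriting the stored resolver) can be illustrated with one small sketch. It is not the hook's actual code: `RecorderLike`, `createStopper`, and `waitForRecording` are hypothetical names, and the recording result is modelled as a string to keep the example free of browser types. The idea is to ask the recorder for its own `state` and to hand back the in-flight promise when a stop is already pending.

```typescript
// Minimal recorder shape assumed by this sketch. The browser's MediaRecorder
// exposes the same `state` values and a `stop()` method.
interface RecorderLike {
  state: "inactive" | "recording" | "paused";
  stop(): void;
}

// Hypothetical helper: builds a stop function that is safe to call twice.
function createStopper(
  recorder: RecorderLike,
  // Resolves once the recording has been finalized (assumed to be wired to
  // the recorder's stop event by the caller).
  waitForRecording: () => Promise<string>,
): () => Promise<string | null> {
  let pending: Promise<string | null> | null = null;

  return function stopRecording(): Promise<string | null> {
    // A stop is already in flight: reuse its promise instead of replacing
    // the stored resolver and leaving the first caller unresolved.
    if (pending !== null) return pending;

    // Check the recorder itself rather than React state, which can still be
    // stale immediately after startRecording.
    if (recorder.state === "inactive") return Promise.resolve(null);

    const inFlight: Promise<string | null> = waitForRecording().then(
      (result) => {
        pending = null;
        return result;
      },
      (error) => {
        pending = null;
        throw error;
      },
    );
    pending = inFlight;
    recorder.stop();
    return inFlight;
  };
}
```

Disabling the button while a stop is pending, as the review suggests, remains the simpler fix on the UI side; the shared promise only makes a duplicate call harmless.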
…essages
- Renamed and consolidated functions for finding connections with specific bindings to enhance code clarity and reusability.
- Updated error messages in the ChatInput component to provide clearer feedback to users regarding audio recording and transcription failures.
- Improved UI text for better user experience during audio recording and transcription processes.
- Added a new function to validate audio URLs, ensuring only HTTP/HTTPS URLs with public hosts are accepted.
- Updated the transcription API route to validate the audioUrl parameter before processing.
- Enhanced the TranscriptionInputSchema to enforce the requirement of either 'audio' or 'audioUrl' for transcription requests.
- Improved the audio recorder hook to check the actual state of the media recorder before stopping it.
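The "either 'audio' or 'audioUrl'" rule from this commit can be sketched without any schema library. The interface and function names below are hypothetical and only mirror the two optional fields named in the review; if `TranscriptionInputSchema` is built with a library such as Zod, the same rule would typically be expressed as a refinement on the object schema instead.

```typescript
// Hypothetical input shape: both fields optional, as described in the review.
interface TranscriptionInput {
  audio?: string; // base64-encoded audio data
  audioUrl?: string; // fetchable URL
}

type ValidationResult =
  | { ok: true; value: TranscriptionInput }
  | { ok: false; error: string };

// Rejects inputs that carry no audio source at all.
function validateTranscriptionInput(input: TranscriptionInput): ValidationResult {
  const hasAudio = typeof input.audio === "string" && input.audio.length > 0;
  const hasUrl = typeof input.audioUrl === "string" && input.audioUrl.length > 0;
  if (!hasAudio && !hasUrl) {
    return { ok: false, error: "Either 'audio' or 'audioUrl' is required" };
  }
  return { ok: true, value: input };
}
```

Treating an empty string as missing is a choice made for this sketch; the PR's schema may draw that line differently.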
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="apps/mesh/src/api/routes/transcribe.ts">
<violation number="1" location="apps/mesh/src/api/routes/transcribe.ts:50">
P1: SSRF validation only checks hostname strings and misses DNS rebinding/private IP resolution. A public hostname that resolves to a private/link-local IP will pass validation and still allow SSRF.</violation>
</file>
- Added a function to check if an IP address is private, improving the validation of audio URLs.
- Updated the validateAudioUrl function to resolve DNS and ensure that URLs do not resolve to private or internal IP addresses.
- Modified the transcription API route to await the validation of audioUrl, ensuring proper error handling for invalid URLs.
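A minimal sketch of the private-address check described in this commit, limited to IPv4. It is an illustration, not the PR's `isPrivateIp`: the function names and the exact set of blocked ranges are assumptions, and DNS resolution is left to the caller (in Node.js, `dns.promises.lookup` with `{ all: true }` is one way to obtain the resolved addresses).

```typescript
type Octets = [number, number, number, number];

// Parses a dotted-quad IPv4 string, returning null when it is malformed.
function parseIpv4(ip: string): Octets | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m) return null;
  const octets: Octets = [Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4])];
  return octets.some((n) => n > 255) ? null : octets;
}

// True for "this network", loopback, RFC 1918 private, and link-local ranges.
// Unparseable input is treated as private so the check fails closed.
function isPrivateIpv4(ip: string): boolean {
  const octets = parseIpv4(ip);
  if (!octets) return true;
  const [a, b] = octets;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Every resolved address must be public; an empty result is rejected too.
function allAddressesPublic(resolved: string[]): boolean {
  return resolved.length > 0 && resolved.every((ip) => !isPrivateIpv4(ip));
}
```

Checking every resolved address, rather than only the first, matters because a hostname can resolve to a mix of public and internal addresses.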
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="apps/mesh/src/api/routes/transcribe.ts">
<violation number="1" location="apps/mesh/src/api/routes/transcribe.ts:51">
P1: Extend the IPv6 checks to reject IPv4‑mapped IPv6 addresses (e.g., `::ffff:127.0.0.1`). Otherwise an attacker can bypass the SSRF filter by using IPv4‑mapped private IPs.</violation>
</file>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
…t-transcription
…admin into feat/chat-transcription
- Updated the isPrivateIp function to safely handle undefined values when checking IPv4-mapped addresses, ensuring robust validation of IP addresses.
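One way to handle IPv4-mapped IPv6 addresses is to unwrap the embedded IPv4 address and run it through the existing IPv4 private-range check. The sketch below is an assumption about how that could look, not the PR's `isPrivateIp`: `extractMappedIpv4` is a hypothetical name, and it only recognizes the compressed `::ffff:` prefix, so a fully expanded address would need normalizing first.

```typescript
// Returns the IPv4 address embedded in an IPv4-mapped IPv6 address,
// or null when the input is not in the `::ffff:` mapped form.
function extractMappedIpv4(ip: string): string | null {
  const lower = ip.toLowerCase();
  const prefix = "::ffff:";
  if (lower.indexOf(prefix) !== 0) return null;
  const rest = lower.slice(prefix.length);

  // Dotted form, e.g. ::ffff:127.0.0.1
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(rest)) return rest;

  // Hex form, e.g. ::ffff:7f00:1 (two 16-bit groups)
  const m = /^([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(rest);
  if (!m) return null;
  // `?? ""` guards against undefined capture groups under strict index checks.
  const hi = parseInt(m[1] ?? "", 16);
  const lo = parseInt(m[2] ?? "", 16);
  return [hi >> 8, hi & 0xff, lo >> 8, lo & 0xff].join(".");
}
```

A caller would treat the address as private whenever the extracted IPv4 address is private, which covers both `::ffff:127.0.0.1` and its hex spelling `::ffff:7f00:1`.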
…t-transcription
…t-transcription
…e preparation logic
…L, removing object storage dependency. Update chat context to eliminate object storage binding checks.
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="apps/mesh/src/api/routes/transcribe.ts">
<violation number="1" location="apps/mesh/src/api/routes/transcribe.ts:249">
P2: `finalAudioUrl` is being set to a base64 data URL and then passed as `audioUrl`. The binding defines `audio` for base64 data and `audioUrl` for fetchable URLs, so this risks provider incompatibility for file uploads. Pass base64 via the `audio` field instead of `audioUrl` when using inline data.</violation>
</file>
…create new document if empty. Improved handling of last paragraph content for seamless integration of transcriptions.
…string instead of data URL. Update related comments and error handling for improved clarity.
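The distinction raised in the review (base64 data belongs in `audio`, fetchable URLs in `audioUrl`) can be sketched as a small routing function. `TranscriptionPayload` and `toTranscriptionPayload` are hypothetical names used for illustration; the actual route may prepare its input differently.

```typescript
// Hypothetical payload shape mirroring the binding's two fields.
interface TranscriptionPayload {
  audio?: string; // raw base64 string, without the data-URL prefix
  audioUrl?: string; // fetchable URL
}

// Routes a base64 data URL into `audio` and anything else into `audioUrl`.
function toTranscriptionPayload(source: string): TranscriptionPayload {
  // Matches e.g. "data:audio/webm;codecs=opus;base64,<payload>"
  const match = /^data:[^,]*;base64,([\s\S]*)$/.exec(source);
  if (match) {
    return { audio: match[1] ?? "" };
  }
  return { audioUrl: source };
}
```

For example, `toTranscriptionPayload("data:audio/webm;base64,AAAA")` yields `{ audio: "AAAA" }`, so the provider never receives a data URL in a field it may try to fetch.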
…ok to eliminate unnecessary dependency.
What is this contribution about?
Adds voice input capabilities to the chat interface with automatic speech-to-text transcription.
Changes
New API endpoint
New bindings
Frontend
How it works
1 - User records audio in the browser
2 - Audio is uploaded to Object Storage (temp file)
3 - Transcription service processes the audio URL
4 - Temp file is cleaned up automatically
5 - Transcribed text is returned to the chat
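Steps 2 to 5 as listed above can be sketched as a single function in which the temp file is removed in a `finally` block, so cleanup runs whether or not transcription succeeds. The dependency names (`uploadTemp`, `transcribe`, `deleteTemp`) are hypothetical and injected so the sketch does not assume any particular storage or transcription API.

```typescript
// Hypothetical dependencies; real implementations would wrap the object
// storage and transcription bindings.
interface TranscriptionDeps {
  uploadTemp(audio: Uint8Array): Promise<{ key: string; url: string }>;
  transcribe(audioUrl: string): Promise<string>;
  deleteTemp(key: string): Promise<void>;
}

async function transcribeRecording(
  audio: Uint8Array,
  deps: TranscriptionDeps,
): Promise<string> {
  // Step 2: upload the recording as a temp file.
  const temp = await deps.uploadTemp(audio);
  try {
    // Steps 3 and 5: transcribe the uploaded URL and return the text.
    return await deps.transcribe(temp.url);
  } finally {
    // Step 4: clean up the temp file even when transcription throws.
    await deps.deleteTemp(temp.key);
  }
}
```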
Requirements
Screenshots/Demonstration
https://www.loom.com/share/2299fc0160364c6ead724c8d8925d04d
Review Checklist
Summary by cubic
Adds voice input to chat with an audio recorder and a new transcription API. Audio is sent as a blob or public URL, transcribed via a TRANSCRIPTION binding, and the text is inserted into the chat input.
New Features
Migration
Written for commit 046cb4b. Summary will update on new commits.