
feat: audio ingestion via Whisper API#31

Open
RaviTharuma wants to merge 1 commit into AVIDS2:main from RaviTharuma:feature/memorix-ek1-audio-ingestion

Conversation

@RaviTharuma
Contributor

Summary

Adds a memorix_ingest_audio MCP tool that transcribes audio files using the Whisper API and stores the transcript as a memorix observation.

What's included

  • src/multimodal/audio-loader.ts — Whisper API client with:
    • Configurable model (MEMORIX_WHISPER_MODEL, default: whisper-1)
    • Language hint and prompt support
    • Chunked file reading for large files
    • Automatic format detection from extension (mp3, wav, m4a, ogg, flac, webm, mp4)
  • src/multimodal/index.ts — shared multimodal utilities (MIME detection, file validation)
  • src/server.ts — MCP tool registration (memorix_ingest_audio)
  • tests/multimodal/audio-loader.test.ts — 8 tests covering:
    • Transcript extraction and observation storage
    • Unsupported format rejection
    • Missing file handling
    • Language and prompt passthrough
    • API error propagation
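The extension-based format detection described above can be sketched as a small lookup table. This is an illustrative reconstruction, not the actual `audio-loader.ts` code; `detectAudioFormat` and `AUDIO_MIME_TYPES` are hypothetical names.

```typescript
// Hypothetical sketch of extension -> MIME detection for the supported
// formats listed in the PR description. Unknown extensions are rejected.
const AUDIO_MIME_TYPES: Record<string, string> = {
  mp3: 'audio/mpeg',
  wav: 'audio/wav',
  m4a: 'audio/mp4',
  ogg: 'audio/ogg',
  flac: 'audio/flac',
  webm: 'audio/webm',
  mp4: 'audio/mp4',
};

export function detectAudioFormat(
  filePath: string,
): { ext: string; mime: string } {
  // Take everything after the last dot, case-insensitively.
  const ext = filePath.slice(filePath.lastIndexOf('.') + 1).toLowerCase();
  const mime = AUDIO_MIME_TYPES[ext];
  if (!mime) {
    throw new Error(`Unsupported audio format: .${ext}`);
  }
  return { ext, mime };
}
```

A table like this also gives the "unsupported format rejection" test a single code path to exercise.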

Design decisions

  • LLM-optional: requires MEMORIX_WHISPER_API_KEY (falls back to OPENAI_API_KEY). Tool returns clear error if unconfigured.
  • Stores transcript as how-it-works observation type with audio metadata in facts.
  • No new npm dependencies — uses native fetch for API calls and fs for file reading.
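The key-fallback decision above can be sketched as a small resolver. This is a hedged sketch, assuming only what the bullet states (MEMORIX_WHISPER_API_KEY, falling back to OPENAI_API_KEY, with a clear error otherwise); `resolveWhisperApiKey` is a hypothetical name.

```typescript
// Hypothetical sketch of the env-var fallback described in the PR:
// MEMORIX_WHISPER_API_KEY wins, OPENAI_API_KEY is the fallback, and an
// unconfigured setup produces a clear, actionable error.
export function resolveWhisperApiKey(
  env: Record<string, string | undefined>,
): string {
  const key = env.MEMORIX_WHISPER_API_KEY ?? env.OPENAI_API_KEY;
  if (!key) {
    throw new Error(
      'memorix_ingest_audio is not configured: set MEMORIX_WHISPER_API_KEY ' +
        '(or OPENAI_API_KEY) to enable Whisper transcription.',
    );
  }
  return key;
}
```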

Tests

8 pass, 0 fail


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e1369a556


Comment on lines +3078 to +3079
markInternalWrite();
const result = await ingestAudio(


P2: Move internal-write marker to the actual write point

markInternalWrite() is called before ingestAudio(), but audio ingestion performs a remote transcription that can run up to 120 seconds; the hot-reload skip window is only 10 seconds, so long transcriptions can expire the skip window before storeObservation writes. In that case the file watcher will treat this internal write as external and trigger unnecessary reload/reindex work, which can cause avoidable contention during ingest-heavy workflows.
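The ordering fix the reviewer is asking for can be sketched as follows. The function names (`markInternalWrite`, `ingestAudio`, `storeObservation`) come from the review comment; their bodies here are stubs that only record call order, so this is an illustration of the suggested sequencing, not the real handler.

```typescript
// Stub harness: each function records when it runs, so the fixed
// ordering is observable without any real transcription or storage.
const calls: string[] = [];

async function ingestAudio(path: string): Promise<string> {
  calls.push('ingestAudio'); // in reality, a remote call of up to ~120 s
  return `transcript of ${path}`;
}

function markInternalWrite(): void {
  calls.push('markInternalWrite'); // opens the ~10 s hot-reload skip window
}

async function storeObservation(_transcript: string): Promise<void> {
  calls.push('storeObservation'); // the write the marker must actually cover
}

// Suggested ordering: transcribe first, then mark, then write immediately,
// so a slow transcription cannot outlast the skip window.
export async function handleIngestAudio(path: string): Promise<void> {
  const transcript = await ingestAudio(path);
  markInternalWrite();
  await storeObservation(transcript);
}
```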


@@ -0,0 +1,173 @@
import { describe, it, expect, afterEach, beforeEach } from 'bun:test';


P1: Use Vitest APIs in the new test file

The repository’s test command runs Vitest and includes tests/**/*.test.ts, but this new test imports from bun:test. Under the declared test runner, that import is unresolved, so CI/local npm test will fail before executing these assertions. Switching this file to Vitest imports (or globals) is necessary to keep the test suite runnable.
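Assuming the test file only uses the standard `describe`/`it`/`expect` lifecycle APIs (which Vitest exports under the same names as `bun:test`), the fix is a one-line change to the import specifier:

```typescript
// Before (unresolvable when the suite runs under Vitest):
// import { describe, it, expect, afterEach, beforeEach } from 'bun:test';

// After: same names, Vitest specifier.
import { describe, it, expect, afterEach, beforeEach } from 'vitest';
```

Alternatively, enabling Vitest's `globals: true` config option would let the file drop the import entirely.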


@AVIDS2
Owner

AVIDS2 commented Mar 30, 2026

Thanks for pushing on multimodal ingestion — the capability is useful, but this one still has a few concrete blockers.

  1. The test file uses bun:test, while this repo runs vitest in CI. Right now that means the PR fails immediately in the standard test matrix.
  2. In src/server.ts, the object passed into storeObservation is inferred as type: string, but storeObservation expects a real ObservationType.
  3. More importantly, the config path is too loose for the implementation: transcribeAudio() pulls the generic Memorix LLM API key via getLLMApiKey(), but the code then talks specifically to OpenAI/Groq Whisper transcription endpoints. In a valid Memorix setup that is configured for Anthropic or another non-Whisper provider, this can end up sending the wrong credential/provider combination to an incompatible API.

So I don't think this is just a test-fix PR — it needs one more pass on provider/config semantics as well.
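Blocker 2 above (the `type: string` inference) can be illustrated with a small sketch. The `ObservationType` union and `storeObservation` signature here are stand-ins for the real memorix types, which are not shown in this thread.

```typescript
// Stand-in types: the real ObservationType union and storeObservation
// live in the memorix codebase and may differ.
type ObservationType = 'how-it-works' | 'decision' | 'gotcha';

interface Observation {
  type: ObservationType;
  content: string;
  facts: Record<string, string>;
}

function storeObservation(obs: Observation): Observation {
  return obs; // stub: the real function persists the observation
}

// Without `satisfies` (or `as const` on `type`), an inline object literal
// assigned to an intermediate variable infers `type` as plain `string`,
// which then fails to match the ObservationType parameter.
const observation = {
  type: 'how-it-works',
  content: 'transcript text',
  facts: { source: 'audio', model: 'whisper-1' }, // metadata per the PR
} satisfies Observation;

export const stored = storeObservation(observation);
```

Passing the literal directly to `storeObservation`, or annotating the variable as `Observation`, would narrow the field the same way.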

