feat: image ingestion via Vision LLM #32

Open

RaviTharuma wants to merge 1 commit into AVIDS2:main from RaviTharuma:feature/memorix-ybj-image-ingestion

Conversation

@RaviTharuma
Contributor

Summary

Adds a memorix_ingest_image MCP tool that describes images using a Vision LLM and stores the resulting description as a memorix observation.

What's included

  • src/multimodal/image-loader.ts — Vision LLM client with:
    • Base64 encoding of local images
    • Configurable model and prompt
    • Support for PNG, JPEG, GIF, WebP, SVG formats
    • Automatic MIME type detection
  • src/server.ts — MCP tool registration (memorix_ingest_image)
  • tests/multimodal/image-loader.test.ts — 7 tests covering:
    • Image description extraction and observation storage
    • Unsupported format rejection
    • Missing file handling
    • Custom prompt passthrough
    • API error propagation
    • Correct base64 encoding
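The base64 encoding and MIME detection described above might look roughly like this (a minimal sketch; the actual function names and structure in src/multimodal/image-loader.ts may differ):

```typescript
import { promises as fs } from "fs";
import * as path from "path";

// Extension-to-MIME map mirroring the formats listed in the PR description.
const MIME_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
  ".svg": "image/svg+xml",
};

// Detect the MIME type from the file extension, rejecting unsupported formats.
export function detectMimeType(filePath: string): string {
  const mime = MIME_TYPES[path.extname(filePath).toLowerCase()];
  if (!mime) throw new Error(`Unsupported image format: ${filePath}`);
  return mime;
}

// Read a local image and return a data URL suitable for a Vision LLM request.
export async function encodeImage(filePath: string): Promise<string> {
  const mime = detectMimeType(filePath);
  const data = await fs.readFile(filePath);
  return `data:${mime};base64,${data.toString("base64")}`;
}
```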

Design decisions

  • Uses existing LLM provider (callLLM from src/llm/provider.ts) — no new API configuration needed.
  • Stores description as discovery observation type with image metadata in facts.
  • No new npm dependencies — uses native fs for file reading, existing LLM infrastructure for vision.
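As a rough illustration of the second decision, the stored record could take a shape like the following (field names here are assumptions for illustration, not the PR's actual schema):

```typescript
// Hypothetical shape of the stored memorix observation (field names assumed).
interface ImageObservation {
  type: "discovery";
  content: string; // the Vision LLM's description of the image
  facts: {
    source: "image";
    path: string;     // original image path
    mimeType: string; // detected MIME type
    model: string;    // model that produced the description
  };
}

// Assemble the observation from the LLM description plus image metadata.
function buildObservation(
  desc: string,
  imagePath: string,
  mime: string,
  model: string
): ImageObservation {
  return {
    type: "discovery",
    content: desc,
    facts: { source: "image", path: imagePath, mimeType: mime, model },
  };
}
```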

Tests

7 pass, 0 fail


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d2c62ae393

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +45 to +49
let baseUrl = getLLMBaseUrl('https://api.openai.com/v1').replace(/\/+$/, '');
if (!baseUrl.endsWith('/v1')) baseUrl += '/v1';
const model = getLLMModel('gpt-4o');

const response = await fetch(`${baseUrl}/chat/completions`, {


P1: Route vision calls by the configured LLM provider

isLLMEnabled() allows this path when the runtime is configured for non-OpenAI providers, but callVisionLLM hardcodes OpenAI defaults (https://api.openai.com/v1, gpt-4o) and always posts to /chat/completions. In environments that auto-detect anthropic from ANTHROPIC_API_KEY, image ingestion will consistently fail against the wrong endpoint/model even though LLM mode is reported as enabled. This should use provider-aware config/routing (or the existing provider abstraction) so the request format matches the active provider.
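One way to make the mismatch explicit, sketched under the assumption of a provider-detection helper (not confirmed to exist in this repo), is to fail fast before issuing the request:

```typescript
// Hypothetical guard; the Provider type and where the value comes from
// (e.g. auto-detection via ANTHROPIC_API_KEY) are assumptions.
type Provider = "openai" | "openai-compatible" | "anthropic";

// Reject providers whose API shape the vision call does not support,
// instead of silently posting to the wrong endpoint.
function assertVisionSupported(provider: Provider): void {
  if (provider === "anthropic") {
    throw new Error(
      "memorix_ingest_image currently supports only OpenAI-compatible " +
        "/chat/completions endpoints; the configured provider is 'anthropic'."
    );
  }
}
```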

Useful? React with 👍 / 👎.

@AVIDS2
Owner

AVIDS2 commented Mar 30, 2026

Useful feature direction, but I see two blockers before merge.

  1. The PR currently fails CI because the test file is written against bun:test, while this repo runs vitest.
  2. The implementation assumes an OpenAI-compatible /chat/completions Vision endpoint for whatever getLLMBaseUrl() / getLLMModel() return. That is not safe for the current Memorix config model: a valid installation can be configured for anthropic, and Anthropic is handled through a different API shape elsewhere in the codebase. So right now this tool can be configured according to Memorix but still call the wrong protocol at runtime.

I think the implementation needs to either:

  • go through the existing LLM abstraction in a provider-safe way, or
  • explicitly constrain itself to OpenAI-compatible providers and validate that up front.
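A sketch of the first option: dispatch on the configured provider and build the request body in each provider's native shape. The dispatch function and its call sites are assumptions for illustration; the two payload shapes follow the public OpenAI chat/completions and Anthropic Messages vision formats.

```typescript
// Hypothetical provider-aware payload builder (function name assumed).
type VisionRequest = { prompt: string; dataUrl: string; model: string };

function buildVisionPayload(
  provider: "openai" | "anthropic",
  req: VisionRequest
): object {
  if (provider === "anthropic") {
    // Anthropic's Messages API takes raw base64 with an explicit media_type,
    // not a data URL, so split the data URL apart.
    const [meta, b64] = req.dataUrl.split(",");
    const mediaType = meta.slice("data:".length, meta.indexOf(";"));
    return {
      model: req.model,
      max_tokens: 1024,
      messages: [
        {
          role: "user",
          content: [
            { type: "image", source: { type: "base64", media_type: mediaType, data: b64 } },
            { type: "text", text: req.prompt },
          ],
        },
      ],
    };
  }
  // OpenAI-compatible chat/completions with an image_url content part.
  return {
    model: req.model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: req.prompt },
          { type: "image_url", image_url: { url: req.dataUrl } },
        ],
      },
    ],
  };
}
```

The same dispatch point would also choose the endpoint path (/chat/completions vs /v1/messages) and auth header, keeping the MCP tool itself provider-agnostic.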
