Goal
Allow the agent to consume and reason over images as first-class inputs.
Proposed Scope
- Add image ingestion, vision-capable model routing, and multimodal prompt/tool plumbing.
Acceptance Criteria
- Users can provide images and receive grounded responses/actions using visual context.
Target Date
Goal
Allow the agent to consume and reason over images as first-class inputs.
Proposed Scope
Acceptance Criteria
Target Date