[Refactor] Separate CV agent from SAM server with tool-based image analysis #7

@pskeshu

Description

Refactor the SAM server to separate computer vision capabilities into a dedicated CV agent that can reason about image analysis tasks.

Current State

  • sam_server.py couples SAM segmentation with visualization
  • Detection pipeline: brightness → SAM → Claude verification
  • No autonomous CV reasoning: the pipeline cannot choose or adapt its steps per task

Requirements

1. CV Agent Architecture

  • Receives image + intent from orchestrator
  • Autonomously selects appropriate CV tools
  • Returns structured results with a confidence score
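
To make the request/response contract concrete, here is a minimal sketch of what the orchestrator could send and get back. The class names (CVRequest, CVResult) and every field are hypothetical; the issue only specifies "image + intent" in and "structured results with confidence" out.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class CVRequest:
    """What the orchestrator hands to the CV agent (hypothetical shape)."""
    image_path: str  # assumption: images are passed by path, not raw bytes
    intent: str      # natural-language goal, e.g. "find dim embryos"


@dataclass
class CVResult:
    """Structured result returned to the orchestrator (hypothetical shape)."""
    detections: list[dict[str, Any]]
    confidence: float  # overall confidence in [0, 1]
    tools_used: list[str] = field(default_factory=list)
    reasoning: str = ""

    def __post_init__(self) -> None:
        # Reject out-of-range scores early so callers can rely on the range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


request = CVRequest(image_path="plate_01.tif", intent="find dim embryos")
result = CVResult(
    detections=[{"bbox": [10, 20, 40, 50], "label": "embryo"}],
    confidence=0.8,
    tools_used=["threshold", "sam_segment"],
    reasoning="Thresholded the image, then refined candidates with SAM.",
)
```

Validating the confidence range in the result type keeps that check out of every tool and out of the orchestrator.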

2. Tool Categories

  • Classical CV: thresholding, morphology, edge detection, blob analysis
  • SAM Tools: segmentation, mask refinement, point/box prompts
  • VLM Tools: Claude Vision queries, object detection prompts
  • Measurement: area, intensity, shape metrics
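
As a rough illustration of how a classical CV tool and a measurement tool could compose, the sketch below thresholds a tiny grayscale image and measures the resulting region. It uses plain Python lists to stay dependency-free; a real gently/cv/ module would more likely use NumPy or OpenCV. The function names are assumptions, not existing code.

```python
Image = list[list[float]]  # grayscale image as rows of pixel intensities
Mask = list[list[bool]]


def threshold(image: Image, level: float) -> Mask:
    """Classical CV: mark pixels strictly brighter than `level`."""
    return [[pixel > level for pixel in row] for row in image]


def measure(image: Image, mask: Mask) -> dict[str, float]:
    """Measurement: pixel area and mean intensity of the masked region."""
    values = [
        pixel
        for img_row, mask_row in zip(image, mask)
        for pixel, inside in zip(img_row, mask_row)
        if inside
    ]
    area = len(values)
    mean_intensity = sum(values) / area if area else 0.0
    return {"area": float(area), "mean_intensity": mean_intensity}


image = [
    [0.0, 0.1, 0.0],
    [0.1, 0.9, 0.7],
    [0.0, 0.8, 0.0],
]
mask = threshold(image, level=0.5)
stats = measure(image, mask)  # three pixels pass the threshold
```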

3. Agent Capabilities

  • Multi-step analysis (e.g., "find dim embryos" → enhance contrast → detect → filter by brightness)
  • Explain reasoning in results
  • Suggest alternative approaches if the initial method fails
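
The "find dim embryos" example can be sketched as a fixed three-step chain that records its reasoning and offers alternatives when nothing is found. This is only an illustration under stated assumptions: in the proposed design the agent would choose the steps itself, and the function names, the default detection level, and the suggestion strings are all made up here.

```python
Image = list[list[float]]


def enhance_contrast(image: Image) -> Image:
    """Linearly stretch intensities to the full [0, 1] range."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]


def detect(image: Image, level: float) -> list[tuple[int, int]]:
    """Return (row, col) of every pixel strictly brighter than `level`."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, p in enumerate(row)
        if p > level
    ]


def find_dim_objects(image: Image, max_brightness: float, level: float = 0.25) -> dict:
    """Enhance contrast, detect, then filter by raw brightness."""
    reasoning = []
    enhanced = enhance_contrast(image)
    reasoning.append("stretched contrast so faint objects clear the detection level")
    candidates = detect(enhanced, level)
    reasoning.append(f"detected {len(candidates)} candidate pixel(s) above {level}")
    # Filter on the original intensities, not the enhanced ones.
    dim = [(r, c) for r, c in candidates if image[r][c] <= max_brightness]
    reasoning.append(f"kept {len(dim)} candidate(s) with raw brightness <= {max_brightness}")
    suggestions = []
    if not dim:  # illustrative alternatives only
        suggestions = ["lower the detection level", "try a SAM point prompt instead"]
    return {"detections": dim, "reasoning": reasoning, "suggestions": suggestions}


image = [
    [0.02, 0.03, 0.02],
    [0.02, 0.10, 0.03],
    [0.02, 0.03, 0.30],
]
result = find_dim_objects(image, max_brightness=0.2)
```

Here the faint pixel at (1, 1) survives the brightness filter while the bright one at (2, 2) is dropped.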

4. Clean SAM Server

  • Pure segmentation service (stateless)
  • Remove visualization code
  • Simple API: image + prompt → masks
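
One possible shape for the stateless "image + prompt → masks" entry point is sketched below. Everything here is hypothetical: the Prompt type, the Segmenter protocol, and the stand-in BoxFillSegmenter are placeholders for whatever wraps the real SAM model, and no actual SAM API is shown.

```python
from dataclasses import dataclass
from typing import Optional, Protocol

Image = list[list[float]]
Mask = list[list[bool]]


@dataclass(frozen=True)
class Prompt:
    """Point or box prompt; exactly one of the two should be set."""
    point: Optional[tuple[int, int]] = None          # (x, y)
    box: Optional[tuple[int, int, int, int]] = None  # (x0, y0, x1, y1), inclusive


class Segmenter(Protocol):
    """Anything that can turn an image and a prompt into masks."""
    def predict(self, image: Image, prompt: Prompt) -> list[Mask]: ...


def segment(model: Segmenter, image: Image, prompt: Prompt) -> list[Mask]:
    """Stateless entry point: no caching between calls, no visualization."""
    if (prompt.point is None) == (prompt.box is None):
        raise ValueError("provide exactly one of point or box")
    return model.predict(image, prompt)


class BoxFillSegmenter:
    """Stand-in model for this sketch: one mask covering the box prompt."""
    def predict(self, image: Image, prompt: Prompt) -> list[Mask]:
        if prompt.box is None:
            raise NotImplementedError("this stand-in only handles box prompts")
        x0, y0, x1, y1 = prompt.box
        mask = [
            [y0 <= r <= y1 and x0 <= c <= x1 for c in range(len(row))]
            for r, row in enumerate(image)
        ]
        return [mask]


image = [[0.0] * 3 for _ in range(3)]
masks = segment(BoxFillSegmenter(), image, Prompt(box=(0, 0, 1, 1)))
```

Passing the model in as an argument keeps the function free of hidden state and lets tests swap in a stand-in like the one above.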

Technical Approach

  • Create gently/agent/cv_agent.py with a tool-using Claude instance
  • Move classical CV to gently/cv/ module
  • Simplify backend/sam_server.py to pure SAM inference
  • CV agent uses SAM server as one of its tools
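
A sketch of how cv_agent.py might declare its tools and route tool calls follows. The name / description / input_schema layout matches the tool definition format used for Claude tool use; the specific tool names, the image_id parameter, and the handler stubs are assumptions for illustration, and no API call is made.

```python
from typing import Any, Callable

# Tool definitions the agent would pass to Claude (hypothetical tools).
TOOLS: list[dict[str, Any]] = [
    {
        "name": "threshold",
        "description": "Classical CV: binarize a grayscale image at a given level.",
        "input_schema": {
            "type": "object",
            "properties": {
                "image_id": {"type": "string"},
                "level": {"type": "number"},
            },
            "required": ["image_id", "level"],
        },
    },
    {
        "name": "sam_segment",
        "description": "Segment an object with SAM from a box prompt.",
        "input_schema": {
            "type": "object",
            "properties": {
                "image_id": {"type": "string"},
                "box": {"type": "array", "items": {"type": "integer"}},
            },
            "required": ["image_id", "box"],
        },
    },
]


def run_threshold(args: dict[str, Any]) -> dict[str, Any]:
    # Placeholder: would call into the gently/cv/ module.
    return {"tool": "threshold", "image_id": args["image_id"], "level": args["level"]}


def run_sam_segment(args: dict[str, Any]) -> dict[str, Any]:
    # Placeholder: would send the request to the SAM server.
    return {"tool": "sam_segment", "image_id": args["image_id"], "box": args["box"]}


HANDLERS: dict[str, Callable[[dict[str, Any]], dict[str, Any]]] = {
    "threshold": run_threshold,
    "sam_segment": run_sam_segment,
}


def dispatch(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Route one tool call from the model to its handler."""
    handler = HANDLERS.get(name)
    if handler is None:
        # Return the error as data so the agent can try another tool.
        return {"error": f"unknown tool: {name}"}
    return handler(args)


reply = dispatch("sam_segment", {"image_id": "plate_01", "box": [0, 0, 10, 10]})
```

Treating the SAM server as just another handler is what lets the agent mix it freely with classical CV and measurement tools.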

Key Files

  • backend/sam_server.py
  • gently/agent/sam_detection.py
  • New: gently/agent/cv_agent.py
  • New: gently/cv/classical.py, gently/cv/measurement.py
