[Feature] Self-reflection agent: periodic meta-evaluation and curated prompt/memory updates (Hermes-inspired) #477

@senamakel

Description

Summary

Add a self-reflection agent (or scheduled core routine) that periodically evaluates recent agent behavior, extracts durable lessons, and updates curated prompt and memory artifacts. This is similar in spirit to Hermes Agent (Nous Research), which uses a closed loop of task execution, self-evaluation checkpoints, and writes to skill/memory markdown files, so that behavior improves across sessions without updating model weights.

User-visible outcome: the assistant steadily adapts phrasing, tool-usage patterns, and stored preferences within safe bounds, with explicit guardrails and optional user review.

Problem

Today OpenHuman’s agent personality and domain context live in static prompt bundles under src/openhuman/agent/prompts/ and structured memory (e.g. Neocortex / recall paths). There is no first-class meta-loop that:

  • Reflects on what worked or failed after tool runs or conversations
  • Consolidates repeated corrections (“always cite sources,” “prefer short answers”) into durable files
  • Aligns long-horizon behavior with user and org norms without manual prompt edits every time

Power users and teams hit the same friction repeatedly; product iteration on prompts is manual and decoupled from runtime evidence.

Solution (optional)

Inspiration (Hermes-style):

  • Checkpoint cadence — After N tool calls, time window, or explicit “reflect” RPC, run a reflection pass (separate system prompt or small model call) that outputs structured notes: successes, failures, user corrections, proposed memory/prompt deltas.
  • Write targets — Allow updates only to whitelisted paths (e.g. user-scoped memory or optional USER.md-like overlays, but not the core IDENTITY.md unless the change is feature-flagged and reviewed).
  • Layering — Keep shipped defaults immutable; persist learnings in workspace or user overlay files merged at load time (similar to how packaged vs remote config already exists for AI config).
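The layering idea above can be sketched as a pure merge step. This is an illustrative sketch only, assuming a hypothetical `merge_layers` helper and markdown-section naming; it is not an existing OpenHuman API:

```python
def merge_layers(base: dict[str, str], overlay: dict[str, str]) -> dict[str, str]:
    """Merge a reflection-written overlay on top of shipped base prompts.

    Overlay text for a known section is appended after the base text rather
    than replacing it, so shipped defaults stay authoritative; overlay-only
    sections are added as-is.
    """
    merged = dict(base)
    for name, text in overlay.items():
        merged[name] = (merged[name] + "\n\n" + text) if name in merged else text
    return merged


# Example: a learned preference is layered onto the shipped identity prompt.
prompts = merge_layers(
    {"identity": "You are the OpenHuman assistant."},
    {"identity": "Learned: prefer short answers; always cite sources."},
)
```

Appending rather than replacing is one way to guarantee that a bad reflection pass can degrade but never erase the shipped defaults; resetting the overlay restores original behavior.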

OpenHuman-specific scope:

  • Core (openhuman) — Scheduler or hook after agent turns; JSON-RPC or internal job to run reflection; validation and atomic writes to allowed storage.
  • Prompt pipeline — Document merge order: base prompts + user reflection overlay + session memory.
  • Safety — Opt-in per workspace; diff/size limits; no secrets in reflection output; optional human approval before applying file patches; audit log of what changed and when.
  • Not in scope for v1 — Rewriting Rust code, changing model weights, or unbounded self-modification of the entire prompt tree.

Tradeoffs: reflection passes add token cost, and there is a risk of prompt drift; both are mitigated by whitelisting, size caps, and review.
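The cadence trigger and write guardrails could look roughly like the following. All names here (`ReflectionProposal`, `ALLOWED_PREFIXES`, `MAX_PATCH_BYTES`) are hypothetical placeholders for illustration, not existing OpenHuman code:

```python
from dataclasses import dataclass

# Whitelisted write targets and a size cap, per the guardrails above.
ALLOWED_PREFIXES = ("memory/user/", "overlays/")
MAX_PATCH_BYTES = 4096


@dataclass
class ReflectionProposal:
    path: str   # file the reflection pass wants to update
    patch: str  # proposed content delta


def should_reflect(tool_calls_since_last: int, cadence_n: int) -> bool:
    """Cadence check: run a reflection pass after every N tool calls."""
    return tool_calls_since_last >= cadence_n


def is_allowed(p: ReflectionProposal) -> bool:
    """Reject writes outside the whitelist or over the size cap."""
    return p.path.startswith(ALLOWED_PREFIXES) and len(p.patch.encode()) <= MAX_PATCH_BYTES
```

The same `is_allowed` gate would sit in front of the atomic-write step, so a reflection pass can only ever propose changes, never apply them directly.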

Acceptance criteria

  • Reflection trigger — Configurable cadence (e.g. every N tool calls, daily, or manual) invokes a reflection routine with access to a bounded recent trace (redacted).
  • Structured output — Reflection produces machine-parseable proposals (e.g. memory inserts, prompt patch hints) validated against a schema.
  • Controlled writes — Only allowed stores/files are updated; core shipped prompts remain the default unless a deliberate overlay mechanism exists.
  • User control — Opt-in, disable, reset overlay, and inspect what changed (changelog or UI in app settings).
  • Safety — No logging or persisting secrets; reflection prompts and outputs redact tokens/credentials.
  • Docs — Short architecture note (reflection loop, merge order, Hermes-style analogy) in docs/ or AGENTS.md as appropriate.
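For the "structured output" criterion, validation could be a thin schema check before anything reaches the write path. The field names and proposal kinds below are illustrative assumptions, not a finalized schema:

```python
import json

# Minimal illustrative schema: every proposal names its kind, a target
# store/file, and the content to insert or hint at.
REQUIRED = {"kind", "target", "content"}
ALLOWED_KINDS = {"memory_insert", "prompt_patch_hint"}


def parse_proposals(raw: str) -> list[dict]:
    """Parse and validate reflection output.

    Raises ValueError on malformed proposals so nothing invalid is ever
    handed to the controlled-write step.
    """
    proposals = json.loads(raw)
    for p in proposals:
        missing = REQUIRED - p.keys()
        if missing:
            raise ValueError(f"proposal missing fields: {missing}")
        if p["kind"] not in ALLOWED_KINDS:
            raise ValueError(f"unknown proposal kind: {p['kind']}")
    return proposals
```

Rejecting unknown kinds outright keeps v1 conservative: new proposal types have to be added to the whitelist deliberately rather than flowing through by default.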

Related

  • Hermes Agent (Nous Research): github.com/NousResearch/hermes-agent, docs.
  • OpenHuman prompts: src/openhuman/agent/prompts/, memory / recall pipeline, capability catalog src/openhuman/about_app/ if user-visible behavior changes.

Metadata

Labels: enhancement (New feature or request)