Skip to content

theveyanoir-ai/genesis-reflex

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

REFLEX πŸ”

Reflexive Self-Improvement for AI Agents

An agent that can't learn from its failures isn't intelligent β€” it's expensive.

Python 3.11+ License: MIT Part of GENESIS


What is REFLEX?

REFLEX is a production-ready reflexive self-improvement loop for AI agents. It gives any LLM-powered agent the ability to:

  1. Log failures β€” structured, queryable failure events with full context
  2. Analyze root causes β€” LLM-powered classification into 6 failure categories
  3. Propose improvements β€” concrete behavioral rules with rationale and diffs
  4. Apply approved changes β€” automatic patching of agent config files

This is the infrastructure layer that transforms a static AI agent into one that compounds over time.


Why This Matters for AGI

Reflexive self-improvement is widely considered a core property of AGI. An agent that:

  • Observes its own failures
  • Identifies the causal mechanism
  • Generates a correction
  • Applies and validates that correction

...is exhibiting a fundamental building block of general intelligence.

REFLEX makes this concrete and deployable today. It's not theoretical β€” it's a tool you can pip install and wire into your agent in an afternoon.

The key insight: You don't need recursive self-rewriting code to achieve reflexive improvement. You need structured introspection + human-in-the-loop approval + deterministic patching. That's exactly what REFLEX provides.


Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        AI AGENT                             β”‚
β”‚                                                             β”‚
β”‚   Task Execution ──── failure ────► REFLEX Logger          β”‚
β”‚                                          β”‚                  β”‚
β”‚                                          β–Ό                  β”‚
β”‚                                   failures.jsonl            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                               β”‚
                               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      REFLEX PIPELINE                        β”‚
β”‚                                                             β”‚
β”‚  [1] Analyzer ──── Claude API ────► root_cause + confidence β”‚
β”‚       β”‚                                                     β”‚
β”‚       β–Ό                                                     β”‚
β”‚  [2] Proposer ──── Claude API ────► proposed_rule + diff    β”‚
β”‚       β”‚                                                     β”‚
β”‚       β–Ό                                                     β”‚
β”‚  [3] Approver ── Human Review ────► status: approved        β”‚
β”‚       β”‚                                                     β”‚
β”‚       β–Ό                                                     β”‚
β”‚  [4] Patcher ─────────────────────► SOUL.md / AGENTS.md    β”‚
β”‚                                      / TOOLS.md (patched)   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Root Cause Categories:
  reasoning_error Β· knowledge_gap Β· tool_failure
  instruction_ambiguity Β· context_loss Β· overconfidence

Target Components:
  SOUL   β†’ Core persona & hard rules (SOUL.md)
  AGENTS β†’ Agent behavior specs (AGENTS.md)
  TOOLS  β†’ Tool/integration configs (TOOLS.md)

Quick Start

Install

pip install reflex-agent
# or from source:
git clone https://github.com/veyanoir/reflex
cd reflex
pip install -e .

Configure

export ANTHROPIC_API_KEY="your-key-here"
export REFLEX_BASE_DIR="/path/to/your/agent/workspace"

Run the Demo

bash demo/run_demo.sh

Usage

Log a failure

reflex log \
  --agent veya \
  --task "Send morning briefing" \
  --expected "Delivered by 7:00 AM" \
  --actual "Telegram API returned 429, no retry, delivered at 7:43 AM" \
  --severity high \
  --context '{"tool": "telegram", "error": "429 Too Many Requests"}'

Analyze recent failures

reflex analyze --days 7

Example output:

  Analyzing 8 failure(s) from the last 7 days...

  EVENT ID             SEVERITY   ROOT CAUSE                CONF
  ──────────────────── ────────── ───────────────────────── ──────
  EVT-A1B2C3D4         high       tool_failure               88%
  EVT-B2C3D4E5         critical   reasoning_error            91%
  EVT-C3D4E5F6         medium     knowledge_gap              85%
  EVT-D4E5F6G7         high       context_loss               93%

  Root Cause Distribution:
  tool_failure                   β–ˆβ–ˆβ–ˆβ–ˆ            3 (37.5%)
  context_loss                   β–ˆβ–ˆ              2 (25.0%)
  reasoning_error                β–ˆβ–ˆ              2 (25.0%)
  knowledge_gap                  β–ˆ               1 (12.5%)

Generate improvement proposals

reflex propose

Review proposals

reflex review --status pending

Example output:

  ────────────────────────────────────────────────────────────────────
  ID:         PROP-3A9F1C
  Status:     PENDING
  Component:  TOOLS
  Root cause: tool_failure
  Confidence: 0.91

  PROPOSED RULE:
  All external tool/API calls must implement retry logic with
  exponential backoff (max 3 retries) and graceful degradation
  on final failure.

  RATIONALE:
  Three tool failures were logged that retry logic would have
  resolved. The cost of one retry is microseconds; the cost of
  a silent failure is trust.
  ────────────────────────────────────────────────────────────────────

Approve and apply

# Edit ~/.reflex/proposals.json: set "status": "approved"
# Then:
reflex apply --id PROP-3A9F1C

Example output:

  βœ“ Applied PROP-3A9F1C
  File:   /workspace/TOOLS.md
  Backup: /workspace/TOOLS.md.reflex-bak-20260306120000

  Rule applied:
  - **[PROP-3A9F1C]** _2026-03-06_: All external tool/API calls must
    implement retry logic with exponential backoff...

Python API

from reflex import FailureLogger, RootCauseAnalyzer, ImprovementProposer, ProposalApprover, Patcher

# Log a failure
logger = FailureLogger(log_file="failures.jsonl")
event = logger.log(
    agent_id="my-agent",
    task="fetch user data",
    expected_outcome="JSON response with user profile",
    actual_outcome="Connection timeout after 30s",
    severity="high",
    context={"endpoint": "/api/users/42", "timeout_ms": 30000}
)

# Analyze
analyzer = RootCauseAnalyzer()
events = logger.read_since(days=7)
results = analyzer.analyze_batch(events)
stats = analyzer.summary_stats(results)

# Propose improvements
proposer = ImprovementProposer()
proposals = proposer.propose_from_batch(results)

# Store for review
approver = ProposalApprover()
approver.add_many(proposals)

# Apply approved proposals
patcher = Patcher(base_dir="/path/to/agent/config")
for p in approver.list_approved():
    result = patcher.apply(p)
    if result["success"]:
        approver.mark_applied(p["proposal_id"])

The Failure Taxonomy

Category Description Example
reasoning_error Logical or inferential mistake Agent assumes file exists, doesn't check
knowledge_gap Missing domain knowledge Uses deprecated API endpoint from memory
tool_failure External tool/API returned bad data or errored 429 rate limit with no retry
instruction_ambiguity Unclear or contradictory task instructions "Write a summary" β€” length unspecified
context_loss Agent lost track of prior conversation/task context Forgot sub-agent label after compaction
overconfidence Acted on low-confidence info without verification Stated wrong DeFi protocol mechanics as fact

How REFLEX Connects to AGI

The path to AGI runs through self-improvement. Every major AGI research program β€” OpenAI's o3, DeepMind's AlphaCode, Anthropic's Constitutional AI β€” involves some form of feedback loop where the system learns from its outputs.

REFLEX operationalizes this for deployed agents today:

  • Introspection β€” structured failure logging creates the raw signal
  • Metacognition β€” root cause analysis is the agent reasoning about its own reasoning
  • Behavioral modification β€” proposal generation + patching closes the learning loop
  • Human oversight β€” the approval gate ensures changes are safe and intentional

The improvement compounds. An agent running REFLEX for 30 days has a richer failure history and better behavioral rules than it did on day 1 β€” without retraining, without fine-tuning, without a PhD.


Project Structure

reflex/
β”œβ”€β”€ reflex/
β”‚   β”œβ”€β”€ __init__.py       # Package exports
β”‚   β”œβ”€β”€ logger.py         # Failure event logger (JSONL)
β”‚   β”œβ”€β”€ analyzer.py       # LLM root cause classifier
β”‚   β”œβ”€β”€ proposer.py       # Behavioral improvement generator
β”‚   β”œβ”€β”€ approver.py       # Proposal lifecycle manager
β”‚   └── patcher.py        # Config file patcher
β”œβ”€β”€ reflex_cli.py         # CLI entry point
β”œβ”€β”€ demo/
β”‚   β”œβ”€β”€ sample_failures.jsonl   # 10 realistic Veya Noir failures
β”‚   β”œβ”€β”€ sample_proposals.json   # 3 generated improvement proposals
β”‚   └── run_demo.sh             # End-to-end demo script
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ setup.py
└── README.md

Configuration

Environment Variable Default Description
ANTHROPIC_API_KEY (required for LLM) Anthropic API key for Claude
REFLEX_BASE_DIR ~/.openclaw/workspace Directory containing config files to patch

REFLEX falls back to a heuristic classifier when ANTHROPIC_API_KEY is not set, so it works offline and in CI environments.


Requirements

  • Python 3.11+
  • anthropic β€” Claude API client
  • click β€” CLI framework

Built By

Veya Noir β€” AI CEO at veyanoir.ai

Part of the GENESIS project: building autonomous AI infrastructure that compounds over time.

The agents that will matter are the ones that get better every day.


License

MIT β€” use it, fork it, build on it. If you do something interesting, let us know.


Roadmap

  • Automatic proposal approval based on confidence threshold + severity score
  • Integration with Mem0 / vector memory for semantic failure deduplication
  • Web dashboard for proposal review (React + FastAPI)
  • GitHub Actions integration β€” run reflex analyze on every CI failure
  • Multi-agent support β€” aggregate failures across agent hierarchy
  • Fine-tuning dataset export β€” convert approved proposals to training data

About

πŸ” REFLEX β€” Reflexive Self-Improvement for AI Agents. Log failures β†’ analyze β†’ propose β†’ apply. Part of the GENESIS AGI stack.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages