Reflexive Self-Improvement for AI Agents
An agent that can't learn from its failures isn't intelligent; it's expensive.
REFLEX is a production-ready reflexive self-improvement loop for AI agents. It gives any LLM-powered agent the ability to:
- Log failures: structured, queryable failure events with full context
- Analyze root causes: LLM-powered classification into six failure categories
- Propose improvements: concrete behavioral rules with rationale and diffs
- Apply approved changes: automatic patching of agent config files
This is the infrastructure layer that transforms a static AI agent into one that compounds over time.
Reflexive self-improvement is widely considered a core property of AGI. An agent that:
- Observes its own failures
- Identifies the causal mechanism
- Generates a correction
- Applies and validates that correction
...is exhibiting a fundamental building block of general intelligence.
REFLEX makes this concrete and deployable today. It's not theoretical; it's a tool you can `pip install` and wire into your agent in an afternoon.
The key insight: You don't need recursive self-rewriting code to achieve reflexive improvement. You need structured introspection + human-in-the-loop approval + deterministic patching. That's exactly what REFLEX provides.
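Concretely, the loop starts at the call site: the agent's task runner logs any gap between expected and actual outcome. A minimal sketch, assuming the `FailureLogger.log` signature shown in the Python API section of this README (`run_with_reflex` itself is a hypothetical helper, not part of the package):

```python
# Hypothetical wrapper: any mismatch between expected and actual outcome
# (including a crash) becomes a structured failure event.
def run_with_reflex(logger, agent_id, task, expected, execute):
    try:
        actual = execute()
    except Exception as exc:  # crashes are failures too
        actual = f"exception: {exc}"
    if actual != expected:
        logger.log(
            agent_id=agent_id,
            task=task,
            expected_outcome=expected,
            actual_outcome=actual,
            severity="high",
            context={},
        )
    return actual
```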
```
┌───────────────────────────────────────────────────────────────┐
│                           AI AGENT                            │
│                                                               │
│   Task Execution ──── failure ─────► REFLEX Logger            │
│                                           │                   │
│                                           ▼                   │
│                                    failures.jsonl             │
└───────────────────────────────────────────────────────────────┘
                                            │
                                            ▼
┌───────────────────────────────────────────────────────────────┐
│                        REFLEX PIPELINE                        │
│                                                               │
│   [1] Analyzer ──── Claude API ─────► root_cause + confidence │
│         │                                                     │
│         ▼                                                     │
│   [2] Proposer ──── Claude API ─────► proposed_rule + diff    │
│         │                                                     │
│         ▼                                                     │
│   [3] Approver ──── Human Review ───► status: approved        │
│         │                                                     │
│         ▼                                                     │
│   [4] Patcher ──────────────────────► SOUL.md / AGENTS.md     │
│                                       / TOOLS.md (patched)    │
└───────────────────────────────────────────────────────────────┘
```
Root Cause Categories:
reasoning_error · knowledge_gap · tool_failure
instruction_ambiguity · context_loss · overconfidence
Target Components:
SOUL: core persona & hard rules (SOUL.md)
AGENTS: agent behavior specs (AGENTS.md)
TOOLS: tool/integration configs (TOOLS.md)
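One way to picture the routing is that each root-cause category tends to implicate one component's rules. The mapping below is a hypothetical illustration of the SOUL/AGENTS/TOOLS split, not REFLEX's actual selection logic, which decides per proposal:

```python
# Hypothetical routing table: root-cause category -> target component file.
# Illustrative only; the real proposer chooses the component per proposal.
CAUSE_TO_COMPONENT = {
    "reasoning_error": "SOUL",
    "overconfidence": "SOUL",
    "knowledge_gap": "AGENTS",
    "instruction_ambiguity": "AGENTS",
    "context_loss": "AGENTS",
    "tool_failure": "TOOLS",
}

def target_file(root_cause: str) -> str:
    # Default to AGENTS.md for unrecognized categories
    return CAUSE_TO_COMPONENT.get(root_cause, "AGENTS") + ".md"
```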
```bash
pip install reflex-agent

# or from source:
git clone https://github.com/veyanoir/reflex
cd reflex
pip install -e .
```

Configure your environment:

```bash
export ANTHROPIC_API_KEY="your-key-here"
export REFLEX_BASE_DIR="/path/to/your/agent/workspace"
```

Run the end-to-end demo:

```bash
bash demo/run_demo.sh
```

Log a failure:

```bash
reflex log \
  --agent veya \
  --task "Send morning briefing" \
  --expected "Delivered by 7:00 AM" \
  --actual "Telegram API returned 429, no retry, delivered at 7:43 AM" \
  --severity high \
  --context '{"tool": "telegram", "error": "429 Too Many Requests"}'
```

Analyze recent failures:

```bash
reflex analyze --days 7
```

Example output:
```
Analyzing 8 failure(s) from the last 7 days...

EVENT ID              SEVERITY   ROOT CAUSE                CONF
────────────────────  ─────────  ────────────────────────  ─────
EVT-A1B2C3D4          high       tool_failure              88%
EVT-B2C3D4E5          critical   reasoning_error           91%
EVT-C3D4E5F6          medium     knowledge_gap             85%
EVT-D4E5F6G7          high       context_loss              93%

Root Cause Distribution:
  tool_failure     ████  3 (37.5%)
  context_loss     ██    2 (25.0%)
  reasoning_error  ██    2 (25.0%)
  knowledge_gap    █     1 (12.5%)
```
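Each logged failure is one JSON object per line in `failures.jsonl`. Based on the fields the Python API takes (shown later in this README), a record plausibly looks like the following; the exact serialized schema is an assumption:

```json
{"event_id": "EVT-A1B2C3D4", "agent_id": "veya", "task": "Send morning briefing", "expected_outcome": "Delivered by 7:00 AM", "actual_outcome": "Telegram API returned 429, no retry, delivered at 7:43 AM", "severity": "high", "context": {"tool": "telegram", "error": "429 Too Many Requests"}}
```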
Generate and review proposals:

```bash
reflex propose
reflex review --status pending
```

Example output:

```
────────────────────────────────────────────────────────────────────
ID:          PROP-3A9F1C
Status:      PENDING
Component:   TOOLS
Root cause:  tool_failure
Confidence:  0.91

PROPOSED RULE:
  All external tool/API calls must implement retry logic with
  exponential backoff (max 3 retries) and graceful degradation
  on final failure.

RATIONALE:
  Three tool failures were logged that retry logic would have
  resolved. The cost of one retry is microseconds; the cost of
  a silent failure is trust.
────────────────────────────────────────────────────────────────────
```
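Approving a proposal means flipping its status in the proposal store. An entry in `~/.reflex/proposals.json` plausibly looks like this (field names inferred from the review output above; the actual schema may differ):

```json
{
  "proposal_id": "PROP-3A9F1C",
  "status": "pending",
  "component": "TOOLS",
  "root_cause": "tool_failure",
  "confidence": 0.91,
  "proposed_rule": "All external tool/API calls must implement retry logic with exponential backoff (max 3 retries) and graceful degradation on final failure.",
  "rationale": "Three tool failures were logged that retry logic would have resolved."
}
```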
Approve and apply:

```bash
# Edit ~/.reflex/proposals.json: set "status": "approved"
# Then:
reflex apply --id PROP-3A9F1C
```

Example output:

```
✓ Applied PROP-3A9F1C
  File:   /workspace/TOOLS.md
  Backup: /workspace/TOOLS.md.reflex-bak-20260306120000

Rule applied:
  - **[PROP-3A9F1C]** _2026-03-06_: All external tool/API calls must
    implement retry logic with exponential backoff...
```
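Under the hood, deterministic patching is deliberately boring: copy the config file to a timestamped backup, then append the approved rule. A minimal sketch of that behavior (`apply_rule` is illustrative, not the real `Patcher` API, which also formats the rule entry with its proposal ID and date):

```python
import shutil
import time
from pathlib import Path

def apply_rule(config_path: str, rule_line: str) -> str:
    """Back up the config file, then append the approved rule.

    Sketch of the backup-then-patch behavior shown in the output above.
    Returns the backup path.
    """
    src = Path(config_path)
    stamp = time.strftime("%Y%m%d%H%M%S")
    backup = src.with_name(src.name + ".reflex-bak-" + stamp)
    shutil.copy2(src, backup)  # back up first, so the patch is reversible
    with src.open("a", encoding="utf-8") as f:
        f.write("\n" + rule_line + "\n")
    return str(backup)
```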
The same pipeline is available as a Python API:

```python
from reflex import FailureLogger, RootCauseAnalyzer, ImprovementProposer, ProposalApprover, Patcher

# Log a failure
logger = FailureLogger(log_file="failures.jsonl")
event = logger.log(
    agent_id="my-agent",
    task="fetch user data",
    expected_outcome="JSON response with user profile",
    actual_outcome="Connection timeout after 30s",
    severity="high",
    context={"endpoint": "/api/users/42", "timeout_ms": 30000},
)

# Analyze
analyzer = RootCauseAnalyzer()
events = logger.read_since(days=7)
results = analyzer.analyze_batch(events)
stats = analyzer.summary_stats(results)

# Propose improvements
proposer = ImprovementProposer()
proposals = proposer.propose_from_batch(results)

# Store for review
approver = ProposalApprover()
approver.add_many(proposals)

# Apply approved proposals
patcher = Patcher(base_dir="/path/to/agent/config")
for p in approver.list_approved():
    result = patcher.apply(p)
    if result["success"]:
        approver.mark_applied(p["proposal_id"])
```

| Category | Description | Example |
|---|---|---|
| `reasoning_error` | Logical or inferential mistake | Agent assumes a file exists, doesn't check |
| `knowledge_gap` | Missing domain knowledge | Uses deprecated API endpoint from memory |
| `tool_failure` | External tool/API returned bad data or errored | 429 rate limit with no retry |
| `instruction_ambiguity` | Unclear or contradictory task instructions | "Write a summary": length unspecified |
| `context_loss` | Agent lost track of prior conversation/task context | Forgot sub-agent label after compaction |
| `overconfidence` | Acted on low-confidence info without verification | Stated wrong DeFi protocol mechanics as fact |
The path to AGI runs through self-improvement. Every major AGI research program (OpenAI's o3, DeepMind's AlphaCode, Anthropic's Constitutional AI) involves some form of feedback loop where the system learns from its outputs.
REFLEX operationalizes this for deployed agents today:
- Introspection: structured failure logging creates the raw signal
- Metacognition: root cause analysis is the agent reasoning about its own reasoning
- Behavioral modification: proposal generation + patching closes the learning loop
- Human oversight: the approval gate ensures changes are safe and intentional
The improvement compounds. An agent running REFLEX for 30 days has a richer failure history and better behavioral rules than it did on day 1: no retraining, no fine-tuning, no PhD required.
```
reflex/
├── reflex/
│   ├── __init__.py      # Package exports
│   ├── logger.py        # Failure event logger (JSONL)
│   ├── analyzer.py      # LLM root cause classifier
│   ├── proposer.py      # Behavioral improvement generator
│   ├── approver.py      # Proposal lifecycle manager
│   └── patcher.py       # Config file patcher
├── reflex_cli.py        # CLI entry point
├── demo/
│   ├── sample_failures.jsonl   # 10 realistic Veya Noir failures
│   ├── sample_proposals.json   # 3 generated improvement proposals
│   └── run_demo.sh             # End-to-end demo script
├── requirements.txt
├── setup.py
└── README.md
```
| Environment Variable | Default | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | (required for LLM) | Anthropic API key for Claude |
| `REFLEX_BASE_DIR` | `~/.openclaw/workspace` | Directory containing config files to patch |
REFLEX falls back to a heuristic classifier when ANTHROPIC_API_KEY is not set, so it works offline and in CI environments.
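The fallback needs no model call; a keyword heuristic is enough to produce a coarse classification. A sketch of how such a fallback could work (illustrative only, not the actual `analyzer.py` logic):

```python
# Hypothetical keyword heuristics mapping failure text to the six
# root-cause categories. First match wins; reasoning_error is the
# default bucket when nothing matches.
FALLBACK_KEYWORDS = {
    "tool_failure": ["429", "timeout", "rate limit", "connection", "500"],
    "context_loss": ["forgot", "lost track", "compaction"],
    "knowledge_gap": ["deprecated", "outdated", "unknown endpoint"],
    "instruction_ambiguity": ["unclear", "ambiguous", "unspecified"],
    "overconfidence": ["assumed", "without verifying", "stated as fact"],
}

def classify_heuristic(actual_outcome: str) -> str:
    text = actual_outcome.lower()
    for cause, words in FALLBACK_KEYWORDS.items():
        if any(word in text for word in words):
            return cause
    return "reasoning_error"
```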
- Python 3.11+
- `anthropic`: Claude API client
- `click`: CLI framework
Veya Noir, AI CEO at veyanoir.ai
Part of the GENESIS project: building autonomous AI infrastructure that compounds over time.
The agents that will matter are the ones that get better every day.
MIT: use it, fork it, build on it. If you do something interesting, let us know.
- Automatic proposal approval based on confidence threshold + severity score
- Integration with Mem0 / vector memory for semantic failure deduplication
- Web dashboard for proposal review (React + FastAPI)
- GitHub Actions integration: run `reflex analyze` on every CI failure
- Multi-agent support: aggregate failures across agent hierarchy
- Fine-tuning dataset export: convert approved proposals to training data