Merged
51 changes: 48 additions & 3 deletions evolve_server/engines/EVOLVE_AGENTS.md
@@ -34,14 +34,15 @@ workspace/
which skills (if any) were referenced.
3. **Decide** what actions to take for each skill or pattern.
4. **Execute** by writing new or updated skill bundles in `skills/`.
5. **Self-validate** every changed skill before finalizing it; if validation
fails, continue editing or revert the change.

Work through these steps autonomously. Use your file-reading and writing
tools to inspect session data and produce skill bundles.

**File access boundary**: All your file operations MUST stay within this
workspace directory. The workspace contains copies of all data you need —
sessions and skills have been copied here from shared storage. Do NOT read or write files outside the workspace. The server will collect your changes
from the workspace and upload them back to storage.
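
The file access boundary above can be checked mechanically before any read or write. The following is a minimal sketch, not part of the PR: the helper name `is_inside_workspace` is hypothetical, and it assumes the workspace root is known as a `Path`.

```python
from pathlib import Path


def is_inside_workspace(workspace: Path, candidate: Path) -> bool:
    """Return True if `candidate` resolves to a location inside `workspace`.

    Hypothetical helper for illustration; the server may enforce the
    boundary differently.
    """
    root = workspace.resolve()
    # Relative paths are interpreted relative to the workspace root;
    # an absolute candidate replaces the root when joined.
    target = (root / candidate).resolve()
    # resolve() collapses `..` segments and follows symlinks, so a path
    # that escapes the workspace no longer has `root` among its parents.
    return target == root or root in target.parents
```

Resolving both paths first means that `../` segments and symlinks pointing outside the workspace are rejected rather than compared as plain strings.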

---
@@ -179,7 +180,49 @@ category: general
<Markdown body with practical guidance>
```

## Step 5: Self-validation before finalizing

Before you consider any new or changed skill complete, validate it inside the
center harness workspace. This is an internal publication gate for Agentic
Evolver: do not leave a skill change in `skills/` unless it has passed your
self-validation, or unless you intentionally revert the change because it
cannot be validated.

For every skill you create or modify:

1. Define 1-3 small validation scenarios from the current session evidence
   and the skill's trigger conditions. Prefer cases that would have caught the
   observed failure or confirmed the observed success pattern.
2. Run static checks:
   - `SKILL.md` has valid frontmatter with a clear `name` and `description`.
   - Trigger conditions and any `NOT for:` boundaries did not become overly
     broad.
   - Relative references to `references/`, `scripts/`, or `assets/` exist.
   - Key environment facts supported by evidence, such as API endpoints,
     ports, filenames, command formats, and payload shapes, were preserved
     unless the evidence clearly justified changing them.
3. Run the smallest safe smoke test when possible. Examples: a helper script
   `--help`, a dry-run command, a fixture input, or a minimal command copied
   from the skill. Keep all commands within the workspace directory and do NOT
   require external credentials or cause destructive side effects.
4. If no runnable command exists, perform an evidence-based static simulation:
   explain how a future agent would use the revised skill on one representative
   session and what correct next steps it should infer.
5. If validation fails, continue editing the skill and re-run validation. Do
   not finalize a known-failing change. If you cannot make it pass, revert that
   skill change or choose `skip`.
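
Two of the static checks in item 2 (frontmatter fields and relative references) lend themselves to a small script. The sketch below is illustrative only: `parse_frontmatter`, `static_check_skill`, and the reference regex are hypothetical names and assumptions, and the parser handles only flat `key: value` frontmatter rather than full YAML.

```python
import re
from pathlib import Path

# Assumption: relative references appear in SKILL.md as paths that begin with
# one of these directory names, e.g. `scripts/check.py`.
REFERENCE_PATTERN = re.compile(r"\b(?:references|scripts|assets)/[\w./-]+")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter delimited by `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # No closing delimiter: treat the frontmatter as invalid.


def static_check_skill(skill_dir: Path) -> list[str]:
    """Return the problems found; an empty list means these checks passed."""
    skill_file = skill_dir / "SKILL.md"
    if not skill_file.is_file():
        return ["SKILL.md is missing"]
    text = skill_file.read_text(encoding="utf-8")
    problems = []
    fields = parse_frontmatter(text)
    for required in ("name", "description"):
        if not fields.get(required):
            problems.append(f"frontmatter is missing `{required}`")
    # Drop a trailing sentence period before checking that the path exists.
    references = {match.rstrip(".") for match in REFERENCE_PATTERN.findall(text)}
    for ref in sorted(references):
        if not (skill_dir / ref).exists():
            problems.append(f"referenced path does not exist: {ref}")
    return problems
```

A non-empty result would count as a failed validation under item 5. The remaining checks (trigger breadth, preserved environment facts) need judgment against the session evidence and are not covered by this sketch.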

Record the validation in the paired history evidence file,
`history/v<N>_evidence.md` for existing skills or `history/v0_evidence.md` for
new skills. Include a `## Self-validation before finalizing` section with:

- validation scenarios
- checks or commands run
- pass/fail result
- fixes made after any failed check
- limitations when only static validation was possible
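
As a rough illustration of the record described above, the sketch below renders the section and appends it to the paired evidence file. The function names, the exact Markdown layout, and the `version` parameter are assumptions for this example; only the heading text and the `history/v<N>_evidence.md` path come from the instructions.

```python
from pathlib import Path

SECTION_HEADING = "## Self-validation before finalizing"


def render_validation_section(scenarios, checks, passed, fixes=(), limitations=()):
    """Render the evidence section as Markdown (layout is an assumption)."""

    def bullets(items):
        return [f"- {item}" for item in items] or ["- none"]

    lines = [SECTION_HEADING, ""]
    for title, items in (
        ("Validation scenarios", scenarios),
        ("Checks or commands run", checks),
        ("Fixes made after failed checks", fixes),
        ("Limitations", limitations),
    ):
        lines += [f"{title}:", "", *bullets(items), ""]
    lines.append(f"Result: {'pass' if passed else 'fail'}")
    return "\n".join(lines) + "\n"


def append_validation_section(skill_dir: Path, version: int, section: str) -> Path:
    """Append the section to history/v<version>_evidence.md and return its path."""
    evidence = skill_dir / "history" / f"v{version}_evidence.md"
    evidence.parent.mkdir(parents=True, exist_ok=True)
    has_content = evidence.exists() and evidence.stat().st_size > 0
    with evidence.open("a", encoding="utf-8") as handle:
        if has_content:
            handle.write("\n")  # Separate from any existing evidence text.
        handle.write(section)
    return evidence
```

For a new skill this would be called with `version=0`; for an existing skill, the caller supplies whichever `N` the history rules assign to the change.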

## Step 6: Maintain Skill History

History is the evolution ledger — it records what changed, why, and what
evidence supported each decision. **Every action (create, improve,
@@ -352,6 +395,8 @@ instructions like "go inspect the source code".
action on that skill, if that history directory exists. This is
mandatory, not optional.
- ALWAYS save the old version and evidence before making changes.
- ALWAYS complete center harness self-validation before finalizing a changed
skill, and record the result in the paired `history/v<N>_evidence.md` file.
- ALWAYS use version-based history filenames (`v<N>.md`,
`v<N>_evidence.md`); never use date-based filenames.
- Do NOT modify files in `sessions/` — they are read-only input.
4 changes: 4 additions & 0 deletions evolve_server/engines/agent_workspace.py
@@ -78,6 +78,10 @@
- You may inspect and edit `SKILL.md`, `references/`, `scripts/`, `assets/`,
`history/`, and other supporting files that belong to a skill.
- If there are no actionable patterns, make no changes — that is fine.
- Before finalizing any changed skill, complete the self-validation required
by `EVOLVE_AGENTS.md`; if validation fails, keep editing or revert the
change rather than leaving a known-failing skill in `skills/`.
- Record self-validation results in the paired `history/v<N>_evidence.md` file.

## Memory

22 changes: 22 additions & 0 deletions tests/test_agent_evolver_self_validation.py
@@ -0,0 +1,22 @@
from pathlib import Path

from evolve_server.engines import agent_workspace


def test_evolve_agents_md_requires_center_harness_self_validation():
    text = Path("evolve_server/engines/EVOLVE_AGENTS.md").read_text(encoding="utf-8")

    assert "Self-validation before finalizing" in text
    assert "If validation fails" in text
    assert "continue editing" in text
    assert "history/v<N>_evidence.md" in text
    assert "workspace directory" in text
    assert "Do NOT read or write files outside the workspace" in text


def test_agent_workspace_bootstrap_mentions_self_validation():
    text = agent_workspace._EVOLVE_AGENTS_MD

    assert "self-validation" in text
    assert "EVOLVE_AGENTS.md" in text
    assert "Before finalizing" in text