Integrate Anthropic harness-engineering observations into the methodology by EsatanGW · Pull Request #16 · EsatanGW/agent-protocol

EsatanGW · 2026-04-30T12:11:42Z

Summary

Closes seven canonical-methodology gaps surfaced by cross-referencing Anthropic's "Harness Design for Long-Running Agentic Applications" (Mar 2026) and the existing OpenAI "Harness engineering: leveraging Codex in an agent-first world" (Feb 2026) against this repo.

All seven entries land in [Unreleased]; no version bump. +331 / −1 across 13 files (1 new file).

What changed (seven gaps closed)

Critical — P1 / P2 / P3

| ID | Where | What |
|---|---|---|
| P1 | `docs/ai-operating-contract.md §12` | New "Context anxiety" failure mode: premature task-shrink under perceived context pressure, parallel in structure to the §11 verbal-completion illusion. Explicit "prefer Context Reset over Compaction" remedy. Glossary entry, plus cross-refs in `resumption-protocol.md §Step 2b` (outgoing-session symmetric rule) and `ai-project-memory.md §Pre-compression protection list`. |
| P2 | `docs/multi-agent-handoff.md §Acceptance criteria as a Sprint Contract` | Planner-side time-axis discipline that catches unverifiable acceptance criteria (AC) at write time, not at the Implementer's egress self-check. Two named rules: Reviewer-anticipation (the Planner imagines the Reviewer's audit) and Reverse-shape (AC text reverse-shapes the Implementer's choices). Three pre-handoff self-check questions. |
| P3 | `docs/harness-evolution-discipline.md` (new file) + `mode-decision-tree.md` row + index registrations | Re-evaluation, at each material model release, of canonical methodology components whose load-bearing-ness depends on a specific model-class failure mode. Sibling to `anti-entropy-discipline.md` (which excludes canonical methodology by design). Four-step procedure (map / re-test / classify / record). The companion `mode-decision-tree.md` row makes sweep-backed canonical retirement Lean-eligible, giving the methodology its first canonical-component weight-shedding path. |
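The P3 four-step sweep can be pictured as a small loop over canonical components. This is a hypothetical sketch, not the procedure text from `harness-evolution-discipline.md`: the `Component` and `SweepRecord` types, the `reproduces` probe callback, and the two classification labels are assumptions made for illustration. Only the map / re-test / classify / record sequence and the idea that each component is tied to a model-class failure mode come from this PR.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Component:
    name: str          # hypothetical: a canonical methodology rule or section
    failure_mode: str  # the model-class failure mode it guards against


@dataclass(frozen=True)
class SweepRecord:
    component: str
    failure_mode: str
    still_reproduces: bool
    classification: str  # hypothetical labels: "load-bearing" / "retirement-candidate"


def sweep(
    components: list[Component],
    reproduces: Callable[[str], bool],
) -> list[SweepRecord]:
    """Run map / re-test / classify / record for one material model release."""
    records: list[SweepRecord] = []
    # 1. map: each Component already carries the failure mode it depends on.
    for c in components:
        # 2. re-test: does the failure mode still show up on the new model?
        hit = reproduces(c.failure_mode)
        # 3. classify: a component whose failure mode no longer reproduces
        #    becomes a candidate for (sweep-backed) retirement.
        label = "load-bearing" if hit else "retirement-candidate"
        # 4. record: keep the evidence alongside the decision.
        records.append(SweepRecord(c.name, c.failure_mode, hit, label))
    return records
```

In this sketch the probe is injected as a callback so the sweep logic stays tool-agnostic; how a real re-test is run against a model is left open.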

Enrichments — 1d / 1e / B2 / 1f

| ID | Where | What |
|---|---|---|
| 1d | `multi-agent-handoff.md §Single-agent anti-collusion rule §Why this rule exists` | Preamble naming self-evaluation bias as the underlying behavioural failure the structural rule guards against. States explicitly that "this is not prompt-engineerable away", with the Tool-permission matrix's no-write row as the load-bearing form. |
| 1e | folded into P2 §Reverse-shape rule | AC text reverse-shapes Implementer choices (parallel to `agent-persona-discipline.md`'s observation that medium reverse-shapes persona). |
| B2 | `multi-agent-handoff.md §Capability gating by risk level` | "Risk is one axis; capability frontier is another" callout. The risk axis is the encoded mechanical enforcement boundary; the capability frontier is the Planner-judgement signal alongside it. The matrix's gating column is a floor, not a ceiling. |
| 1f | `mechanical-enforcement-discipline.md §Boundary with non-mechanical evaluation` | First canonical three-evaluator comparison (Mechanical / Application-driven / Agentic Reviewer audit), with layering (floor / bridge / ceiling) and a Planner-side allocation rule at Phase 3 (assign each AC to the cheapest evaluator that catches its failure shape). New anti-pattern: routing by familiarity rather than by failure shape. Companion edit: `multi-agent-handoff.md §Reviewer §Must not do` gains a row pointing back. |
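The 1f allocation rule (each AC goes to the cheapest evaluator that catches its failure shape) can be sketched as a routing function. This is a hypothetical illustration, not code from the repo: it assumes the floor / bridge / ceiling layering implies a cost ordering, and the failure-shape labels, the `CATCHES` table, and the `AcceptanceCriterion` type are invented for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Evaluator(IntEnum):
    """Evaluators ordered by assumed cost: floor < bridge < ceiling."""
    MECHANICAL = 1          # floor: e.g. a CI validator
    APPLICATION_DRIVEN = 2  # bridge: assumed to mean exercising the running application
    AGENTIC_REVIEWER = 3    # ceiling: Reviewer audit


# Hypothetical failure shapes mapped to the evaluators assumed to catch them.
CATCHES: dict[str, set[Evaluator]] = {
    "broken-link": {Evaluator.MECHANICAL, Evaluator.APPLICATION_DRIVEN, Evaluator.AGENTIC_REVIEWER},
    "ui-regression": {Evaluator.APPLICATION_DRIVEN, Evaluator.AGENTIC_REVIEWER},
    "design-intent-drift": {Evaluator.AGENTIC_REVIEWER},
}


@dataclass(frozen=True)
class AcceptanceCriterion:
    text: str
    failure_shape: str  # hypothetical field: the failure this AC guards against


def allocate(ac: AcceptanceCriterion) -> Evaluator:
    """Route by failure shape (not familiarity) to the cheapest evaluator that catches it."""
    candidates = CATCHES.get(ac.failure_shape)
    if not candidates:
        # No evaluator catches this shape: the AC is unverifiable, so flag it at write time.
        raise ValueError(f"unverifiable AC: {ac.text!r}")
    # IntEnum members compare by value, so min() picks the cheapest catching evaluator.
    return min(candidates)
```

For example, an AC such as "All internal links resolve" tagged `broken-link` routes to `Evaluator.MECHANICAL` rather than consuming Reviewer audit attention, which mirrors the companion "Must not do" row.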

Discipline

Tool-agnostic. No vendor / model names in normative content (per CLAUDE.md §2).
No schema impact. No new manifest fields, no new evidence enums, no new role definitions.
SoT-consistent. All seven entries cite existing fields and rules; new normative content stays inside the file-role-map.md SoT discipline. New file registered in docs/README.md Tier-3 and docs/file-role-map.md.
Forced Full mode per CLAUDE.md §Mode implication (canonical methodology edits at L1+).

Test plan

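Local CI validators (mirrors of .github/workflows/validate.yml jobs), all passing: internal-links, legacy-terms, summary-drift, role-consistency, schema-syntax, changelog-drift, template-conformance, cluster-disjointness, schema-drift.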

Out of scope (deferred)

Two lower-priority items from the original cross-reference were deliberately not addressed:

2b (feature-at-a-time sprint cadence): Anthropic's sprint construct is time-bounded, whereas the repo's design philosophy is scope-driven (surfaces, clusters, modes). Adding a time-axis cadence would create tension with the existing decomposition philosophy without proportional value.
2e (Sprint Contract as persistent artifact): the Task Prompt is intentionally non-persisted (multi-agent-handoff.md §Task Prompt structure); persisting it would split the SoT against the Manifest-as-state-snapshot discipline. The original analysis had already classed this as a "design choice, not gap".

🤖 Generated with Claude Code

Integrate Anthropic harness-engineering observations into the methodology

Closes seven canonical-methodology gaps surfaced by cross-referencing Anthropic's "Harness Design for Long-Running Agentic Applications" (Mar 2026) against the existing repo. All seven entries land in [Unreleased]; no version bump.

Critical additions (P1 / P2 / P3):

- §12 Context anxiety in docs/ai-operating-contract.md: names the intra-session premature-task-shrink failure, parallel in structure to §11 verbal-completion illusion. Distinguished from §4 (cross-session fact loss) and §3 (evidence-before-completion). Explicit "prefer Context Reset over Compaction" remedy. Glossary entry + cross-refs in resumption-protocol.md §Step 2b (outgoing-session symmetric rule) and ai-project-memory.md §Pre-compression protection list (rescue list is not the trigger condition).
- §Acceptance criteria as a Sprint Contract in docs/multi-agent-handoff.md: Planner-side time-axis discipline that catches unverifiable AC at write-time, not at Implementer's egress self-check. Two named rules: Reviewer-anticipation (Planner imagines Reviewer's audit) and Reverse-shape (AC text reverse-shapes Implementer choices). Three pre-handoff self-check questions. Catches the Anthropic-style sprint-contract concern without breaking role-separation.
- docs/harness-evolution-discipline.md (new file): per-material-model-release re-evaluation of canonical methodology components whose load-bearing-ness depends on a specific model-class failure mode. Sibling to anti-entropy-discipline.md (which excludes canonical methodology by design). Four-step procedure (map / re-test / classify / record). Companion mode-decision-tree.md row makes sweep-backed canonical retirement Lean-eligible (symmetric to existing project-local row), giving the methodology its first canonical-component weight-shedding path. Registered in docs/README.md Tier-3 + docs/file-role-map.md.

Enrichments (1d / 1e / B2 / 1f):

- §Single-agent anti-collusion rule §Why this rule exists: preamble naming the underlying behavioural failure (self-evaluation bias) the structural rule enforces against. Explicit "this is not prompt-engineerable away" with the Tool-permission matrix's no-write row as the load-bearing form.
- §Capability gating by risk level: "Risk is one axis; capability frontier is another" callout. Risk-axis is the encoded mechanical enforcement boundary; capability-frontier is the Planner-judgement signal alongside it. Matrix's gating column is a floor, not a ceiling.
- §Boundary with non-mechanical evaluation in docs/mechanical-enforcement-discipline.md: first canonical three-evaluator comparison (Mechanical / Application-driven / Agentic Reviewer audit) with layering (floor / bridge / ceiling) and a Planner-side allocation rule at Phase 3 (each AC to the cheapest evaluator that catches its failure shape). New anti-pattern: routing by familiarity rather than by failure shape. Companion edit: multi-agent-handoff.md §Reviewer §Must not do gains a "spend audit attention on what a mechanical check should have caught" row.

Discipline:

- No vendor / model names in normative content (model-agnostic).
- No new schema fields, no new manifest enums, no new role definitions.
- All seven entries cite existing fields and rules; new normative content stays inside the file-role-map.md SoT discipline.
- Local CI validators all pass (internal-links, legacy-terms, summary-drift, role-consistency, schema-syntax, changelog-drift, template-conformance, cluster-disjointness, schema-drift).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

EsatanGW merged commit 6478dfe into main Apr 30, 2026
15 checks passed

EsatanGW merged 1 commit into main from claude/anthropic-harness-engineering-integration