Integrate Anthropic harness-engineering observations into the methodology#16

Merged
EsatanGW merged 1 commit into main from claude/anthropic-harness-engineering-integration on Apr 30, 2026

@EsatanGW
Owner

Summary

Closes seven canonical-methodology gaps surfaced by cross-referencing Anthropic's "Harness Design for Long-Running Agentic Applications" (Mar 2026) and the existing OpenAI "Harness engineering: leveraging Codex in an agent-first world" (Feb 2026) against this repo.

All seven entries land in [Unreleased]; no version bump. +331 / −1 across 13 files (1 new file).

What changed (seven gaps closed)

Critical — P1 / P2 / P3

| ID | Where | What |
| --- | --- | --- |
| P1 | docs/ai-operating-contract.md §12 | New Context anxiety failure mode: premature task-shrink under perceived context pressure, parallel in structure to §11 verbal-completion illusion. Explicit "prefer Context Reset over Compaction" remedy. Glossary entry + cross-refs in resumption-protocol.md §Step 2b (outgoing-session symmetric rule) and ai-project-memory.md §Pre-compression protection list. |
| P2 | docs/multi-agent-handoff.md §Acceptance criteria as a Sprint Contract | Planner-side time-axis discipline catching unverifiable AC at write-time, not at Implementer's egress self-check. Two named rules: Reviewer-anticipation (Planner imagines Reviewer's audit) and Reverse-shape (AC text reverse-shapes Implementer choices). Three pre-handoff self-check questions. |
| P3 | docs/harness-evolution-discipline.md (new file) + mode-decision-tree.md row + index registrations | Per-material-model-release re-evaluation of canonical methodology components whose load-bearing-ness depends on a specific model-class failure mode. Sibling to anti-entropy-discipline.md (which excludes canonical methodology by design). Four-step procedure (map / re-test / classify / record). Companion mode-decision-tree.md row makes sweep-backed canonical retirement Lean-eligible, the first canonical-component weight-shedding path. |
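
The four-step procedure named for P3 (map / re-test / classify / record) can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the content of harness-evolution-discipline.md: the `Component` and `SweepRecord` types, the `retest` callback, and the `"keep"` / `"retire-candidate"` labels are all hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Component:
    """A canonical methodology component (hypothetical shape)."""
    name: str
    failure_mode: str  # the model-class failure mode it guards against


@dataclass(frozen=True)
class SweepRecord:
    """One row of the sweep's output (hypothetical shape)."""
    component: str
    failure_mode: str
    reproduced: bool
    classification: str  # "keep" or "retire-candidate"; labels are assumptions


def sweep(
    components: Iterable[Component],
    retest: Callable[[str], bool],
) -> list[SweepRecord]:
    """Run map / re-test / classify / record over the given components."""
    records = []
    for comp in components:
        # 1. map: each component is tied to the failure mode it depends on
        mode = comp.failure_mode
        # 2. re-test: does the new model release still exhibit the failure?
        reproduced = retest(mode)
        # 3. classify: still load-bearing if the failure reproduces
        classification = "keep" if reproduced else "retire-candidate"
        # 4. record: keep the evidence so a later retirement is sweep-backed
        records.append(SweepRecord(comp.name, mode, reproduced, classification))
    return records
```

In this sketch `retest` stands in for whatever evidence-gathering the sweep actually prescribes; only components classified as retire candidates would be eligible for the Lean path mentioned above.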

Enrichments — 1d / 1e / B2 / 1f

| ID | Where | What |
| --- | --- | --- |
| 1d | multi-agent-handoff.md §Single-agent anti-collusion rule §Why this rule exists | Preamble naming self-evaluation bias as the underlying behavioural failure the structural rule enforces against. Explicit "this is not prompt-engineerable away" with the Tool-permission matrix's no-write row as the load-bearing form. |
| 1e | folded into P2 §Reverse-shape rule | AC text reverse-shapes Implementer choices (parallel to agent-persona-discipline.md's observation that medium reverse-shapes persona). |
| B2 | multi-agent-handoff.md §Capability gating by risk level | "Risk is one axis; capability frontier is another" callout. Risk-axis is the encoded mechanical enforcement boundary; capability-frontier is the Planner-judgement signal alongside it. Matrix's gating column is a floor, not a ceiling. |
| 1f | mechanical-enforcement-discipline.md §Boundary with non-mechanical evaluation | First canonical three-evaluator comparison: Mechanical / Application-driven / Agentic Reviewer audit, with layering (floor / bridge / ceiling) and a Planner-side allocation rule at Phase 3 (each AC to the cheapest evaluator that catches its failure shape). New anti-pattern: routing by familiarity rather than by failure shape. Companion: multi-agent-handoff.md §Reviewer §Must not do gains a row pointing back. |
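
The allocation rule in 1f (each AC goes to the cheapest evaluator that catches its failure shape) can be sketched as follows. This is a rough illustration, not the repo's implementation: the cost numbers, the failure-shape names, and the assumption that each higher layer also catches the shapes below it are all invented here for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluator:
    name: str
    cost: int                # relative cost; values are illustrative only
    catches: frozenset[str]  # failure shapes this evaluator can detect


# Floor / bridge / ceiling layering. Shape names are hypothetical.
EVALUATORS = (
    Evaluator("mechanical", 1, frozenset({"syntax", "schema"})),
    Evaluator(
        "application-driven",
        2,
        frozenset({"syntax", "schema", "runtime-behaviour"}),
    ),
    Evaluator(
        "agentic-reviewer-audit",
        3,
        frozenset({"syntax", "schema", "runtime-behaviour", "design-judgement"}),
    ),
)


def allocate(failure_shape: str, evaluators=EVALUATORS) -> str:
    """Return the cheapest evaluator able to catch the given failure shape."""
    candidates = [e for e in evaluators if failure_shape in e.catches]
    if not candidates:
        raise ValueError(f"no evaluator catches failure shape {failure_shape!r}")
    return min(candidates, key=lambda e: e.cost).name
```

Routing by failure shape rather than by familiarity means a schema-shaped AC never reaches the Reviewer audit in this sketch, even though the audit could catch it, because a cheaper evaluator already does.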

Discipline

  • Tool-agnostic. No vendor / model names in normative content (per CLAUDE.md §2).
  • No schema impact. No new manifest fields, no new evidence enums, no new role definitions.
  • SoT-consistent. All seven entries cite existing fields and rules; new normative content stays inside the file-role-map.md SoT discipline. New file registered in docs/README.md Tier-3 and docs/file-role-map.md.
  • Forced Full mode per CLAUDE.md §Mode implication (canonical methodology edits at L1+).

Test plan

Local CI validators (mirrors of .github/workflows/validate.yml jobs):

  • internal-links — no broken relative links across 189 markdown files
  • legacy-terms — no legacy terms outside allow-list
  • summary-drift — 53 docs, no TL;DR-vs-body drift
  • role-consistency — role contract consistent across SoT + 9 mirrors (8 invariants)
  • schema-syntax — 4 schemas valid
  • changelog-drift: CHANGELOG.json and CHANGELOG.md in sync
  • template-conformance — 7 manifest examples valid
  • cluster-disjointness — self-test passes
  • schema-drift — generated .json matches .yaml
  • CHANGELOG.json — valid JSON syntax
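
As a rough picture of what a check like internal-links does, here is a minimal sketch. It is not the validator from .github/workflows/validate.yml; the function name `broken_relative_links`, the regex, and the choice to ignore anchors and absolute URLs are assumptions made for illustration.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches inline markdown links of the form [text](target); a simplification
# that ignores reference-style links and links with titles.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(root: Path) -> list[tuple[Path, str]]:
    """Return (file, target) pairs whose relative link target does not exist."""
    broken = []
    for md in sorted(root.rglob("*.md")):
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            # Skip absolute URLs (https:, mailto:) and same-file anchors.
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            # Drop any #fragment before checking the path on disk.
            path_part = target.split("#", 1)[0]
            if not (md.parent / path_part).exists():
                broken.append((md, target))
    return broken
```

Running `broken_relative_links(Path("."))` from a repository root and failing when the result is non-empty would approximate the "no broken relative links" outcome reported above.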

Out of scope (deferred)

Two lower-priority items from the original cross-reference were deliberately not addressed:

  • 2b (feature-at-a-time sprint cadence) — Anthropic's sprint construct is time-bounded; the repo's design philosophy is scope-driven (surfaces, clusters, modes). Adding time-axis cadence would create tension with existing decomposition philosophy without proportional value.
  • 2e (Sprint Contract as persistent artifact) — Task Prompt is intentionally non-persisted (multi-agent-handoff.md §Task Prompt structure); persisting it would split the SoT against the Manifest-as-state-snapshot discipline. Already a "design choice, not gap" in the original analysis.

🤖 Generated with Claude Code

…logy

Closes seven canonical-methodology gaps surfaced by cross-referencing
Anthropic's "Harness Design for Long-Running Agentic Applications" (Mar
2026) against the existing repo. All seven entries land in [Unreleased];
no version bump.

Critical additions (P1 / P2 / P3):

- §12 Context anxiety in docs/ai-operating-contract.md: names the
  intra-session premature-task-shrink failure, parallel in structure to
  §11 verbal-completion illusion. Distinguished from §4 (cross-session
  fact loss) and §3 (evidence-before-completion). Explicit
  "prefer Context Reset over Compaction" remedy. Glossary entry +
  cross-refs in resumption-protocol.md §Step 2b (outgoing-session
  symmetric rule) and ai-project-memory.md §Pre-compression protection
  list (rescue list is not the trigger condition).

- §Acceptance criteria as a Sprint Contract in
  docs/multi-agent-handoff.md: Planner-side time-axis discipline that
  catches unverifiable AC at write-time, not at Implementer's egress
  self-check. Two named rules — Reviewer-anticipation (Planner imagines
  Reviewer's audit) and Reverse-shape (AC text reverse-shapes
  Implementer choices). Three pre-handoff self-check questions.
  Catches the Anthropic-style sprint-contract concern without breaking
  role-separation.

- docs/harness-evolution-discipline.md (new file): per-material-model-
  release re-evaluation of canonical methodology components whose load-
  bearing-ness depends on a specific model-class failure mode. Sibling
  to anti-entropy-discipline.md (which excludes canonical methodology by
  design). Four-step procedure (map / re-test / classify / record).
  Companion mode-decision-tree.md row makes sweep-backed canonical
  retirement Lean-eligible (symmetric to existing project-local row),
  giving the methodology its first canonical-component weight-shedding
  path. Registered in docs/README.md Tier-3 + docs/file-role-map.md.

Enrichments (1d / 1e / B2 / 1f):

- §Single-agent anti-collusion rule §Why this rule exists: preamble
  naming the underlying behavioural failure (self-evaluation bias) the
  structural rule enforces against. Explicit "this is not prompt-
  engineerable away" with the Tool-permission matrix's no-write row as
  the load-bearing form.

- §Capability gating by risk level: "Risk is one axis; capability
  frontier is another" callout. Risk-axis is the encoded mechanical
  enforcement boundary; capability-frontier is the Planner-judgement
  signal alongside it. Matrix's gating column is a floor, not a
  ceiling.

- §Boundary with non-mechanical evaluation in
  docs/mechanical-enforcement-discipline.md: first canonical
  three-evaluator comparison (Mechanical / Application-driven / Agentic
  Reviewer audit) with layering (floor / bridge / ceiling) and a
  Planner-side allocation rule at Phase 3 (each AC to the cheapest
  evaluator that catches its failure shape). New anti-pattern: routing
  by familiarity rather than by failure shape. Companion edit:
  multi-agent-handoff.md §Reviewer §Must not do gains a "spend audit
  attention on what a mechanical check should have caught" row.

Discipline:

- No vendor / model names in normative content (model-agnostic).
- No new schema fields, no new manifest enums, no new role definitions.
- All seven entries cite existing fields and rules; new normative
  content stays inside the file-role-map.md SoT discipline.
- Local CI validators all pass (internal-links, legacy-terms,
  summary-drift, role-consistency, schema-syntax, changelog-drift,
  template-conformance, cluster-disjointness, schema-drift).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EsatanGW merged commit 6478dfe into main on Apr 30, 2026
15 checks passed