diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md b/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md index 5edb184..31ad9f5 100644 --- a/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md +++ b/reconstruction-guardrails/verification-and-native-test-oracles/NODE.md @@ -15,6 +15,7 @@ This subdomain captures cross-cutting knowledge about how the observed Claude Co Relevant leaves: +- **[parity-capability-matrix.md](parity-capability-matrix.md)** — Which capability families are blocking for parity, which are extension-level, and what evidence bar each family must clear before a rebuild can claim success. - **[reconstruction-target-and-evidence-boundary.md](reconstruction-target-and-evidence-boundary.md)** — How source-snapshot evidence and later released-binary evidence can both inform the tree without collapsing into one false versionless parity claim. - **[test-framework-overview.md](test-framework-overview.md)** — The layered shape of the current test system, including the visible tier model and the boundary between confirmed and inferred runner details. - **[real-cli-e2e-scenario-corpus.md](real-cli-e2e-scenario-corpus.md)** — A live-observed black-box scenario set for validating whether a rebuild behaves like a real Claude Code CLI across startup, headless runs, session continuity, structured I/O, and diagnostics. diff --git a/reconstruction-guardrails/verification-and-native-test-oracles/parity-capability-matrix.md b/reconstruction-guardrails/verification-and-native-test-oracles/parity-capability-matrix.md new file mode 100644 index 0000000..d8bbf7d --- /dev/null +++ b/reconstruction-guardrails/verification-and-native-test-oracles/parity-capability-matrix.md @@ -0,0 +1,111 @@ +--- +title: "Parity Capability Matrix" +owners: [bingran-you] +soft_links: + - /reconstruction-guardrails/rebuild-phasing.md + - /reconstruction-guardrails/verification-and-acceptance-strategy.md + - /reconstruction-guardrails/verification-and-native-test-oracles/reconstruction-target-and-evidence-boundary.md + - /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md + - /product-surface/NODE.md + - /platform-services/NODE.md + - /integrations/NODE.md + - /collaboration-and-agents/NODE.md + - /ui-and-experience/NODE.md +--- + +# Parity Capability Matrix + +The tree already explains many Claude Code behaviors in prose. What it still needed was a reusable answer to a narrower execution question: **which capability families are blocking for parity, which are extension-level, and what evidence bar must each family clear before a rebuild can claim success?** + +This matrix is that answer. It is not a file inventory and not a rewrite backlog by repository. It is a capability-first go/no-go frame for reconstruction and evaluation work. + +## Scope boundary + +This leaf covers: + +- the capability bands a serious Claude Code rebuild must track +- the minimum evidence bar for each band +- the difference between "usable", "high-confidence parity", and "100% parity claim" + +It intentionally does not re-document: + +- detailed subsystem contracts already captured in domain leaves +- runner-specific implementation tasks for any one rewrite repository +- exact command scripts or fixture contents + +## Status vocabulary + +The matrix uses these tree-readiness labels: + +- **contract-covered**: the tree already captures the behavior family well enough to guide implementation +- **partially executable**: the tree has useful acceptance guidance, but still lacks enough runnable or directly reusable verification assets to make the family cheap to prove +- **not yet executable**: the tree has broad prose or partial coverage, but still needs a tighter acceptance asset or parity bundle before it can act like a TCK for that family + +## Capability bands + +| Band | Capability family | Minimum accepted surface | Evidence minimum | Current tree state | +| --- | --- | --- | --- | --- | +| `P0` | Root startup and interactive shell | trusted workspace entry, persistent REPL shell, prompt loop, basic progress/error feedback | tree contract + automated shell coverage + real CLI comparison | `contract-covered`, `partially executable` | +| `P0` | Headless and structured I/O | `-p`, JSON output, stream-json init/events, schema output, mode gating | tree contract + deterministic protocol tests + real CLI comparison | `contract-covered`, `partially executable` | +| `P0` | Core tools and permission model | core read/edit/search/shell flows, allow/deny/ask semantics, end-to-end approval routing | tree contract + regression/integration lanes + at least one approval-focused e2e harness | `contract-covered`, `partially executable` | +| `P0` | Session persistence, resume, and compaction | session creation, directory-scoped continuation, resume, fork, compaction, rehydration | tree contract + persistence-aware tests + real CLI comparison | `contract-covered`, `partially executable` | +| `P0` | Auth, provider routing, trust, and settings posture | workspace trust, provider-specific auth, config layering, policy-sensitive capability hydration | tree contract + integration coverage + named-provider runtime comparison | `contract-covered`, `partially executable` | +| `P1` | Skills, MCP, and plugins | skill loading, command projection, MCP server lifecycle, plugin management, trust-sensitive runtime effects | tree contract + integration coverage + command-level runtime comparison | `contract-covered`, `partially executable` | +| `P1` | Agents, tasks, and background work | built-in agent surfaces, local worker lifecycle, task visibility, steering, verification-agent semantics | tree contract + lifecycle tests + interactive runtime comparison | `contract-covered`, `not yet executable` | +| `P1` | Maintenance and operational surfaces | `doctor`, install, update, token/setup, diagnostics, state round-trip | tree contract + command-level comparison + state-path verification | `contract-covered`, `partially executable` | +| `P2` | Remote and multi-surface execution | `ssh`, direct connect, bridge/remote-control, reconnect, transport-sensitive permission routing | tree contract + transport e2e harnesses + environment-matched runtime comparison | `contract-covered`, `not yet executable` | +| `P2` | Specialized user surfaces | voice, companion, IDE/browser-adjacent surfaces, feature-gated differentiation | tree contract + targeted acceptance flows + version-aware runtime comparison | `contract-covered`, `not yet executable` | + +## Go/no-go rules + +Equivalent behavior should preserve these claim thresholds: + +- **not ready for parity claims** if any `P0` family is missing or only partially described at the contract layer +- **usable daily local rewrite** only after all `P0` families have both implementation coverage and at least one real-client comparison bundle for the chosen target line +- **high-confidence parity milestone** only after all relevant `P0` and selected `P1` families for the milestone are backed by automated evidence and state-path checks +- **100% parity claim** only after all `P0`, `P1`, and target-relevant `P2` families are satisfied for one explicitly named reconstruction target + +The last line matters. A rebuild cannot honestly claim "100%" while leaving remote, bridge, maintenance, or specialized surfaces out of scope unless the target itself excluded them. + +## Reconstruction planning rule + +Use the matrix together with [rebuild-phasing.md](../rebuild-phasing.md): + +- `P0` maps to the minimum phases that turn the system into a real Claude Code-like product +- `P1` covers extension and daily-operations families that strongly affect practical parity +- `P2` covers the high-risk, transport-heavy, or feature-gated families where shallow demo parity is especially misleading + +This keeps the tree from treating every missing leaf as equally urgent. + +## Verification planning rule + +Use the matrix together with [verification-and-acceptance-strategy.md](../verification-and-acceptance-strategy.md): + +- the acceptance strategy says **how** to prove parity +- this matrix says **which capability families must clear that proof bar before a milestone can use a stronger label** + +If a milestone note says "Claude Code parity" but does not identify which matrix rows are actually green, the claim is too vague. + +## Why this is still not a full executable TCK + +The matrix is a control surface, not the final runnable suite. + +It deliberately does not pretend the tree already contains: + +- every fixture corpus +- every golden transcript +- every provider-specific replay asset +- every transport harness needed for `P2` + +What it does provide is the missing review frame for deciding whether a proposed rewrite milestone is: + +- below parity bar +- locally convincing for a named capability band +- or strong enough to claim end-to-end equivalence + +## Failure modes + +- **coverage prose illusion**: many leaves exist, but nobody can say which parity-critical capability families are actually proven +- **milestone inflation**: a rewrite calls itself "Claude Code-compatible" after clearing only a subset of `P0` families +- **band collapse**: maintenance, remote, and specialized surfaces are treated as optional forever even when the target parity claim includes them +- **versionless completion**: all matrix rows are discussed abstractly, but no row is tied to one explicit reconstruction target and runtime posture