Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ This subdomain captures cross-cutting knowledge about how the observed Claude Co

Relevant leaves:

- **[parity-capability-matrix.md](parity-capability-matrix.md)** — Which capability families are blocking for parity, which are extension-level, and what evidence bar each family must clear before a rebuild can claim success.
- **[reconstruction-target-and-evidence-boundary.md](reconstruction-target-and-evidence-boundary.md)** — How source-snapshot evidence and later released-binary evidence can both inform the tree without collapsing into one false versionless parity claim.
- **[test-framework-overview.md](test-framework-overview.md)** — The layered shape of the current test system, including the visible tier model and the boundary between confirmed and inferred runner details.
- **[real-cli-e2e-scenario-corpus.md](real-cli-e2e-scenario-corpus.md)** — A live-observed black-box scenario set for validating whether a rebuild behaves like a real Claude Code CLI across startup, headless runs, session continuity, structured I/O, and diagnostics.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
---
title: "Parity Capability Matrix"
owners: [bingran-you]
soft_links:
- /reconstruction-guardrails/rebuild-phasing.md
- /reconstruction-guardrails/verification-and-acceptance-strategy.md
- /reconstruction-guardrails/verification-and-native-test-oracles/reconstruction-target-and-evidence-boundary.md
- /reconstruction-guardrails/verification-and-native-test-oracles/test-lane-coverage-map.md
- /product-surface/NODE.md
- /platform-services/NODE.md
- /integrations/NODE.md
- /collaboration-and-agents/NODE.md
- /ui-and-experience/NODE.md
---

# Parity Capability Matrix

The tree already explains many Claude Code behaviors in prose. What it still needed was a reusable answer to a narrower execution question: **which capability families are blocking for parity, which are extension-level, and what evidence bar must each family clear before a rebuild can claim success?**

This matrix is that answer. It is not a file inventory and not a rewrite backlog by repository. It is a capability-first go/no-go frame for reconstruction and evaluation work.

## Scope boundary

This leaf covers:

- the capability bands a serious Claude Code rebuild must track
- the minimum evidence bar for each band
- the difference between "usable", "high-confidence parity", and "100% parity claim"

It intentionally does not re-document:

- detailed subsystem contracts already captured in domain leaves
- runner-specific implementation tasks for any one rewrite repository
- exact command scripts or fixture contents

## Status vocabulary

The matrix uses these tree-readiness labels:

- **contract-covered**: the tree already captures the behavior family well enough to guide implementation
- **partially executable**: the tree has useful acceptance guidance, but still lacks enough runnable or directly reusable verification assets to make the family cheap to prove
- **not yet executable**: the tree has broad prose or partial coverage, but still needs a tighter acceptance asset or parity bundle before it can act like a TCK for that family

## Capability bands

| Band | Capability family | Minimum accepted surface | Evidence minimum | Current tree state |
| --- | --- | --- | --- | --- |
| `P0` | Root startup and interactive shell | trusted workspace entry, persistent REPL shell, prompt loop, basic progress/error feedback | tree contract + automated shell coverage + real CLI comparison | `contract-covered`, `partially executable` |
| `P0` | Headless and structured I/O | `-p`, JSON output, stream-json init/events, schema output, mode gating | tree contract + deterministic protocol tests + real CLI comparison | `contract-covered`, `partially executable` |
| `P0` | Core tools and permission model | core read/edit/search/shell flows, allow/deny/ask semantics, end-to-end approval routing | tree contract + regression/integration lanes + at least one approval-focused e2e harness | `contract-covered`, `partially executable` |
| `P0` | Session persistence, resume, and compaction | session creation, directory-scoped continuation, resume, fork, compaction, rehydration | tree contract + persistence-aware tests + real CLI comparison | `contract-covered`, `partially executable` |
| `P0` | Auth, provider routing, trust, and settings posture | workspace trust, provider-specific auth, config layering, policy-sensitive capability hydration | tree contract + integration coverage + named-provider runtime comparison | `contract-covered`, `partially executable` |
| `P1` | Skills, MCP, and plugins | skill loading, command projection, MCP server lifecycle, plugin management, trust-sensitive runtime effects | tree contract + integration coverage + command-level runtime comparison | `contract-covered`, `partially executable` |
| `P1` | Agents, tasks, and background work | built-in agent surfaces, local worker lifecycle, task visibility, steering, verification-agent semantics | tree contract + lifecycle tests + interactive runtime comparison | `contract-covered`, `not yet executable` |
| `P1` | Maintenance and operational surfaces | `doctor`, install, update, token/setup, diagnostics, state round-trip | tree contract + command-level comparison + state-path verification | `contract-covered`, `partially executable` |
| `P2` | Remote and multi-surface execution | `ssh`, direct connect, bridge/remote-control, reconnect, transport-sensitive permission routing | tree contract + transport e2e harnesses + environment-matched runtime comparison | `contract-covered`, `not yet executable` |
| `P2` | Specialized user surfaces | voice, companion, IDE/browser-adjacent surfaces, feature-gated differentiation | tree contract + targeted acceptance flows + version-aware runtime comparison | `contract-covered`, `not yet executable` |

## Go/no-go rules

Equivalent behavior should preserve these claim thresholds:

- **not ready for parity claims** if any `P0` family is missing or only partially described at the contract layer
- **usable daily local rewrite** only after all `P0` families have both implementation coverage and at least one real-client comparison bundle for the chosen target line
- **high-confidence parity milestone** only after all relevant `P0` and selected `P1` families for the milestone are backed by automated evidence and state-path checks
- **100% parity claim** only after all `P0`, `P1`, and target-relevant `P2` families are satisfied for one explicitly named reconstruction target

The last line matters. A rebuild cannot honestly claim "100%" while leaving remote, bridge, maintenance, or specialized surfaces out of scope unless the target itself excluded them.

## Reconstruction planning rule

Use the matrix together with [rebuild-phasing.md](../rebuild-phasing.md):

- `P0` maps to the minimum phases that turn the system into a real Claude Code-like product
- `P1` covers extension and daily-operations families that strongly affect practical parity
- `P2` covers the high-risk, transport-heavy, or feature-gated families where shallow demo parity is especially misleading

This keeps the tree from treating every missing leaf as equally urgent.

## Verification planning rule

Use the matrix together with [verification-and-acceptance-strategy.md](../verification-and-acceptance-strategy.md):

- the acceptance strategy says **how** to prove parity
- this matrix says **which capability families must clear that proof bar before a milestone can use a stronger label**

If a milestone note says "Claude Code parity" but does not identify which matrix rows are actually green, the claim is too vague.

## Why this is still not a full executable TCK

The matrix is a control surface, not the final runnable suite.

It deliberately does not pretend the tree already contains:

- every fixture corpus
- every golden transcript
- every provider-specific replay asset
- every transport harness needed for `P2`

What it does provide is the missing review frame for deciding whether a proposed rewrite milestone is:

- below parity bar
- locally convincing for a named capability band
- or strong enough to claim end-to-end equivalence

## Failure modes

- **coverage prose illusion**: many leaves exist, but nobody can say which parity-critical capability families are actually proven
- **milestone inflation**: a rewrite calls itself "Claude Code-compatible" after clearing only a subset of `P0` families
- **band collapse**: maintenance, remote, and specialized surfaces are treated as optional forever even when the target parity claim includes them
- **versionless completion**: all matrix rows are discussed abstractly, but no row is tied to one explicit reconstruction target and runtime posture
Loading