Skip to content

Revive shared operator surface cutover#208

Merged
MacAttak merged 3 commits intomainfrom
work/operator-surface-revival
Apr 22, 2026
Merged

Revive shared operator surface cutover#208
MacAttak merged 3 commits intomainfrom
work/operator-surface-revival

Conversation

@MacAttak
Copy link
Copy Markdown
Contributor

Summary

This PR revives the meaningful operator-surface and support-surface work that was left behind on #205, but rebases it onto current main as a clean replacement branch.

It keeps the developer-facing improvements across Codex, Claude Code, and Opencode, while dropping the low-signal audit/research churn from the earlier branch.

What changed

  • adopt the shared operator/support surface across adapter command docs and runtime hooks
  • wire Codex, Claude Code, and Opencode through the same work-in-progress summary model
  • preserve quality-correction replay in Codex session start and avoid inactive-work summary reads
  • harden the operator-surface evals so Opencode coverage is conditional on bun availability
  • add explicit regression coverage for warning-cap behavior and support-surface status cards
  • update the Opencode plugin shell regression to match the shared summary refactor

Why this replaces #205

#205 still contains useful user-facing work, but its branch is now out of date and carries merge-conflict/review noise. This PR preserves the value from that branch in a state that is ready for normal review and merge.

Verification

  • python -m pytest evals/tests/test_operator_surface_visibility.py evals/tests/test_operator_surface_status_card.py -v
  • bash tests/test-support-surface-cutover-docs.sh
  • env SPECWRIGHT_CLAUDE_BUILD_MODE=smoke bash tests/test-claude-code-build.sh
  • bash tests/test-codex-hooks.sh
  • bash tests/test-opencode-plugin.sh
  • pre-push test-suite

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 22, 2026

Eval Smoke Results

✅ Smoke evals clean — 0 regressions, 4 improvements

Eval Pass Rate Duration Tokens (input+output) Verdict
grader-function-tests 1.00 (+0.00) 403ms (-1091ms) 0 (+0) improved
structural-handoff-template 1.00 (+0.00) 30ms (-13ms) 0 (+0) improved
structural-skill-validation 1.00 (+0.00) 4331ms (-130ms) 0 (+0) improved
structural-state-enforcement 1.00 (+0.00) 79ms (-12ms) 0 (+0) improved
workflow-yaml-validation 1.00 (+0.00) 1314ms (+14ms) 0 (+0) ok

Posted by Specwright eval-smoke workflow. This comment is updated on each push.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 22, 2026

Review: PR 208 — Revive shared operator surface cutover

Overall: solid. The core refactoring is well-executed — consolidating the three adapters' summary-rendering into renderWorkInProgressSummary removes ~40 lines of duplicated inline-formatting per hook, the Codex hook gains quality-correction replay parity with Claude Code, and the support-surface vocabulary tests give good regression coverage for the operator/adapter consistency story. A few things to address before merge:


Issues

Medium — silent warning truncation

MAX_RENDERED_WARNING_LINES = 2 silently drops any third-or-later warnings in formatWarningLines. The new test confirms the cap works, but it also confirms operators receive no signal that warnings were dropped. If buildStatusCard ever produces three real warnings (e.g. degraded-root-resolution + runtime-fallback + a third), the third disappears without a trace. Consider appending a line like … and N more warnings — run /sw-status for full detail when the pre-filter list exceeds the cap.

Low — hardcoded deduplication codes are a fragile contract

formatWarningLines suppresses warnings by matching code strings ('missing-closeout', 'branch-mismatch', prefix 'approval-') to avoid echoing what branch/closeout/approval lines already show. There's no shared constant or enum between buildStatusCard and this filter, so a code rename in the card won't surface here as a test failure — it'll silently start duplicating the warning. Worth defining the codes in one place (e.g. a SUPPRESSED_WARNING_CODES set exported alongside the rest of the module) or at least adding a comment naming the contract.

Low — gatesSummary empty-string edge case

`${indent}Gates: ${work.gatesSummary ?? summary?.card?.gates?.summary ?? 'No gates recorded yet.'}`

?? is nullish-only, so if work.gatesSummary is '' (empty string), the fallback to the card is skipped and the operator sees a blank Gates: line instead of the card value or the default string. Probably not reachable in practice, but if normalizeActiveWork can ever return an empty-string gatesSummary this will be invisible. Using || instead of ?? here (or gatesSummary || null before the chain) would make the intent explicit.


Positives

  • Codex quality-correction replay (continuation snapshot parsing + --- Quality Corrections --- injection) fills the only remaining parity gap between the three adapters. The test coverage is good — happy path, missing section, and the shipped-work fast-exit case all covered.
  • Conditional bun guard in test_primary_operator_surfaces_show_branch_approval_and_next_without_closeout is consistent with the rest of the file's pattern and doesn't inflate required infrastructure.
  • renderOperationalWarningLines is not exported — lock/ownership/shipping warnings stay encapsulated inside renderWorkInProgressSummary rather than leaking into callers. Right call.
  • test-support-surface-cutover-docs.sh coverage marker approach is consistent and the string-literal assertions across all eight command wrapper files give cheap but effective vocabulary regression coverage.
  • Minor wording inconsistency between CLAUDE.md ("next command") and AGENTS.md ("next action") in the Operator Notes sections — not a blocker but worth aligning when touching those files next.

})
.map((warning) => normalizeString(warning?.summary))
.filter(Boolean)
.slice(0, MAX_RENDERED_WARNING_LINES)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cap silently discards any warnings beyond index 1. Operators have no way to know there were more. Consider tracking the pre-cap count and appending an overflow indicator when it exceeds MAX_RENDERED_WARNING_LINES:

Suggested change
.slice(0, MAX_RENDERED_WARNING_LINES)
.slice(0, MAX_RENDERED_WARNING_LINES);
if (mapped.length < significant.length) {
mapped.push(`${indent}… and ${significant.length - mapped.length} more — run /sw-status for full detail`);
}
return mapped.map((summary) => `${indent}WARNING: ${summary}`);

(Adjust variable names to match the actual refactor — the point is that the final slice should preserve a count so the caller can emit an overflow hint.)

return false;
}

return !['missing-closeout', 'branch-mismatch'].includes(code);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These code strings ('missing-closeout', 'branch-mismatch', 'approval-' prefix) form an implicit contract with buildStatusCard. A code rename in the card won't produce a test failure — it will silently re-introduce the duplicate warnings this filter is meant to prevent. Consider declaring a shared constant or at minimum a comment that names where these codes originate so the contract is visible:

// Codes produced by buildStatusCard that have dedicated rendering sections above.
// Keep in sync with specwright-status-card.mjs.
const SUPPRESSED_WARNING_CODES = new Set(['missing-closeout', 'branch-mismatch']);

and using code.startsWith('approval-') || SUPPRESSED_WARNING_CODES.has(code) in the filter.

`${indent}Unit: ${work.workId} (${work.status})`,
work.unitId ? `${indent}Active Unit: ${work.unitId}` : null,
`${indent}Progress: ${work.completedCount}/${work.totalCount} tasks`,
`${indent}Gates: ${work.gatesSummary ?? summary?.card?.gates?.summary ?? 'No gates recorded yet.'}`,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?? is nullish-only, so an empty-string gatesSummary (e.g. from a partially-written workflow record) bypasses both fallbacks and renders as Gates: . Probably unreachable today, but || would make the intent explicit and handle the edge case:

Suggested change
`${indent}Gates: ${work.gatesSummary ?? summary?.card?.gates?.summary ?? 'No gates recorded yet.'}`,
`${indent}Gates: ${work.gatesSummary || summary?.card?.gates?.summary || 'No gates recorded yet.'}`,

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00cc60332b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

lockWarning,
ownershipWarning,
shippingWarning,
const operatorSummary = loadOperatorSurfaceSummary(stateInfo, work);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard status-card load to avoid blank startup summaries

This new call can throw when operator-surface artifacts are malformed (for example, if stage-report.md resolves to a directory or another unreadable entry), because loadOperatorSurfaceSummary reads closeout/approval files from disk and propagates filesystem errors. In Codex this exception is caught only by the outer catch, which exits silently, so the session-start hook produces no work-in-progress output at all; before this commit, Codex still showed the basic summary even when closeout artifacts were bad. Add a local fallback around this load so Codex continues to print core progress details when status-card inputs are corrupt.

Useful? React with 👍 / 👎.

@MacAttak MacAttak merged commit 37d22d9 into main Apr 22, 2026
13 checks passed
@MacAttak MacAttak deleted the work/operator-surface-revival branch April 22, 2026 10:28
@github-actions github-actions Bot mentioned this pull request Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant