Eval Smoke Results
✅ Smoke evals clean: 0 regressions, 4 improvements
Posted by Specwright eval-smoke workflow. This comment is updated on each push.
Review: PR 208 (Revive shared operator surface cutover)
Overall: solid. The core refactoring is well-executed, consolidating the three adapters' summary-rendering into …
Issues
Medium: silent warning truncation
Low: hardcoded deduplication codes are a fragile contract
Low: `${indent}Gates: ${work.gatesSummary ?? summary?.card?.gates?.summary ?? 'No gates recorded yet.'}`
Positives
})
.map((warning) => normalizeString(warning?.summary))
.filter(Boolean)
.slice(0, MAX_RENDERED_WARNING_LINES)
The cap silently discards any warnings beyond index 1. Operators have no way to know there were more. Consider tracking the pre-cap count and appending an overflow indicator when it exceeds MAX_RENDERED_WARNING_LINES:
- .slice(0, MAX_RENDERED_WARNING_LINES)
+ .slice(0, MAX_RENDERED_WARNING_LINES);
+ if (mapped.length < significant.length) {
+ mapped.push(`${indent}… and ${significant.length - mapped.length} more — run /sw-status for full detail`);
+ }
+ return mapped.map((summary) => `${indent}WARNING: ${summary}`);
(Adjust variable names to match the actual refactor — the point is that the final slice should preserve a count so the caller can emit an overflow hint.)
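One way the overflow hint could look, as a minimal standalone sketch rather than the actual refactor. The function name `renderWarningLines` and the cap value of 2 are assumptions (the cap is inferred from "beyond index 1"). Unlike the inline suggestion, the hint is appended after the `WARNING:` prefix is applied, so it is not itself rendered as a warning.

```javascript
// Assumed cap; the real MAX_RENDERED_WARNING_LINES value is not shown in this review.
const MAX_RENDERED_WARNING_LINES = 2;

// Hypothetical helper: renders at most MAX_RENDERED_WARNING_LINES warnings
// and appends one overflow line when more were available.
function renderWarningLines(summaries, indent = '') {
  // Drop non-strings and blank summaries before counting.
  const significant = summaries
    .map((summary) => (typeof summary === 'string' ? summary.trim() : ''))
    .filter(Boolean);

  const shown = significant.slice(0, MAX_RENDERED_WARNING_LINES);
  const lines = shown.map((summary) => `${indent}WARNING: ${summary}`);

  // Count against the pre-cap list so the operator sees how many were hidden.
  const hidden = significant.length - shown.length;
  if (hidden > 0) {
    lines.push(`${indent}... and ${hidden} more; run /sw-status for full detail`);
  }
  return lines;
}
```

With four non-empty summaries and the assumed cap of 2, this returns two `WARNING:` lines followed by a single "and 2 more" line.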
return false;
}

return !['missing-closeout', 'branch-mismatch'].includes(code);
These code strings ('missing-closeout', 'branch-mismatch', 'approval-' prefix) form an implicit contract with buildStatusCard. A code rename in the card won't produce a test failure — it will silently re-introduce the duplicate warnings this filter is meant to prevent. Consider declaring a shared constant or at minimum a comment that names where these codes originate so the contract is visible:
// Codes produced by buildStatusCard that have dedicated rendering sections above.
// Keep in sync with specwright-status-card.mjs.
const SUPPRESSED_WARNING_CODES = new Set(['missing-closeout', 'branch-mismatch']);
and using code.startsWith('approval-') || SUPPRESSED_WARNING_CODES.has(code) in the filter.
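Put together, the filter might read roughly as follows. This is a sketch under assumptions: `isRenderableWarning` and `SUPPRESSED_WARNING_PREFIXES` are hypothetical names, and the warning object is assumed to carry a string `code` field. Only the two exact codes and the `approval-` prefix come from the comment above.

```javascript
// Codes produced by buildStatusCard that have dedicated rendering sections.
// Keep in sync with specwright-status-card.mjs.
const SUPPRESSED_WARNING_CODES = new Set(['missing-closeout', 'branch-mismatch']);
// Hypothetical companion list so the prefix rule is declared next to the exact codes.
const SUPPRESSED_WARNING_PREFIXES = ['approval-'];

// Hypothetical predicate: true when the warning should still be rendered
// in the generic warning list.
function isRenderableWarning(warning) {
  const code = typeof warning?.code === 'string' ? warning.code : '';
  if (SUPPRESSED_WARNING_CODES.has(code)) {
    return false;
  }
  return !SUPPRESSED_WARNING_PREFIXES.some((prefix) => code.startsWith(prefix));
}
```

Exporting the same constants from the module that emits the codes would let a rename fail at import time or in a test, instead of silently re-introducing duplicates.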
`${indent}Unit: ${work.workId} (${work.status})`,
work.unitId ? `${indent}Active Unit: ${work.unitId}` : null,
`${indent}Progress: ${work.completedCount}/${work.totalCount} tasks`,
`${indent}Gates: ${work.gatesSummary ?? summary?.card?.gates?.summary ?? 'No gates recorded yet.'}`,
?? is nullish-only, so an empty-string gatesSummary (e.g. from a partially-written workflow record) bypasses both fallbacks and renders as Gates: . Probably unreachable today, but || would make the intent explicit and handle the edge case:
- `${indent}Gates: ${work.gatesSummary ?? summary?.card?.gates?.summary ?? 'No gates recorded yet.'}`,
+ `${indent}Gates: ${work.gatesSummary || summary?.card?.gates?.summary || 'No gates recorded yet.'}`,
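The difference is easy to reproduce in isolation. The helper below is hypothetical and exists only to render the same expression both ways; the `work` and `summary` shapes are reduced to the fields used in the line under review.

```javascript
// Hypothetical helper: renders the Gates line with both operators for comparison.
function renderGatesLine(work, summary, indent = '') {
  const fallback = 'No gates recorded yet.';
  // `??` falls through only on null or undefined, so '' is kept.
  const withNullish = work.gatesSummary ?? summary?.card?.gates?.summary ?? fallback;
  // `||` falls through on any falsy value, including ''.
  const withOr = work.gatesSummary || summary?.card?.gates?.summary || fallback;
  return {
    nullish: `${indent}Gates: ${withNullish}`,
    or: `${indent}Gates: ${withOr}`,
  };
}

// An empty-string gatesSummary, e.g. from a partially written record.
const rendered = renderGatesLine({ gatesSummary: '' }, undefined);
console.log(JSON.stringify(rendered.nullish)); // "Gates: "
console.log(JSON.stringify(rendered.or)); // "Gates: No gates recorded yet."
```

The trade-off is that `||` would also replace any other falsy summary, which seems acceptable here because the field is expected to be a non-empty string or absent.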
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00cc60332b
lockWarning,
ownershipWarning,
shippingWarning,
const operatorSummary = loadOperatorSurfaceSummary(stateInfo, work);
Guard status-card load to avoid blank startup summaries
This new call can throw when operator-surface artifacts are malformed (for example, if stage-report.md resolves to a directory or another unreadable entry), because loadOperatorSurfaceSummary reads closeout/approval files from disk and propagates filesystem errors. In Codex this exception is caught only by the outer catch, which exits silently, so the session-start hook produces no work-in-progress output at all; before this commit, Codex still showed the basic summary even when closeout artifacts were bad. Add a local fallback around this load so Codex continues to print core progress details when status-card inputs are corrupt.
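A local fallback could be as small as the sketch below. The names `loadSummarySafely` and `renderStartupLines` are hypothetical, and the loader is passed in as a parameter so the example runs on its own. In the hook it would be `loadOperatorSurfaceSummary`, whose return shape is not shown here.

```javascript
// Hypothetical wrapper: `loadSummary` stands in for loadOperatorSurfaceSummary.
function loadSummarySafely(loadSummary, stateInfo, work) {
  try {
    return loadSummary(stateInfo, work);
  } catch (error) {
    // Malformed or unreadable artifacts should degrade the output, not suppress it.
    return null;
  }
}

// Hypothetical renderer: always prints core progress, and notes when the
// status card could not be loaded.
function renderStartupLines(work, summary, indent = '') {
  const lines = [
    `${indent}Unit: ${work.workId} (${work.status})`,
    `${indent}Progress: ${work.completedCount}/${work.totalCount} tasks`,
  ];
  if (summary === null) {
    lines.push(`${indent}Status card unavailable: operator-surface artifacts could not be read.`);
  }
  return lines;
}
```

Catching at the call site keeps the outer catch as a last resort, so a bad closeout file costs only the status-card details instead of the whole session-start summary.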
Summary
This PR revives the meaningful operator-surface and support-surface work that was left behind on #205, but rebases it onto current `main` as a clean replacement branch.
It keeps the developer-facing improvements across Codex, Claude Code, and Opencode, while dropping the low-signal audit/research churn from the earlier branch.
What changed
bun availability
Why this replaces #205
#205 still contains useful user-facing work, but its branch is now out of date and carries merge-conflict/review noise. This PR preserves the value from that branch in a state that is ready for normal review and merge.
Verification
python -m pytest evals/tests/test_operator_surface_visibility.py evals/tests/test_operator_surface_status_card.py -v
bash tests/test-support-surface-cutover-docs.sh
env SPECWRIGHT_CLAUDE_BUILD_MODE=smoke bash tests/test-claude-code-build.sh
bash tests/test-codex-hooks.sh
bash tests/test-opencode-plugin.sh
test-suite