Skip to content

Add the slim staged workflow chain#8

Open
guangyucoder wants to merge 16 commits intomainfrom
slim
Open

Add the slim staged workflow chain#8
guangyucoder wants to merge 16 commits intomainfrom
slim

Conversation

@guangyucoder
Copy link
Copy Markdown
Owner

Context

The slim branch replaces heavier workflow machinery with a smaller staged chain built around implement/review/doc-fix prompts, bounded workpad markers, and orchestrator-driven routing.

TL;DR

Add the slim staged workflow chain and automate implement-to-review handoff.

Summary

  • add stage-specific workflows, marker parsing, stage routing, and closeout gates for the slim implement/review/doc-fix loop
  • wire orchestrator, workspace, and tracker updates for single-turn staged runs plus workpad-based handoff
  • add hook-driven post-implement validation, SSH/live-E2E support, and broad Elixir coverage for the slim flow

Alternatives

  • Keep all handoff logic in WORKFLOW.md, but that leaves more repetitive behavior in the prompt and drifts from harness behavior.
  • Build a generic hook or pipeline engine, but that felt too heavy for the current slim experiment.

Test Plan

  • make -C elixir all currently fails on existing Credo debt on slim, including pre-existing StageCloseout / StageOrchestrator findings
  • mix test
  • mix test test/symphony_elixir/core_test.exs test/symphony_elixir/workspace_and_config_test.exs test/symphony_elixir/prompt_builder_test.exs

frantic-openai and others added 16 commits March 10, 2026 08:41
#### Context

The config layer had grown around NimbleOptions and a large getter surface. This branch moves parsing to an Ecto schema and simplifies callers to read typed nested settings directly.

#### TL;DR

*Replace the bespoke config access layer with an Ecto-backed schema and typed settings access.*

#### Summary

- Replace the old config extraction path with an Ecto embedded schema and defaults
- Collapse `Config` down to `settings/0`, `settings!/0`, and a small runtime helper surface
- Update callers and tests to read nested config structs directly instead of one-off getter wrappers
- Keep the repo green under format, lint, coverage, and dialyzer by adding the missing schema specs and tests

#### Alternatives

- Keep NimbleOptions and shrink the wrapper layer incrementally, but that would preserve the custom parser shape
- Keep the strict refactor without extra schema tests, but that left `make -C elixir all` red on coverage and dialyzer

#### Test Plan

- [x] `make -C elixir all`
- [x] `cd elixir && mix test test/symphony_elixir/workspace_and_config_test.exs --warnings-as-errors`
#### Context

The schema and tests now treat tracker state lists as YAML lists and only lowercase
state-limit keys, but `SPEC.md` still documented older and stricter behavior.

#### TL;DR

*Align `SPEC.md` with the current workflow config and state normalization behavior.*

#### Summary

- document tracker `active_states` and `terminal_states` as YAML string lists
- update the cheat sheet defaults to use list examples instead of CSV strings
- remove trim-based state normalization language from the spec

#### Alternatives

- restore support for comma-separated tracker state strings in the schema
- keep trim-based normalization in the spec and change the implementation

#### Test Plan

- [x] `make -C elixir all`
- [x] Docs-only diff reviewed against current schema and tests
#### Context

Symphony Elixir had a few correctness gaps around retry scheduling, refresh coalescing,
workspace path handling, Codex turn policy forwarding, and running-issue refreshes.

#### TL;DR

*Stabilize Symphony Elixir orchestration, workspace safety, and Codex policy passthrough.*

#### Summary

- Ignore stale retry timers and coalesce superseded poll ticks in the orchestrator.
- Canonicalize workspace paths and clarify last-known-good workflow reload behavior.
- Pass explicit Codex turn sandbox policies through unchanged and add integration coverage.
- Paginate Linear by-id issue-state refreshes so running issue reconciliation sees all IDs.

#### Alternatives

- Keep local validation and rewriting of explicit turn sandbox policies, but that drifted from
  Codex semantics and broke documented policy types.
- Keep single-request by-id issue refreshes, but that truncates after 50 issues and can stop
  healthy workers during reconciliation.

#### Test Plan

- [x] `make -C elixir all`
- [x] Targeted regression coverage for AppServer turn policy passthrough and Linear by-id pagination
#### Context

Add a real end-to-end path for Symphony Elixir that exercises Linear and
Codex together, and make it easy to invoke explicitly from one Make target.

#### TL;DR

*Add a real live Linear/Codex E2E test plus a `make e2e` entrypoint.*

#### Summary

- Add an opt-in live Elixir test that creates a real Linear project and issue.
- Run a real Codex turn and verify both workspace output and Linear side effects.
- Require Codex to post a Linear comment and close the issue before completion.
- Add `make e2e` with clear failures for missing `LINEAR_API_KEY` or `codex`.
- Document the new E2E entrypoint and behavior in the Elixir README.

#### Alternatives

- Keep using ad hoc `mix test` commands and environment toggles.
- Keep shimming Linear or post-processing issue state locally.
- Those keep the happy path less reproducible and do not prove Codex can
  mutate Linear itself during the turn.

#### Test Plan

- [x] `make -C elixir all`
- [x] `env -u LINEAR_API_KEY make -C elixir e2e`
- [x] `LINEAR_API_KEY=$(tr -d '\r\n' < ~/.linear_api_key) SYMPHONY_RUN_LIVE_E2E=1 mix test test/symphony_elixir/live_e2e_test.exs`
#### Context

Symphony needs to run tickets on SSH workers, keep the live E2E reliable, and preserve correct remote path semantics.

#### TL;DR

*Add SSH workers, Docker-backed SSH live E2E coverage, per-host caps, and correct worker-side `~` resolution.*

#### Summary

- Add SSH worker execution, remote workspace handling, and SSH app-server launch support in Elixir.
- Add live E2E coverage for both local workers and SSH workers, with Docker-backed SSH workers by default.
- Add an optional shared `worker.max_concurrent_agents_per_host` cap and document the SSH extension.
- Keep `workspace.root` raw so SSH workers resolve `~` on the worker, while local paths expand at local use sites.

#### Alternatives

- Keep expanding `workspace.root` in config and carry duplicate raw path state, but that complicates the config model.
- Require only absolute remote workspace roots, but that makes real worker setup less ergonomic and less representative.

#### Test Plan

- [x] `make -C elixir all`
- [x] `cd elixir && env -u SYMPHONY_LIVE_SSH_WORKER_HOSTS LINEAR_API_KEY="$(tr -d '\r\n' < ~/.linear_api_key)" SYMPHONY_RUN_LIVE_E2E=1 mix test test/symphony_elixir/live_e2e_test.exs:128`
#### Context

The Codex app-server merges stderr into stdout (`stderr_to_stdout`).
Non-JSON diagnostic lines written by Codex during healthy turns were
incorrectly emitted as `:malformed` protocol events in the Symphony UI.

<img width="2496" height="446" alt="Screenshot 2026-03-12 at 4 27 20 PM"
src="https://github.com/user-attachments/assets/3d85fa5b-c0d9-4079-87c6-850c7ccf7ee2"
/>


#### TL;DR

*Only emit `:malformed` events for JSON-like protocol frames (lines
starting with `{`); silently log all other non-JSON stream output.*

#### Summary

- Gate `:malformed` event emission on a new
`protocol_message_candidate?/1` check — only lines starting with `{`
- Preserve full logging of all non-JSON stderr noise via
`log_non_json_stream_line`
- Fix existing "Capture stderr log" test to assert no `:malformed` event
is emitted for stderr noise
- Add regression test verifying truncated JSON-like frames still surface
as `:malformed`

#### Alternatives

- Separate stderr from stdout with a dedicated pipe: more invasive
change, unnecessary given the simple heuristic
- Suppress all non-JSON output silently: would lose valuable diagnostic
logging

#### Test Plan

- [x] `make -C elixir all`
- [x] `mise exec -- mix format lib/symphony_elixir/codex/app_server.ex
test/symphony_elixir/app_server_test.exs`
- [x] Additional: `MIX_ENV=test mise exec -- mix run --no-start -e
'Application.put_env(:symphony_elixir, :workflow_file_path,
System.fetch_env!("SYMPHONY_TEST_WORKFLOW")); Mix.Task.run("test",
["test/symphony_elixir/app_server_test.exs"])'`

Co-authored-by: Codex <noreply@openai.com>
#### Context

SSH failover currently happens inside `AgentRunner`, which can bypass
per-host caps and rerun a ticket on a second host after the first host
already started setup.

#### TL;DR

*Keep each worker run on one SSH host and let the orchestrator own
retries.*

#### Summary

- Remove `AgentRunner`'s internal cross-host failover loop
- Keep a worker lifetime pinned to one selected SSH host
- Add a regression test proving startup failure on one host does not
fall through to another

#### Alternatives

- Keep internal failover and classify retryable errors, but that
duplicates orchestrator scheduling logic
- Only patch per-host cap enforcement, but invisible cross-host reruns
would still risk duplicate side effects

#### Test Plan

- [x] `make -C elixir all`
- [x] `cd elixir && mise exec -- mix test
test/symphony_elixir/core_test.exs:1167
test/symphony_elixir/core_test.exs:1237
test/symphony_elixir/core_test.exs:1337 --seed 0`
## Summary
Pin floating external GitHub Actions workflow refs to immutable SHAs.

## Why
See the rationale doc:
https://docs.google.com/document/d/1qOURCNx2zszQ0uWx7Fj5ERu4jpiYjxLVWBWgKa2wTsA/edit?tab=t.0

## Validation
- `rg -n --pcre2
"uses:\s*(?!\./)(?!docker://)[^#\n]+@(?![0-9a-f]{40}(?:\s+#.*)?$)\S+"
.github/workflows`
- `git diff --check`
- `git diff --stat -- .github/workflows`
Slim redesign to replace the current fork's ~2500 lines of machine-layer
(DispatchResolver / Closeout / unit system) with ~240 lines of marker-based
stage routing + 4 closeout gates + 3 WORKFLOW.md files.

Upstream delta targeted at ~18 lines (PromptBuilder workflow_path opt).
When opts[:workflow_path] is provided, load that file explicitly via
Workflow.load/1 instead of Workflow.current(), and raise on blank body
instead of falling back to the default implement prompt. When absent,
existing behavior (Workflow.current + default_prompt fallback) is preserved.

Enables stage-specific WORKFLOW-*.md files in the upcoming StageOrchestrator.
See docs/design/stage-workflow-chain.md §2.7.

4/4 tests pass.
WORKFLOW-review.md: second-opinion reviewer prompt. Reads diff, optional
visual review on frontend paths, writes code-review marker in Linear
workpad BEGIN/END section. Must not commit; pre-marker requires clean
working tree + HEAD unchanged from stage start.

WORKFLOW-docfix.md: docs sweeper prompt. Only *.md / docs/**. no-updates
path writes marker immediately; updated path commits first (docs: prefix)
then writes marker with post-commit HEAD. Pre-marker requires clean
working tree.

Both files contain the shared Marker Contract (STRICT) block per §3.1.
See docs/design/stage-workflow-chain.md §3.2 and §3.3.
Parses symphony-marker fenced blocks inside the <!-- SYMPHONY-MARKERS-BEGIN -->
/ END region of a Linear workpad comment. Silently drops invalid blocks
(bad YAML, missing required field, wrong value). Archived blocks
(symphony-marker-archived) and blocks outside the region are ignored.

Public API:
- parse/2: returns valid markers in text order, filtered by issue_identifier
- review_pending?/1: last (review-request|code-review) in current round is a review-request
- latest_code_review/1: last code-review in current round
- latest_review_sha/1: reviewed_sha of latest CLEAN code-review
- docs_checked_matches_review?/1: last docs-checked in current round matches latest clean review sha

Schema: common fields (kind, round_id, stage_round, reviewed_sha, issue_identifier)
+ per-kind (code-review verdict, docs-checked docfix_outcome). Optional
code-review findings parsed best-effort — malformed findings do not drop
the whole marker.

See docs/design/stage-workflow-chain.md §2.5.

18/18 tests pass.
Summary:
- add an optional after_implement hook and workspace helper for
  stage-aware post-implement checks
- append review-request markers automatically after implement turns
  when the hook passes and the workspace is clean
- align WORKFLOW.md and tests with machine-owned marker handoff, and
  fix prompt_builder formatting plus marker_parser specs

Rationale:
- move the repetitive implement-to-review handoff into the harness so
  the prompt can stay focused on engineering work
- keep the change narrow by reusing the existing orchestrator exit
  path instead of introducing another execution pipeline
- leave implement failures in the active loop with workpad narration
  rather than forcing Rework for transient validation failures

Tests:
- mix test (279 tests, 0 failures, 2 skipped)
- make -C elixir all (currently fails on existing Credo debt on slim,
  including pre-existing StageCloseout/StageOrchestrator findings)

Co-authored-by: Codex <codex@openai.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants