
Promote dev → main: agent card + fleet-health filter + ceremony fixes (v0.7.22 candidate)#463

Closed
mabry1985 wants to merge 33 commits into main from dev

Conversation


@mabry1985 mabry1985 commented Apr 21, 2026

Promotes the post-#418 work since v0.7.21:

PR Description
#427 wire alert.* executors + startup validator
#428 stop world.state.# leak from ceremony snapshots
#429 rollcall skill source
#431 wire ceremony skill executors
#432 wire 5 pr-remediator skills
#433 register protomaker A2A agent
#434 drop overscoped subscribesTo from protomaker
#435 rename web_search → searxng_search in ava.yaml
#436 disable fleet_agent_stuck loop (mitigation)
#451 disable 4 more action.* loops (mitigation)
#452 per-action cooldown infrastructure (closes #437)
#454 route 4 ceremonies, disable 3 broken
#455 pre-dispatch target registry guard (closes #444)
#456 honor enabled:false on initial ceremony load (closes #453)
#457 wrap subscribe-spy push callbacks to return void (CI green)
#460 filter outcomes from synthetic actors (closes #459)
#462 agent card advertises canonical A2A endpoint (closes #461)

All landed and verified live on :dev (the workstacean container) since 2026-04-20. The Discord ops channel has stayed silent for the duration (cooldown + registry guard holding).

This is HITL-gated per protocol — operator approval required before merge.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added GitHub issues monitoring with severity tracking and automatic triage actions.
    • Added per-action cooldown mechanism to prevent redundant dispatches.
    • Enhanced environment configuration with new URL resolution variables for A2A discovery.
  • Bug Fixes

    • Corrected extension URIs across A2A specifications.
    • Fixed ceremony state topic isolation to prevent world-state namespace conflicts.
  • Documentation

    • Updated configuration guides and API references.
    • Enhanced agent skill and extension documentation.
  • Chores

    • Version bumped to 0.7.21.

github-actions Bot and others added 30 commits April 16, 2026 22:31
Acknowledges main's squash commit as an ancestor to prevent phantom conflicts on the next dev→main promotion.
All 27 references to https://protolabs.ai/a2a/ext/* changed to
https://proto-labs.ai/a2a/ext/* to match the actual domain. These
URIs are opaque identifiers (not published specs today) but should
reference a domain we own.

Breaking: external agents (Quinn, protoPen) whose cards declare the
old URI will stop matching the registry until they update. Filed on
Quinn to update her card.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Both services decommissioned. Containers stopped + removed.
Only reference in protoWorkstacean was the rollcall script.

Note: homelab-iac/stacks/ai/docker-compose.yml still has a
worldmonitor network reference at line 521 + service at line 833.
Needs separate cleanup in that repo.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: upgrade web_search → searxng_search + give Ava fleet health tools (#411)

Two changes:

1. Replace the basic `web_search` tool (5 results, hardcoded engines)
   with `searxng_search` — adapted from rabbit-hole.io's full-surface
   SearXNG integration. New capabilities:
   - Category routing: general, news, science, it
   - Time range filtering: day, week, month, year
   - Bang syntax: !wp (Wikipedia), !scholar, !gh (GitHub)
   - Infoboxes, direct answers, suggestions in response
   - Configurable max_results (default 10, was 5)

   Updated in both bus-tools.ts (@protolabsai/sdk pattern) and
   deep-agent-executor.ts (LangChain pattern).

2. Give Ava three fleet health tools she was missing:
   - get_ci_health — CI success rates across repos
   - get_pr_pipeline — open PRs, conflicts, staleness
   - get_incidents — security/ops incidents

   Ava can now answer fleet health questions directly instead of
   always delegating to Quinn.

Ava's tool count: 10 → 13. Tool rename: web_search → searxng_search
(greenfield, no backward compat alias).

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
protoAgent is the new GitHub Template repo that replaces per-agent
A2A bootstrapping. Registers it as an active dev project owned by
Quinn, matching the shape of existing entries. Plane / GitHub
webhook / Discord provisioning remain TODO — those integrations
aren't configured in this deployment, so the onboard plugin
skipped them.

Co-authored-by: Josh <artificialcitizens@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Acknowledges main's squash commit as an ancestor to prevent phantom conflicts on the next dev→main promotion.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
feat(ava): expand helm toolset, wire GOAP skills, fix ceremony disable bug (#415)

Ava agent audit + overhaul:
- Tools: 10 → 22 (direct observation, propose_config_change, incident reporting)
- Skills: 3 → 7 (debug_ci_failures, fleet_incident_response, downshift_models, investigate_orphaned_skills)
- System prompt rewritten: self-improvement instructions, escalation policy, GOAP-dispatch playbook
- DeepAgentExecutor now applies skill-level systemPromptOverride (goal_proposal, diagnose_pr_stuck)
- Fix ceremony loader bug: disabled ceremonies were filtered out, preventing hot-reload from cancelling timers
- Clean up board.pr-audit.yaml (remove spurious action field, restore schedule, keep disabled)
- Update docs: README, deep-agent runtime, agent-skills reference, self-improving loop

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix(pr-remediator): close dispatch gap — self-dispatch + broaden auto-approve (#417)

Two root causes prevented PRs from being auto-merged:

1. Dispatch gap: tier_0 short-circuit in ActionDispatcherPlugin completed
   all actions immediately without dispatching to agent.skill.request.
   Every action in actions.yaml is tier_0, so the fireAndForget path
   (which publishes the skill request) was unreachable dead code.
   Fix: tier_0 now falls through when meta.fireAndForget is set.

2. Approval gap: readyToMerge requires reviewState=approved, but
   auto-approve only covered the dependabot/renovate authors and the
   `promote:` / `chore(deps` title prefixes. Human PRs, release PRs
   (`chore(release`), and github-actions PRs all lacked approved
   reviews and sat indefinitely.
   Fix: added app/github-actions to the author list, and `chore(release`,
   `chore:`, `docs(` to the safe title prefixes.

Additionally, PrRemediatorPlugin now self-dispatches remediation on
every world.state.updated tick — checking for readyToMerge, dirty,
failingCi, and changesRequested PRs directly from cached domain data.
This removes the dependency on GOAP dispatch reaching the plugin via
pr.remediate.* topics (which were never published in production after
Arc 1.4 removed meta.topic routing).

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
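The tier_0 fallthrough can be sketched as a tiny routing decision (a sketch under assumed shapes — `Dispatch`, `route`, and the string results are illustrative, not the ActionDispatcherPlugin's actual API):

```typescript
// Sketch of the tier_0 short-circuit fix: tier_0 actions still complete
// immediately, EXCEPT when meta.fireAndForget is set, in which case the
// skill request path is taken. Names here are hypothetical.
type Dispatch = { tier: string; meta?: { fireAndForget?: boolean } };

function route(d: Dispatch): "complete_immediately" | "publish_skill_request" {
  // Before the fix, every tier_0 dispatch hit this branch unconditionally,
  // making the fire-and-forget path unreachable dead code.
  if (d.tier === "tier_0" && !d.meta?.fireAndForget) {
    return "complete_immediately";
  }
  return "publish_skill_request";
}
```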
feat(goap): issue_zero domain + goals — track open GitHub issues across fleet (#419)

Adds a github_issues domain that polls /repos/{repo}/issues?state=open
for all managed projects and classifies by label (critical, bug,
enhancement). Three GOAP goals enforce issue hygiene:

  - issues.zero_critical (critical severity, max: 0)
  - issues.zero_bugs (high severity, max: 0)
  - issues.total_low (medium severity, max: 5)

Each goal has a matching alert action and a triage dispatch action
that invokes Ava's new issue_triage skill. The skill instructs Ava
to resolve, convert to board features, delegate, or close issues
with rationale — driving toward zero open issues across all repos.

Domain polls every 5 minutes (issue velocity is low, GitHub rate
limits are a concern with 6+ repos).

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
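The label-to-severity classification and the three max-threshold goals above can be sketched as follows (a minimal sketch assuming the label mapping stated in the message; `classify` and `violatedGoals` are illustrative names, not the domain's real implementation):

```typescript
// Hypothetical sketch: classify open issues by label and evaluate the three
// GOAP hygiene goals (zero_critical max 0, zero_bugs max 0, total_low max 5).
type Issue = { labels: string[] };
type Severity = "critical" | "high" | "medium";

function classify(issue: Issue): Severity {
  if (issue.labels.includes("critical")) return "critical";
  if (issue.labels.includes("bug")) return "high";
  return "medium"; // enhancement and everything else
}

function violatedGoals(issues: Issue[]): string[] {
  const counts = { critical: 0, high: 0, medium: 0 };
  for (const i of issues) counts[classify(i)]++;
  const out: string[] = [];
  if (counts.critical > 0) out.push("issues.zero_critical"); // max: 0
  if (counts.high > 0) out.push("issues.zero_bugs");         // max: 0
  if (counts.medium > 5) out.push("issues.total_low");       // max: 5
  return out;
}
```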
feat: manage_board list action (#247) + a2a.trace extension (#359) (#420)

Two enhancements to reach issue zero:

manage_board list (#247):
  - Added GET /api/board/features/list endpoint proxying to Studio
  - Added "list" action to manage_board tool with status filter
  - Ava can now query "show me all blocked features" directly

a2a.trace extension (#359):
  - New langfuse-trace extension stamps a2a.trace metadata on all
    outbound A2A dispatches (traceId, callerAgent, skill, project)
  - Quinn reads this to link Langfuse traces across agent boundaries
  - Registered at startup alongside cost/confidence/blast extensions

Closes #247, closes #359.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: manage_board list action (#247) + a2a.trace extension (#359)

Two enhancements to reach issue zero:

manage_board list (#247):
  - Added GET /api/board/features/list endpoint proxying to Studio
  - Added "list" action to manage_board tool with status filter
  - Ava can now query "show me all blocked features" directly

a2a.trace extension (#359):
  - New langfuse-trace extension stamps a2a.trace metadata on all
    outbound A2A dispatches (traceId, callerAgent, skill, project)
  - Quinn reads this to link Langfuse traces across agent boundaries
  - Registered at startup alongside cost/confidence/blast extensions

Closes #247, closes #359.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(pr-remediator): case-insensitive auto-approve prefix matching

"Promote dev to main" titles start with capital P, but the prefix
check was case-sensitive against "promote:". Now lowercases the
title before matching so both "promote:" and "Promote" patterns
are caught.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
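The case-insensitive prefix fix amounts to lowercasing the title before matching. A sketch (the exact production prefix list differs; the list and function name below are assumptions for illustration):

```typescript
// Hypothetical sketch of the auto-approve safe-prefix check after the fix:
// lowercase the title once, then test each known-safe prefix.
const SAFE_TITLE_PREFIXES = ["promote", "chore(release", "chore:", "docs("];

function hasSafePrefix(title: string): boolean {
  const t = title.toLowerCase(); // "Promote dev to main" → "promote dev to main"
  return SAFE_TITLE_PREFIXES.some((p) => t.startsWith(p));
}
```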
…) (#427)

Closes the structural gap where 6+ tier_0 fire-and-forget alert skills had
no registered executor, causing SkillDispatcherPlugin to log "No executor
found" and silently drop the dispatch on every GOAP planning cycle.

- AlertSkillExecutorPlugin registers FunctionExecutors for all 24 bare
  alert.* actions in workspace/actions.yaml. Each translates the dispatch
  into a structured message.outbound.discord.alert event consumed by the
  existing WorldEngineAlertPlugin webhook routing.
- validate-action-executors.ts cross-checks the loaded ActionRegistry
  against the live ExecutorRegistry at startup. Surfaces every gap as a
  HIGH-severity Discord alert (goal platform.skills_unwired) and a loud
  console.error. Set WORKSTACEAN_STRICT_WIRING=1 to crash startup instead.
- action.issues_triage_bugs already routes correctly via meta.agentId=ava
  to the existing DeepAgentExecutor for Ava's issue_triage skill — no
  duplicate wiring needed (greenfield).

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
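The register-an-executor-per-bare-alert pattern can be sketched like this (bus and registry types are illustrative stand-ins, not the plugin's actual interfaces):

```typescript
// Hypothetical sketch of AlertSkillExecutorPlugin's core move: for every bare
// alert.* action id, register a function executor that republishes the
// dispatch as a structured outbound-Discord event on the bus.
type BusEvent = { topic: string; payload: unknown };
type Executor = (payload: unknown) => void;

function registerAlertExecutors(
  actionIds: string[],
  registry: Map<string, Executor>,
  publish: (e: BusEvent) => void,
): void {
  for (const id of actionIds.filter((a) => a.startsWith("alert."))) {
    registry.set(id, (payload) =>
      // Translate the GOAP dispatch into the event the existing
      // WorldEngineAlertPlugin webhook routing already consumes.
      publish({ topic: "message.outbound.discord.alert", payload: { action: id, payload } }),
    );
  }
}
```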
CeremonyStateExtension was publishing { domain, data } envelopes on
`world.state.snapshot` after every ceremony completion. GoalEvaluatorPlugin
subscribes to `world.state.#`, treated the malformed payload as a WorldState,
and emitted a "Selector ... not found" violation for every loaded goal on
every ceremony tick (the cluster of 25+ violations at each :15/:30 boundary
in the live container logs). All listed selectors (flow.efficiency.ratio,
services.discord.connected, agent_health.agentCount, etc.) actually exist
in the producer output — the goals are correct.

Changes:
- Move ceremony snapshot publish to `ceremony.state.snapshot` (off the
  world.state.# namespace). Leaves the existing CeremoniesState shape and
  consumers unchanged.
- Goal evaluator: defensive payload shape check. Reject single-domain
  envelopes ({ domain, data }) and other non-WorldState payloads loud-once
  instead of generating one violation per goal.
- Goal evaluator: startup selector validator. After the first valid world
  state arrives, walk every loaded goal's selector and HIGH-log any that
  doesn't resolve. Re-armed on goals.reload / config.reload so drift caught
  by future hot-reloads also surfaces.
- Tests: regression guard that CeremonyStateExtension does not publish on
  world.state.#; goal evaluator ignores malformed payloads; validator
  catches an intentionally broken selector.

Closes #424

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
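The defensive payload-shape check can be sketched as a type guard that rejects single-domain envelopes (a sketch under assumed shapes; `isWorldState` and the loose `WorldState` type are illustrative, not the evaluator's real types):

```typescript
// Hypothetical sketch of the goal evaluator's payload guard: a { domain, data }
// envelope is a single-domain snapshot, not a full WorldState, and must be
// rejected instead of generating one selector violation per goal.
type WorldState = Record<string, unknown>;

function isWorldState(payload: unknown): payload is WorldState {
  if (payload === null || typeof payload !== "object") return false;
  const obj = payload as Record<string, unknown>;
  const keys = Object.keys(obj);
  // Reject the malformed ceremony envelope shape.
  if (keys.length === 2 && "domain" in obj && "data" in obj) return false;
  // An empty object carries no domains to evaluate.
  return keys.length > 0;
}
```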
…429)

The Claude Code skill was calling the homelab-iac copy of agent-rollcall.sh,
which had drifted from this repo's copy. The in-repo script knows about
the in-process DeepAgent runtime (Ava, protoBot, Tuner) and the current
A2A fleet; the homelab-iac copy still probed for the archived ava-agent
container and the deprecated protoaudio/protovoice services.

Single source of truth: this repo. The homelab-iac copy was separately
synced in homelab-iac@64e8dcf.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: extension URIs use proto-labs.ai (not protolabs.ai) (#407)

All 27 references to https://protolabs.ai/a2a/ext/* changed to
https://proto-labs.ai/a2a/ext/* to match the actual domain. These
URIs are opaque identifiers (not published specs today) but should
reference a domain we own.

Breaking: external agents (Quinn, protoPen) whose cards declare the
old URI will stop matching the registry until they update. Filed on
Quinn to update her card.




* chore(release): bump to v0.7.20 (#408)



* chore: remove protoaudio + protovoice from agent rollcall (#410)

Both services decommissioned. Containers stopped + removed.
Only reference in protoWorkstacean was the rollcall script.

Note: homelab-iac/stacks/ai/docker-compose.yml still has a
worldmonitor network reference at line 521 + service at line 833.
Needs separate cleanup in that repo.




* feat: upgrade web_search → searxng_search + give Ava fleet health tools (#411)

Two changes:

1. Replace the basic `web_search` tool (5 results, hardcoded engines)
   with `searxng_search` — adapted from rabbit-hole.io's full-surface
   SearXNG integration. New capabilities:
   - Category routing: general, news, science, it
   - Time range filtering: day, week, month, year
   - Bang syntax: !wp (Wikipedia), !scholar, !gh (GitHub)
   - Infoboxes, direct answers, suggestions in response
   - Configurable max_results (default 10, was 5)

   Updated in both bus-tools.ts (@protolabsai/sdk pattern) and
   deep-agent-executor.ts (LangChain pattern).

2. Give Ava three fleet health tools she was missing:
   - get_ci_health — CI success rates across repos
   - get_pr_pipeline — open PRs, conflicts, staleness
   - get_incidents — security/ops incidents

   Ava can now answer fleet health questions directly instead of
   always delegating to Quinn.

Ava's tool count: 10 → 13. Tool rename: web_search → searxng_search
(greenfield, no backward compat alias).




* chore(projects): register protoAgent in projects.yaml (#414)

protoAgent is the new GitHub Template repo that replaces per-agent
A2A bootstrapping. Registers it as an active dev project owned by
Quinn, matching the shape of existing entries. Plane / GitHub
webhook / Discord provisioning remain TODO — those integrations
aren't configured in this deployment, so the onboard plugin
skipped them.




* chore(release): bump to v0.7.21 (#413)



* feat(ava): expand helm toolset, wire GOAP skills, fix ceremony disable bug (#415)

Ava agent audit + overhaul:
- Tools: 10 → 22 (direct observation, propose_config_change, incident reporting)
- Skills: 3 → 7 (debug_ci_failures, fleet_incident_response, downshift_models, investigate_orphaned_skills)
- System prompt rewritten: self-improvement instructions, escalation policy, GOAP-dispatch playbook
- DeepAgentExecutor now applies skill-level systemPromptOverride (goal_proposal, diagnose_pr_stuck)
- Fix ceremony loader bug: disabled ceremonies were filtered out, preventing hot-reload from cancelling timers
- Clean up board.pr-audit.yaml (remove spurious action field, restore schedule, keep disabled)
- Update docs: README, deep-agent runtime, agent-skills reference, self-improving loop




* fix(pr-remediator): close dispatch gap — self-dispatch + broaden auto-approve (#417)

Two root causes prevented PRs from being auto-merged:

1. Dispatch gap: tier_0 short-circuit in ActionDispatcherPlugin completed
   all actions immediately without dispatching to agent.skill.request.
   Every action in actions.yaml is tier_0, so the fireAndForget path
   (which publishes the skill request) was unreachable dead code.
   Fix: tier_0 now falls through when meta.fireAndForget is set.

2. Approval gap: readyToMerge requires reviewState=approved, but
   auto-approve only covered the dependabot/renovate authors and the
   `promote:` / `chore(deps` title prefixes. Human PRs, release PRs
   (`chore(release`), and github-actions PRs all lacked approved
   reviews and sat indefinitely.
   Fix: added app/github-actions to the author list, and `chore(release`,
   `chore:`, `docs(` to the safe title prefixes.

Additionally, PrRemediatorPlugin now self-dispatches remediation on
every world.state.updated tick — checking for readyToMerge, dirty,
failingCi, and changesRequested PRs directly from cached domain data.
This removes the dependency on GOAP dispatch reaching the plugin via
pr.remediate.* topics (which were never published in production after
Arc 1.4 removed meta.topic routing).




* feat(goap): issue_zero domain + goals — track open GitHub issues across fleet (#419)

Adds a github_issues domain that polls /repos/{repo}/issues?state=open
for all managed projects and classifies by label (critical, bug,
enhancement). Three GOAP goals enforce issue hygiene:

  - issues.zero_critical (critical severity, max: 0)
  - issues.zero_bugs (high severity, max: 0)
  - issues.total_low (medium severity, max: 5)

Each goal has a matching alert action and a triage dispatch action
that invokes Ava's new issue_triage skill. The skill instructs Ava
to resolve, convert to board features, delegate, or close issues
with rationale — driving toward zero open issues across all repos.

Domain polls every 5 minutes (issue velocity is low, GitHub rate
limits are a concern with 6+ repos).




* feat: manage_board list action (#247) + a2a.trace extension (#359) (#420)

Two enhancements to reach issue zero:

manage_board list (#247):
  - Added GET /api/board/features/list endpoint proxying to Studio
  - Added "list" action to manage_board tool with status filter
  - Ava can now query "show me all blocked features" directly

a2a.trace extension (#359):
  - New langfuse-trace extension stamps a2a.trace metadata on all
    outbound A2A dispatches (traceId, callerAgent, skill, project)
  - Quinn reads this to link Langfuse traces across agent boundaries
  - Registered at startup alongside cost/confidence/blast extensions

Closes #247, closes #359.




* fix(pr-remediator): case-insensitive auto-approve prefix matching (#421)

* feat: manage_board list action (#247) + a2a.trace extension (#359)

Two enhancements to reach issue zero:

manage_board list (#247):
  - Added GET /api/board/features/list endpoint proxying to Studio
  - Added "list" action to manage_board tool with status filter
  - Ava can now query "show me all blocked features" directly

a2a.trace extension (#359):
  - New langfuse-trace extension stamps a2a.trace metadata on all
    outbound A2A dispatches (traceId, callerAgent, skill, project)
  - Quinn reads this to link Langfuse traces across agent boundaries
  - Registered at startup alongside cost/confidence/blast extensions

Closes #247, closes #359.



* fix(pr-remediator): case-insensitive auto-approve prefix matching

"Promote dev to main" titles start with capital P, but the prefix
check was case-sensitive against "promote:". Now lowercases the
title before matching so both "promote:" and "Promote" patterns
are caught.



---------




---------

Co-authored-by: Josh Mabry <31560031+mabry1985@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Josh <artificialcitizens@gmail.com>
…th_discord (#431)

Adds CeremonySkillExecutorPlugin — registers FunctionExecutors that bridge
GOAP `ceremony.*` actions to the matching `ceremony.<id>.execute` topic
CeremonyPlugin already listens for. Without this bridge,
SkillDispatcherPlugin dropped every dispatch with "No executor found …"
and (post-#427) emitted HIGH platform.skills_unwired alerts every cycle.

Mirrors the alert-skill-executor-plugin pattern from #427 — explicit
action→ceremony id mapping, install order matters (after registry,
before skill-dispatcher).

Partial fix for #430.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…432)

Closes the structural gap where 5 actions in workspace/actions.yaml route
to handlers in PrRemediatorPlugin but had no registered executor:
  - action.pr_update_branch    → pr.remediate.update_branch
  - action.pr_merge_ready      → pr.remediate.merge_ready
  - action.pr_fix_ci           → pr.remediate.fix_ci
  - action.pr_address_feedback → pr.remediate.address_feedback
  - action.dispatch_backmerge  → pr.backmerge.dispatch

Before this change SkillDispatcherPlugin logged "No executor found" and
dropped the dispatch every GOAP cycle. After PR #427's startup validator
the same gap raised platform.skills_unwired HIGH every tick.

Wiring follows the AlertSkillExecutorPlugin pattern from #427:
  - PrRemediatorSkillExecutorPlugin registers FunctionExecutors that
    publish on the existing pr-remediator subscription topics, keeping
    "bus is the contract" — no plugin holds a reference to the other.
  - Executors are fire-and-forget per actions.yaml meta. They return a
    successful SkillResult immediately; pr-remediator's handler runs
    asynchronously on the bus subscription.
  - Install order matches alert-skill-executor: AFTER ExecutorRegistry
    construction, BEFORE skill-dispatcher.

For action.pr_merge_ready specifically, the meta.hitlPolicy
(ttlMs: 1800000, onTimeout: approve) is now honoured. The executor
forwards meta into the trigger payload; _handleMergeReady extracts it
via _extractHitlPolicy and passes it to _emitHitlApproval, which
populates HITLRequest.{ttlMs, onTimeout}. HITLPlugin already auto-
publishes a synthetic approve response when onTimeout=approve fires.

Closes part of #430. Ceremony + protoMaker actions ship in a separate PR.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
protoMaker (apps/server in protoLabsAI/ava) has been A2A-ready for a while —
serves agent-card.json with 10 skills including the two referenced by
unwired GOAP actions:
  - action.protomaker_triage_blocked → skill board_health
  - action.protomaker_start_auto_mode → skill auto_mode

Both actions targeted [protomaker], but no agent named "protomaker" was
registered, so the dispatcher couldn't route. Adding the entry closes
the routing gap; A2AExecutor's existing target-matching does the rest.

Endpoint: http://automaker-server:3008/a2a (verified from inside the
workstacean container with AVA_API_KEY → JSON-RPC 2.0 response).

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#433 added `subscribesTo: message.inbound.github.#` to the new protomaker
agent registration, copy-pasted from quinn's pattern. That was wrong:
protoMaker is reached via explicit GOAP `targets: [protomaker]` dispatches
(action.protomaker_triage_blocked, action.protomaker_start_auto_mode), not
as a broadcast inbound listener.

Quinn already subscribes to all GitHub inbound and dispatches `bug_triage`
on protoMaker's behalf. Having protomaker subscribe to the same broadcast
topic is one of the contributing paths to the duplicate-triage spam loop
filed as protoLabsAI/protoMaker#3503 (the root cause is Quinn's handler
not being idempotent — but this cleanup removes one extra firing path).

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…me (#435)

PR #411 renamed the tool from web_search to searxng_search in
src/agent-runtime/tools/bus-tools.ts (line 393), but ava.yaml still
declared the old name. Result: at startup the runtime warns
"agent ava declares unknown tools: web_search" and Ava ends up with no
search capability — when asked, she explicitly responds "I don't have a
searxng_search or web_search tool in my current toolkit."

This is the config side of the half-finished rename that PR #411 missed.
After this lands and workstacean restarts, Ava's toolkit should include
searxng_search and the unknown-tools warning should be empty for her.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…up (#436)

Two GOAP actions on goal fleet.no_agent_stuck were spamming Discord
because they had no `effects` and no cooldown:

  - alert.fleet_agent_stuck → posts a Discord alert
  - action.fleet_incident_response → dispatches Ava to file an incident,
    page on-call, and pause routing

Observed 2026-04-20: when auto-triage-sweep hit 100% failure rate on
bug_triage (cascading from the non-idempotent handler in
protoLabsAI/protoMaker#3503), GOAP re-fired both actions every planning
cycle. Ava filed INC-003 through INC-009 in ~30 seconds, each posting to
Discord. The pause routing succeeded but the rolling 1h failure rate
metric doesn't drop instantly, so the goal stayed violated and the loop
kept re-firing.

Disabling both actions until proper dedup lands. Reinstate when:
  1. action.fleet_incident_response gains a cooldown OR an `effects`
     marker that satisfies the goal for the cooldown window
  2. Ava's fleet_incident_response skill checks for an existing open
     incident on the same agent before filing a new one
  3. alert.fleet_agent_stuck gains per-agent rate limiting

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (#451)

After #436 disabled action.fleet_incident_response, observed similar
loop spam from FOUR more action.* dispatches that share the same
architectural bug (no cooldown, no satisfying effects → re-fire every
GOAP cycle while goal stays violated):

  action.fleet_investigate_orphaned_skills  — Ava posts orphaned-skill
                                               diagnosis to Discord ops
                                               on every cycle (8+ posts
                                               in 1 min observed)
  action.issues_triage_critical             — ~447 fires in 30 min
  action.issues_triage_bugs                 — ~447 fires in 30 min;
                                               compounds with #3503
                                               by re-triaging same issues
  action.fleet_downshift_models             — same pattern when cost
                                               exceeds budget

ALL share the same pattern as the alert.* actions and the original
fleet_incident_response: effects: [] + no cooldownMs + persistent goal
violation = infinite re-fire.

Mitigation only — Ava temporarily can't auto-investigate orphaned
skills or auto-triage new GitHub issues. Reinstate when issue #437
ships action-level cooldown (meta.cooldownMs) or proper effects-with-TTL.
The alert.* siblings (24 of them) also have this bug but are
non-impacting today because DISCORD_WEBHOOK_ALERTS is unset and
WorldEngineAlertPlugin drops them silently — fixing #437 will cover
both at the dispatcher level.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GOAP actions with `effects: []` re-fired every planning cycle (~3s) while
their goal stayed violated. Two prod fires (PR #436, PR #451) forced
disabling 6 actions before this landed. This restores that autonomy.

ActionDispatcherPlugin now honors `meta.cooldownMs` on every action. When
an action with a positive cooldownMs fires, the dispatcher records its
timestamp; subsequent dispatches of the same action id within the window
are dropped BEFORE the WIP queue and BEFORE the executor. Single chokepoint
covers both alert.* (FunctionExecutor → Discord) and action.* (DeepAgent /
A2A) paths. Drops log a fail-fast diagnostic with action id, age, and
remaining window, plus bump a new `cooldown_dropped` telemetry event.

Cooldown bucket is keyed on action id alone — per-target keying isn't
needed because each GOAP action targets one situation. Greenfield shape:
absence of `meta.cooldownMs` means "no cooldown" naturally, no flag.

Defaults applied to workspace/actions.yaml:
  - alert.*                          → 15 min
  - action.* with skillHint          → 30 min (agent work is expensive)
  - action.pr_*                      → 5 min  (remediation must stay responsive)
  - ceremony.*                       → 30 min (treated as skill dispatch)
  - action.dispatch_backmerge        → none (in-handler per-repo cooldown
                                              in pr-remediator stays authoritative)

Re-enables the 6 actions disabled by PRs #436 and #451:
alert.fleet_agent_stuck, action.fleet_incident_response,
action.fleet_downshift_models, action.fleet_investigate_orphaned_skills,
action.issues_triage_critical, action.issues_triage_bugs.

Tests:
  - action-dispatcher unit tests cover: blocks repeats within window;
    A's cooldown does not affect B; window expiry admits next dispatch;
    absent cooldownMs and cooldownMs<=0 mean no throttling.
  - End-to-end test spam-publishes 100 violations of fleet.no_skill_orphaned
    and asserts exactly 1 dispatch reaches the executor.
  - bun test: 1023 / 1023 pass.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
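The dispatcher-side cooldown described above can be sketched as a small gate keyed on action id (a minimal sketch assuming the `meta.cooldownMs` shape from this PR; `CooldownGate` and `admit` are illustrative names, not the plugin's actual API):

```typescript
// Hypothetical sketch of the per-action cooldown chokepoint: record the last
// fire time per action id and drop repeats inside the window, before the WIP
// queue and before any executor runs.
type ActionMeta = { cooldownMs?: number };

class CooldownGate {
  private lastFired = new Map<string, number>();

  // Returns true if the dispatch should proceed, false if dropped.
  admit(actionId: string, meta: ActionMeta, now: number): boolean {
    const cooldownMs = meta.cooldownMs ?? 0;
    if (cooldownMs <= 0) return true; // absence or <= 0 means no throttling
    const last = this.lastFired.get(actionId);
    if (last !== undefined && now - last < cooldownMs) {
      return false; // within window: drop and (in the real plugin) log + bump telemetry
    }
    this.lastFired.set(actionId, now);
    return true;
  }
}
```

Keyed on action id alone, as the message notes; a dropped dispatch does not refresh the window.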
…roken (#454)

Audit of all 10 workspace/ceremonies/*.yaml against the live agent skill
registry surfaced 7 with skill-target mismatches that fail every fire:

ROUTED to correct skill owner:
  board.cleanup     skill=board_audit    targets [all] → [quinn]
  board.health      skill=board_health   targets [all] → [protomaker]
  daily-standup     skill=board_audit    targets [ava] → [quinn]
  health-check      skill=board_audit    targets [ava] → [quinn]

DISABLED (no agent advertises the skill):
  agent-health      skill=health_check   — no health_check anywhere
  board.retro       skill=pattern_analysis — no pattern_analysis anywhere
  service-health    skill=health_check   — same as agent-health

Quinn owns board_audit (and bug_triage / pr_review / qa_report).
Protomaker owns board_health (and 9 other apps/server skills).
The two `health_check`-keyed ceremonies were redundant anyway — the
agent_health and services world-state domains poll the same data every
60s and expose it on /api/agent-health and /api/services.

Re-enable the disabled three with a real skill if a periodic ANNOUNCE
to Discord is wanted (small `*_health_report` skill on protobot would
do it).

Workspace is bind-mounted, so the live container picks this up on
restart with no rebuild.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The GOAP planner could dispatch a skill to a target agent (e.g.
auto-triage-sweep, user) where the named target wasn't in the live
ExecutorRegistry. The dispatcher fired anyway, the executor errored
404-style, and the failure cascaded into stuck work items + duplicate
incident filings (INC-003 through INC-018, ~93 work items in error
state). The cooldown work in #437 / #452 masked the symptom but the
structural gap remained.

ActionDispatcherPlugin now takes the shared ExecutorRegistry handle and
runs `_admitOrTargetUnresolved` immediately after the cooldown check and
BEFORE the WIP queue / executor. Same chokepoint pattern as cooldown:

  - target absent (skill-routed)            → admit
  - target = "all" (broadcast sentinel)     → admit
  - target matches a registration agentName → admit
  - target unresolvable                     → drop, log loud, telemetry
                                              bump `target_unresolved`

Drops surface action id, the unresolvable target, AND the agents that
DO exist so the routing mistake is immediately diagnosable. The only
opt-out is the broadcast sentinel "all" — there is no flag, no
enabled-bool. Greenfield-strict shape.
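The admission rules above can be sketched as a single guard function. This is a hedged sketch under the shapes described in the commit text (registrations expose an optional `agentName`); the real `_admitOrTargetUnresolved` lives in ActionDispatcherPlugin and also bumps the `target_unresolved` telemetry counter.

```typescript
// Sketch of the pre-dispatch target guard. Returns true = admit, false = drop.
type Registration = { agentName?: string };
type Registry = { list(): Registration[] };

function admitOrTargetUnresolved(
  targetAgentId: string | undefined,
  registry: Registry | undefined,
): boolean {
  if (!registry) return true;                   // no registry wired: check skipped
  if (targetAgentId === undefined) return true; // skill-routed dispatch: admit
  if (targetAgentId === "all") return true;     // broadcast sentinel: admit
  const known = registry
    .list()
    .map((r) => r.agentName)
    .filter((n): n is string => typeof n === "string" && n.length > 0);
  if (known.includes(targetAgentId)) return true;
  // Drop loud: name the bad target AND the agents that do exist.
  console.error(
    `target_unresolved: "${targetAgentId}" not in [${known.join(", ")}]`,
  );
  return false;
}

const registry: Registry = {
  list: () => [{ agentName: "ava" }, { agentName: "quinn" }],
};
console.log(admitOrTargetUnresolved("quinn", registry)); // true
console.log(admitOrTargetUnresolved("auto-triage-sweep", registry)); // false
```

Failing closed with a loud log keeps a future bad target (like the historical `auto-triage-sweep` / `user`) from cascading into stuck work items.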

Wiring: src/index.ts passes the shared executorRegistry into the
dispatcher factory. Test fixtures that don't exercise target routing
omit the registry and the check is skipped — so existing tests stay
green without modification.

Audit of workspace/actions.yaml: only `protomaker` appears as an active
agentId target (twice). That agent is registered in workspace/agents.yaml.
The historical bad targets (`auto-triage-sweep`, `user`) were removed by
prior cleanup; this PR ensures any future regression fails closed.

Tests added in src/plugins/action-dispatcher-plugin.test.ts:
  - admits when target is registered
  - drops when target is unregistered + bumps `target_unresolved`
  - drops on mixed-target intent (single-target shape today via meta.agentId)
  - admits when target is the "all" broadcast sentinel
  - admits when meta.agentId is absent (skill-routed dispatch)
  - admits when no registry is wired (legacy test fixtures)

bun test: 1029 / 1029 pass.

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #415 fixed the hot-reload path so flipping a ceremony to disabled
cancelled its timer, but the initial-load path still added every YAML
entry (disabled or not) to the in-memory registry. Two consequences:

  1. External `ceremony.<id>.execute` triggers (from
     CeremonySkillExecutorPlugin's GOAP bridge) found the disabled
     ceremony in the registry and fired it anyway.
  2. After hot-reload flipped a ceremony enabled→disabled, the entry
     stayed in the registry — same external-trigger leak.

Fix: filter `enabled === false` at every place that lands a ceremony in
the registry (initial install, hot-reload new-file path, hot-reload
changed-file path). Disabled ceremonies are loaded by the YAML parser
(so the changed-file path can detect a flip) but never reach the
registry, never schedule a timer, and cannot be resurrected by an
external trigger. Operators see a `Skipping disabled ceremony: <id>`
log line for each skip — fail-loud per project convention.

Greenfield: no flag, no toggle. enabled:false means disabled
everywhere.
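The registry filter can be sketched in a few lines. This is a minimal illustration under an assumed ceremony shape (`id` plus optional `enabled`), not the actual CeremonyPlugin code, which applies the same check at initial install and on both hot-reload paths.

```typescript
// Sketch of the enabled:false filter: disabled ceremonies are parsed but
// never land in the registry, so no timer and no external-trigger path.
type Ceremony = { id: string; enabled?: boolean };

function installCeremonies(
  loaded: Ceremony[],
  registry: Map<string, Ceremony>,
): void {
  for (const c of loaded) {
    if (c.enabled === false) {
      console.log(`Skipping disabled ceremony: ${c.id}`); // fail-loud log
      continue;
    }
    registry.set(c.id, c);
  }
}

const registry = new Map<string, Ceremony>();
installCeremonies(
  [
    { id: "board.health", enabled: true },
    { id: "agent-health", enabled: false },
  ],
  registry,
);
console.log([...registry.keys()]); // ["board.health"]
```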

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mabry1985 and others added 3 commits April 20, 2026 16:08
13 TS2322 errors snuck through #452 and #455 because the CI test job
has been failing on type-check while build-and-push (the gate that
actually publishes :dev) is a separate workflow that runs on push.
Result: main/dev were green for container publish even though tsc
--noEmit was returning exit 2. Not visible in PR merge gates either
because test.conclusion=failure + build-and-push.conclusion=skipped
still resolved to a mergeable state.

Pattern of the 13 errors:

  bus.subscribe(T, "spy", (m) => requests.push(m));
                                  ^^^^^^^^^^^^^^^^
  Array.push() returns number, but the subscribe callback expects
  void | Promise<void>. Fix: wrap the body in a block so the arrow
  returns void:

  bus.subscribe(T, "spy", (m) => { requests.push(m); });

Applied via sed-style regex across both test files. 1029 tests still
pass (bun test). `bun run tsc --noEmit` now exits 0.
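The type error hinges on the callback's return type being the union `void | Promise<void>`: TypeScript's usual allowance for assigning a value-returning function to a plain `void`-returning slot does not apply to that union, so `(m) => arr.push(m)` (which returns `number`) fails TS2322. A runnable illustration, with a stand-in `subscribe` signature assumed to mirror the bus contract:

```typescript
// The callback type mirrors the bus contract described above (assumption).
type Handler = (m: string) => void | Promise<void>;

function subscribe(_topic: string, _name: string, handler: Handler): string {
  handler("hello"); // deliver one message synchronously for the demo
  return "sub-id";
}

const requests: string[] = [];
// OK: block body, arrow returns void.
subscribe("T", "spy", (m) => { requests.push(m); });
// Would be TS2322: subscribe("T", "spy", (m) => requests.push(m));
console.log(requests.length); // 1
```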

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… whitelist (closes #459) (#460)

AgentFleetHealthPlugin now takes an optional ExecutorRegistry (mirrors the
#455 wiring on ActionDispatcherPlugin). On each inbound autonomous.outcome,
`systemActor` is checked against `executorRegistry.list().map(r => r.agentName)`:

- Registered agent (ava, quinn, protomaker, ...) → aggregated in agents[]
  (existing shape, no behavior change).
- Anything else (pr-remediator, auto-triage-sweep, goap, user, ...) →
  routed to a separate systemActors[] bucket. No longer pollutes
  agentCount / maxFailureRate1h / orphanedSkillCount, so Ava's sitreps
  stop surfacing plugin names as "stuck agents".

First time a synthetic actor is seen, the plugin emits a one-time
console.warn naming the actor + skill (fail-fast and loud, per policy)
so operators know what's being filtered. No flag — same greenfield /
chokepoint discipline as #437 (cooldown) and #444 (target guard).

Scope note on `_default`: this plugin keys on outcome `systemActor`,
not registry `agentName`. `_default` only appears in /api/agent-health
(the registry-driven view). Nothing currently publishes an outcome with
`systemActor: "_default"`, so it doesn't reach agents[] here. If it
ever did, the new whitelist would drop it to systemActors[] — the right
outcome.
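The two-bucket split can be sketched as follows. This is a simplified stand-in, assuming outcomes carry `systemActor`/`skill` fields as the commit text describes; the real plugin aggregates time windows rather than raw lists and warns once per actor via console.warn.

```typescript
// Sketch: route outcomes from registered agents into agents[], everything
// else into systemActors[], warning once per synthetic actor.
type Outcome = { systemActor: string; skill: string; success: boolean };

function bucketOutcomes(
  outcomes: Outcome[],
  registeredAgents: Set<string>,
): { agents: Map<string, Outcome[]>; systemActors: Map<string, Outcome[]> } {
  const agents = new Map<string, Outcome[]>();
  const systemActors = new Map<string, Outcome[]>();
  const warned = new Set<string>();
  for (const o of outcomes) {
    const isAgent = registeredAgents.has(o.systemActor);
    const bucket = isAgent ? agents : systemActors;
    if (!isAgent && !warned.has(o.systemActor)) {
      warned.add(o.systemActor); // one-time, fail-loud
      console.warn(`filtering synthetic actor ${o.systemActor} (${o.skill})`);
    }
    const list = bucket.get(o.systemActor) ?? [];
    list.push(o);
    bucket.set(o.systemActor, list);
  }
  return { agents, systemActors };
}

const { agents, systemActors } = bucketOutcomes(
  [
    { systemActor: "ava", skill: "board_audit", success: true },
    { systemActor: "pr-remediator", skill: "pr_review", success: false },
  ],
  new Set(["ava", "quinn", "protomaker"]),
);
console.log(agents.has("ava"), systemActors.has("pr-remediator")); // true true
```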

Verification plan (post-deploy):
  curl -s -X POST http://localhost:3000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model":"ava","messages":[{"role":"user","content":"fleet sitrep"}]}'

Expected: no `pr-remediator`, `auto-triage-sweep`, or `user` in agents[].

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oard URL (#462)

The card at /.well-known/agent-card.json was advertising a URL like
http://ava:8081/a2a — host-mapped to the Astro dashboard, which 404s on
/a2a. Spec-compliant clients (@a2a-js/sdk and friends) doing card
discovery → POST to card.url could not reach the actual A2A endpoint;
the voice agent team papered over this by switching to the
/v1/chat/completions shim.

Fix: derive the card's `url` from variables that describe where the A2A
endpoint actually lives.

  1. WORKSTACEAN_PUBLIC_BASE_URL (e.g. https://ava.proto-labs.ai) →
     ${publicBase}/a2a. The canonical Cloudflare-fronted URL for
     external/Tailscale callers.
  2. Otherwise, http://${WORKSTACEAN_INTERNAL_HOST ?? "workstacean"}:${WORKSTACEAN_HTTP_PORT}/a2a
     — docker-network service name + the actual API port.

Also populate `additionalInterfaces` with the JSON-RPC transport at the
same URL so spec-compliant clients can pick deterministically. Drop the
WORKSTACEAN_BASE_URL coupling in the card builder — that variable
remains the externally-reachable URL stamped into A2A push-notification
callbacks (different concern, separate documentation).
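The two-step derivation can be sketched as a small resolver. This is a hedged sketch of the precedence described above, not the actual `resolveA2aUrl`; the `"3000"` port fallback is an assumption for illustration, and the real function also feeds `additionalInterfaces`.

```typescript
// Sketch: public base URL wins; otherwise fall back to the docker-network
// service name + API port.
function resolveA2aUrl(env: Record<string, string | undefined>): string {
  const publicBase = env.WORKSTACEAN_PUBLIC_BASE_URL;
  if (publicBase) {
    // Canonical Cloudflare-fronted URL for external/Tailscale callers.
    return `${publicBase.replace(/\/$/, "")}/a2a`;
  }
  const host = env.WORKSTACEAN_INTERNAL_HOST ?? "workstacean";
  const port = env.WORKSTACEAN_HTTP_PORT ?? "3000"; // default port assumed
  return `http://${host}:${port}/a2a`;
}

console.log(resolveA2aUrl({ WORKSTACEAN_PUBLIC_BASE_URL: "https://ava.proto-labs.ai" }));
// https://ava.proto-labs.ai/a2a
console.log(resolveA2aUrl({ WORKSTACEAN_HTTP_PORT: "8080" }));
// http://workstacean:8080/a2a
```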

Closes #461

Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough

This PR implements comprehensive fixes for multiple critical issues (#437, #444, #453, #459) involving action deduplication, executor validation, ceremony lifecycle management, and fleet health reporting. Core changes include: action-level cooldown throttling (cooldownMs), pre-dispatch executor registry validation, disabled-ceremony enforcement during initial load, synthetic-actor filtering in fleet health, extension URI hostname corrections, agent-card URL resolution separation, and new executor plugins for alerts/ceremonies/PR remediator skills. Workspace definitions are updated to reflect routing changes, new agents (protomaker), and GitHub issue tracking integration.

Changes

Cohort / File(s) | Summary
Extension URI hostname corrections
src/executor/extensions/{blast,confidence,cost,effect-domain,hitl-mode}.ts, src/executor/task-tracker.ts, docs/extensions/*
Updated all A2A extension URI constants and documentation from protolabs.ai to proto-labs.ai hostname.
Action-level cooldown mechanism
src/planner/types/action.ts, src/schemas/yaml-schemas.ts, src/plugins/action-dispatcher-plugin.ts, src/plugins/action-dispatcher-plugin.test.ts, src/telemetry/telemetry-service.ts, workspace/actions.yaml
Added per-action meta.cooldownMs field for dispatch throttling; implemented _admitOrCooldown gate and lastDispatchedAt tracking in dispatcher; added cooldown_dropped telemetry event.
Pre-dispatch executor registry validation
src/planner/validate-action-executors.ts, src/planner/__tests__/validate-action-executors.test.ts, src/plugins/action-dispatcher-plugin.ts, src/plugins/action-dispatcher-plugin.test.ts
Introduced validateActionExecutors, findUnwiredActions, and UnwiredActionsError to cross-check actions against live executor registry; added _admitOrTargetUnresolved dispatch gate with target_unresolved telemetry.
Agent fleet health synthetic actor filtering
src/plugins/agent-fleet-health-plugin.ts, src/plugins/agent-fleet-health-plugin.test.ts
Split outcome aggregation into real-agent (agentWindows) and synthetic-actor (systemActorWindows) buckets; added SystemActorOutcomeSummary and systemActors[] to snapshot; updated _isRegisteredAgent() whitelist using executor registry.
Ceremony disabled-state enforcement
src/loaders/ceremonyYamlLoader.ts, src/loaders/__tests__/ceremonyYamlLoader.test.ts, src/plugins/CeremonyPlugin.ts, src/plugins/__tests__/CeremonyPlugin.test.ts
Modified loader to return disabled ceremonies with enabled: false instead of filtering; added initial-load and hot-reload guards in CeremonyPlugin to skip scheduling disabled entries; added test coverage for enabled↔disabled transitions.
Ceremony state topic separation
src/world/extensions/CeremonyStateExtension.ts, src/world/extensions/__tests__/CeremonyStateExtension.test.ts, docs/reference/ceremony-plugin.md
Changed publication topic from world.state.snapshot to ceremony.state.snapshot; added regression test ensuring ceremony state doesn't leak into world-state namespace.
Agent-card URL resolution restructuring
src/api/agent-card.ts, src/api/__tests__/agent-card.test.ts, src/config/env.ts, .env.dist, docs/reference/env-vars.md, docs/reference/http-api.md
Separated WORKSTACEAN_BASE_URL (A2A callback only) from new WORKSTACEAN_PUBLIC_BASE_URL (canonical A2A discovery) and WORKSTACEAN_INTERNAL_HOST (docker-network fallback); updated card URL resolution to use resolveA2aUrl() and include additionalInterfaces.
Alert skill executor plugin
src/plugins/alert-skill-executor-plugin.ts, src/plugins/__tests__/alert-skill-executor-plugin.test.ts
Added new AlertSkillExecutorPlugin registering FunctionExecutor handlers for alert.* GOAP skills; publishes to message.outbound.discord.alert topic with severity/headline metadata.
Ceremony skill executor plugin
src/plugins/ceremony-skill-executor-plugin.ts, src/plugins/__tests__/ceremony-skill-executor-plugin.test.ts
Added CeremonySkillExecutorPlugin bridging GOAP ceremony.* actions to ceremony.<id>.execute trigger events via BusMessage.
PR remediator skill executor plugin
src/plugins/pr-remediator-skill-executor-plugin.ts, src/plugins/__tests__/pr-remediator-skill-executor-plugin.test.ts
Added PrRemediatorSkillExecutorPlugin routing GOAP action.pr_* and action.dispatch_backmerge skills to PR remediator bus topics with fire-and-forget dispatch.
PR remediator enhancements
lib/plugins/pr-remediator.ts, src/pr-remediator.test.ts
Added auto-approve patterns, continuous self-dispatch from cached state, per-action HITL policy extraction/forwarding, and policy-aware expiry in merge-ready error path.
Goal evaluator payload validation
src/plugins/goal_evaluator_plugin.ts, __tests__/goal_evaluator_plugin.test.ts
Added _looksLikeWorldState() type guard, selector validation (_validateLoadedGoalSelectors), and defensive payload inspection to reject malformed world.state.# events.
Bus tools expansion
src/agent-runtime/tools/bus-tools.ts
Added list action to manage_board tool with GET endpoint; replaced web_search with searxng_search supporting optional category, time_range, max_results parameters and richer response fields.
DeepAgent executor refinements
src/executor/executors/deep-agent-executor.ts
Updated searxng_search tool schema; implemented per-skill systemPromptOverride resolution in execute() for world-state prompt injection.
Langfuse trace extension
src/executor/extensions/langfuse-trace.ts
Added new extension URI (https://proto-labs.ai/a2a/ext/trace-v1) and registerLangfuseTraceExtension() stamping trace metadata (traceId, callerAgent, skill, project) into context.
API board features list endpoint
src/api/board.ts
Added GET /api/board/features/list route proxying to Studio's feature list API with query-param forwarding and error handling.
GitHub issues polling API
src/api/github.ts
Added GET /api/github-issues endpoint aggregating open issues per project with label/timestamp/author metadata and zero-fallback behavior.
Root startup wiring
src/index.ts
Added registration of executor plugins (alert, ceremony, PR remediator skill executors) and Langfuse trace extension; wired ExecutorRegistry into ActionDispatcher and AgentFleetHealth; added startup validation via validateActionExecutors() with optional strict mode; registered github_issues polling domain.
Deep-agent runtime documentation
docs/integrations/runtimes/deep-agent.ts
Updated system prompt injection behavior, executor tool groups, and example ava.yaml configuration (model, maxTurns, tools/skills).
Agent skills schema
docs/reference/agent-skills.md
Removed chain.{agent,skill} fields; added systemPromptOverride optional field; restructured documentation with explicit name/description/keywords/systemPromptOverride table.
Workspace agent/project/ceremony configuration
workspace/agents.yaml, workspace/agents/ava.yaml, workspace/projects.yaml, workspace/ceremonies/*.yaml
Added protomaker agent; expanded Ava's tools/skills for orchestration/observation/act; added protoagent project; disabled ceremonies (agent-health, board.pr-audit, board.retro, service-health); routed ceremonies to specific agents (quinn for board.cleanup, board.health, daily-standup, health-check).
GOAP actions and goals
workspace/actions.yaml, workspace/goals.yaml
Added meta.cooldownMs across all actions (alerts 15min, skill dispatch 30min, PR remediation 5min, ceremonies 30min); added "Issue Zero" actions for GitHub critical/bug/total issue thresholds; added three new goals (issues.zero_critical, issues.zero_bugs, issues.total_low).
Documentation updates
README.md, .claude/commands/rollcall.md, docs/explanation/self-improving-loop.md
Updated to reflect DeepAgent executor (not ProtoSdkExecutor), expanded Ava operational skills, new A2A extension URIs, environment variable semantics, and self-improvement loop with chronic-failure-driven goal/config proposals.
Package versions
package.json, dashboard/package.json
Bumped both to version 0.7.21.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • protoquinn

Poem

🐰 Cooldowns, guards, and synthetic whispers—
The dispatcher now knows which actors matter,
No more duplicate incidents clattering,
Each action waits its turn, ceremonies sleep clean,
And fleet health sees only the real in between!



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (4)
src/executor/extensions/hitl-mode.ts (1)

23-23: Consider centralizing the extension URI base to avoid host drift.

You’re updating this hostname in multiple extension files; a shared constant/helper would reduce future mismatches.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/executor/extensions/hitl-mode.ts` at line 23, HITL_MODE_URI is hardcoded
causing host drift risk; extract the common host into a shared constant (e.g.,
EXTENSION_BASE_URI) in a central module (like an extensions/constants or config
file) and rebuild HITL_MODE_URI by concatenating that base with the path (e.g.,
`${EXTENSION_BASE_URI}/a2a/ext/hitl-mode-v1`); update the symbol HITL_MODE_URI
in src/executor/extensions/hitl-mode.ts to import and use the centralized base
and change other extension files to import the same constant so all extension
URIs are composed from one authoritative source.
src/planner/__tests__/end-to-end-loop.test.ts (1)

146-152: Make the async assertion condition-based to reduce flakiness.

The fixed setTimeout(50) can intermittently fail on slower CI nodes. Prefer waiting until the expected condition is met (with an upper timeout).

Proposed stabilization diff
-    await new Promise((r) => setTimeout(r, 50));
+    const deadline = Date.now() + 1000;
+    while (skillRequests.length < 1 && Date.now() < deadline) {
+      await new Promise((r) => setTimeout(r, 10));
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/planner/__tests__/end-to-end-loop.test.ts` around lines 146 - 152,
Replace the fixed sleep with a condition-based wait: instead of await new
Promise((r) => setTimeout(r, 50)), poll/wait until the expected condition is
true (skillRequests has length 1) with a reasonable upper timeout (e.g., 1–2s)
to avoid flaky CI; target the assertion around skillRequests in this test (and
consider also checking planner inFlightGoals or ActionDispatcher cooldown state
if relevant) so the test only proceeds when skillRequests.length === 1 or the
timeout is reached.
src/plugins/agent-fleet-health-plugin.ts (1)

320-334: Consider caching the registered agent list for performance.

_isRegisteredAgent is called on every outcome event and iterates executorRegistry.list() each time. For high-throughput scenarios, this could become a bottleneck.

♻️ Optional: Cache known agents periodically
+  private knownAgentsCache: Set<string> = new Set();
+  private knownAgentsCacheTime = 0;
+  private static readonly CACHE_TTL_MS = 10_000; // 10 seconds

   private _isRegisteredAgent(actor: string): boolean {
     if (!this.executorRegistry) return true;
-    const known = this.executorRegistry
-      .list()
-      .map(r => r.agentName)
-      .filter((n): n is string => typeof n === "string" && n.length > 0);
-    return known.includes(actor);
+    const now = Date.now();
+    if (now - this.knownAgentsCacheTime > AgentFleetHealthPlugin.CACHE_TTL_MS) {
+      this.knownAgentsCache = new Set(
+        this.executorRegistry
+          .list()
+          .map(r => r.agentName)
+          .filter((n): n is string => typeof n === "string" && n.length > 0)
+      );
+      this.knownAgentsCacheTime = now;
+    }
+    return this.knownAgentsCache.has(actor);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugins/agent-fleet-health-plugin.ts` around lines 320 - 334, The
_isRegisteredAgent method currently calls this.executorRegistry.list() on every
outcome event which is expensive; cache the known agent names in a Set (e.g.,
this._knownAgentNames: Set<string>) and change _isRegisteredAgent to check that
Set instead of re-listing; populate/update the Set from executorRegistry.list()
when the plugin initializes, and refresh it either on executorRegistry change
events (if available) or on a short timer/periodic interval to keep it in sync
with the registry.
src/planner/__tests__/validate-action-executors.test.ts (1)

141-166: Verify mock usage aligns with test guidelines.

The test uses mock() to create a minimal bus stub. While the guideline states "no mocks," this appears to be a lightweight interface stub rather than mocking implementation details or LLM calls. If strict adherence is required, consider using the same makeBus() pattern from the ceremony-skill-executor tests.

♻️ Optional: Use makeBus() pattern for consistency
-    const published: BusMessage[] = [];
-    const bus = {
-      subscribe: mock(() => "sub-id"),
-      unsubscribe: mock(() => {}),
-      publish: mock((_topic: string, msg: BusMessage) => { published.push(msg); }),
-      topics: () => [],
-    };
+    function makeBus() {
+      const published: BusMessage[] = [];
+      return {
+        published,
+        subscribe(_topic: string, _name: string, _handler: (msg: BusMessage) => void) {
+          return `sub-${Math.random()}`;
+        },
+        unsubscribe(_id: string) {},
+        publish(_topic: string, msg: BusMessage) { published.push(msg); },
+        topics() { return []; },
+      };
+    }
+    const bus = makeBus();

-    validateActionExecutors(actions, executors, { bus: bus as never });
+    validateActionExecutors(actions, executors, { bus: bus as never });

-    const alerts = published.filter(m => m.topic === "message.outbound.discord.alert");
+    const alerts = bus.published.filter(m => m.topic === "message.outbound.discord.alert");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/planner/__tests__/validate-action-executors.test.ts` around lines 141 -
166, Replace the ad-hoc mock bus stub with the shared test bus factory used
elsewhere (e.g., makeBus) so the test follows the “no mocks” guideline: locate
the test in validate-action-executors.test.ts that constructs the bus object
(used by validateActionExecutors) and swap the mock-based
subscribe/unsubscribe/publish/topics implementation for a makeBus() instance
that records published messages (so you can still inspect published messages),
keeping assertions on ActionRegistry, ExecutorRegistry, noopExecutor, and the
published array intact; ensure the created bus implements publish to push
BusMessage objects and exposes topics/subscribe/unsubscribe as the other tests
expect.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/commands/rollcall.md:
- Line 5: The command in the README uses a machine-specific absolute path ("bash
/home/josh/dev/protoWorkstacean/scripts/agent-rollcall.sh"); change it to a
repo-relative invocation (e.g., "bash ./scripts/agent-rollcall.sh" or
"scripts/agent-rollcall.sh") so the instruction is portable across machines and
consistent with the repo-local guidance elsewhere; update every occurrence of
that absolute path (the same string) to the repo-relative form.

In `@docs/integrations/runtimes/deep-agent.md`:
- Line 51: The docs YAML example still references the old tool name
"web_search"; update the example so the tool name is "searxng_search" (to match
the executor rename in deep-agent-executor.ts where the tool was renamed), and
ensure any agent "tools" arrays in the example use "searxng_search" instead of
"web_search" so the search tool is available at runtime.
- Line 116: The table entry currently lists the tool as `web_search` but the
implementation uses `searxng_search`; update the docs table row (the entry
containing `web_search` and the description "SearXNG `/search` | Quick web
search (5 results)") to replace the `web_search` symbol with `searxng_search` so
the documentation matches the runtime/tool name used by the code.

In `@src/api/github.ts`:
- Around line 466-468: The handler currently treats missing GITHUB_TOKEN as a
successful empty-result by returning Response.json({...}), which hides
data-unavailable state; change the early-return that checks repoList and
GITHUB_TOKEN so that when GITHUB_TOKEN is falsy you return a non-2xx response
(e.g., Response.json({ message: "GitHub auth missing", availability: false }, {
status: 503 })) or at minimum include an explicit availability flag
(availability: false) in the returned payload; update the check around repoList
and GITHUB_TOKEN and the Response.json call (symbols: GITHUB_TOKEN, repoList,
Response.json) so consumers can detect and gate on missing auth.
- Around line 489-491: The issues fetch currently calls ghApi once with
per_page=100 (const issues = await
ghApi(`/repos/${repo}/issues?state=open&per_page=100&sort=created&direction=asc`)
), so implement paginated fetching like the PR pipeline handler: loop over page
numbers adding `&page=${page}` to the request, collect each batch into the
existing issues array, break when the returned batch.length < 100 or when page
>= MAX_PAGES, and then use the accumulated issues to compute totalOpen and label
totals; use the same MAX_PAGES constant and ghApi function to guard against
runaway loops.

In `@src/plugins/__tests__/pr-remediator-skill-executor-plugin.test.ts`:
- Around line 13-31: Replace the custom makeBus() stub in the test with the real
InMemoryEventBus implementation: remove makeBus() and any as never casts and
instantiate InMemoryEventBus (or a small typed helper wrapper) where makeBus()
was used so the tests exercise real install()/publish()/subscribe() semantics;
update references in the test file (pr-remediator-skill-executor-plugin.test.ts)
for each occurrence (lines around the makeBus usage, and the other noted
occurrences) to use InMemoryEventBus and adjust any test helper types to match
its API (subscribe, unsubscribe, publish, topics, published inspection) so the
tests run against the actual in-memory bus contract instead of a partial mock.

In `@src/plugins/pr-remediator-skill-executor-plugin.ts`:
- Around line 56-69: install registers FunctionExecutor instances for each entry
in PR_REMEDIATOR_SKILL_TOPICS via registry.register but uninstall only clears
this.bus, leaving executors live; fix by tracking registrations created in
install (e.g., store the skills or registration handles from registry.register)
and then during uninstall iterate those tracked identifiers to call
registry.unregister (or registry.register with null/replace) to remove them;
also make install idempotent by checking for existing registrations for each
entry.skill before creating/registering a new FunctionExecutor (or unregistering
first) so repeated install/uninstall cycles do not leave stale executors.

In `@workspace/actions.yaml`:
- Around line 167-170: The meta entries that set skillHint to debug_ci_failures
(and the similar entries at the other locations mentioned) are missing an
explicit target selector, so dispatch may be claimed by any agent exposing the
same skill; edit those action blocks (the ones with meta.skillHint:
debug_ci_failures and the blocks for investigate_orphaned_skills / issue_triage)
to add an explicit target selector field that pins them to Ava (for example add
target: "ava" or executor: "Ava" following your repo's action schema) so
validation ensures these actions are routed only to Ava.

In `@workspace/projects.yaml`:
- Around line 99-105: The Discord config for the project defines empty values
for the discord.dev.channelId, discord.dev.webhook, discord.release.channelId,
and discord.release.webhook, which will disable notifications; either populate
these four fields with the correct channel IDs and webhook URLs or, if
empty-by-design, add a comment or link to a tracking issue next to the discord
block to signal intentional omission (modify the `discord` section in
projects.yaml, touching the `dev` and `release` entries and their
`channelId`/`webhook` keys).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2b05879d-8b80-404e-b0de-a13b082a908a

📥 Commits

Reviewing files that changed from the base of the PR and between 4e910d7 and 7ebbbf9.

📒 Files selected for processing (71)
  • .claude/commands/rollcall.md
  • .env.dist
  • README.md
  • __tests__/goal_evaluator_plugin.test.ts
  • dashboard/package.json
  • docs/explanation/self-improving-loop.md
  • docs/extensions/blast-v1.md
  • docs/extensions/confidence-v1.md
  • docs/extensions/cost-v1.md
  • docs/extensions/effect-domain-v1.md
  • docs/extensions/hitl-mode-v1.md
  • docs/guides/extend-an-a2a-agent.md
  • docs/integrations/runtimes/deep-agent.md
  • docs/reference/agent-skills.md
  • docs/reference/ceremony-plugin.md
  • docs/reference/env-vars.md
  • docs/reference/http-api.md
  • lib/plugins/pr-remediator.ts
  • package.json
  • src/agent-runtime/tools/bus-tools.ts
  • src/api/__tests__/agent-card.test.ts
  • src/api/agent-card.ts
  • src/api/board.ts
  • src/api/github.ts
  • src/config/env.ts
  • src/executor/executors/deep-agent-executor.ts
  • src/executor/extensions/blast.ts
  • src/executor/extensions/confidence.ts
  • src/executor/extensions/cost.ts
  • src/executor/extensions/effect-domain.ts
  • src/executor/extensions/hitl-mode.ts
  • src/executor/extensions/langfuse-trace.ts
  • src/executor/task-tracker.ts
  • src/index.ts
  • src/loaders/__tests__/ceremonyYamlLoader.test.ts
  • src/loaders/ceremonyYamlLoader.ts
  • src/planner/__tests__/end-to-end-loop.test.ts
  • src/planner/__tests__/validate-action-executors.test.ts
  • src/planner/types/action.ts
  • src/planner/validate-action-executors.ts
  • src/plugins/CeremonyPlugin.ts
  • src/plugins/__tests__/CeremonyPlugin.test.ts
  • src/plugins/__tests__/alert-skill-executor-plugin.test.ts
  • src/plugins/__tests__/ceremony-skill-executor-plugin.test.ts
  • src/plugins/__tests__/pr-remediator-skill-executor-plugin.test.ts
  • src/plugins/action-dispatcher-plugin.test.ts
  • src/plugins/action-dispatcher-plugin.ts
  • src/plugins/agent-fleet-health-plugin.test.ts
  • src/plugins/agent-fleet-health-plugin.ts
  • src/plugins/alert-skill-executor-plugin.ts
  • src/plugins/ceremony-skill-executor-plugin.ts
  • src/plugins/goal_evaluator_plugin.ts
  • src/plugins/pr-remediator-skill-executor-plugin.ts
  • src/pr-remediator.test.ts
  • src/schemas/yaml-schemas.ts
  • src/telemetry/telemetry-service.ts
  • src/world/extensions/CeremonyStateExtension.ts
  • src/world/extensions/__tests__/CeremonyStateExtension.test.ts
  • workspace/actions.yaml
  • workspace/agents.yaml
  • workspace/agents/ava.yaml
  • workspace/ceremonies/agent-health.yaml
  • workspace/ceremonies/board.cleanup.yaml
  • workspace/ceremonies/board.health.yaml
  • workspace/ceremonies/board.pr-audit.yaml
  • workspace/ceremonies/board.retro.yaml
  • workspace/ceremonies/daily-standup.yaml
  • workspace/ceremonies/health-check.yaml
  • workspace/ceremonies/service-health.yaml
  • workspace/goals.yaml
  • workspace/projects.yaml

## Steps

1. Run `bash /home/josh/dev/homelab-iac/scripts/agent-rollcall.sh`
1. Run `bash /home/josh/dev/protoWorkstacean/scripts/agent-rollcall.sh`

⚠️ Potential issue | 🟠 Major

Use a repo-relative invocation instead of a machine-specific absolute path.

Line 5 hard-codes a local home directory and conflicts with the repo-local guidance in Line 11, making the command non-portable.

Suggested fix
-1. Run `bash /home/josh/dev/protoWorkstacean/scripts/agent-rollcall.sh`
+1. Run `bash scripts/agent-rollcall.sh`

Also applies to: 11-11

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/commands/rollcall.md at line 5, The command in the README uses a
machine-specific absolute path ("bash
/home/josh/dev/protoWorkstacean/scripts/agent-rollcall.sh"); change it to a
repo-relative invocation (e.g., "bash ./scripts/agent-rollcall.sh" or
"scripts/agent-rollcall.sh") so the instruction is portable across machines and
consistent with the repo-local guidance elsewhere; update every occurrence of
that absolute path (the same string) to the repo-relative form.

- get_incidents
- get_cost_summary
- get_confidence_summary
- web_search

⚠️ Potential issue | 🟡 Minor

Tool name mismatch: web_search should be searxng_search.

The executor code renamed this tool from web_search to searxng_search (see deep-agent-executor.ts line 265), but the YAML example still lists the old name. Agents referencing web_search in their tools array will not have the search tool available.

Proposed fix
-  - web_search
+  - searxng_search
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/integrations/runtimes/deep-agent.md` at line 51, The docs YAML example
still references the old tool name "web_search"; update the example so the tool
name is "searxng_search" (to match the executor rename in deep-agent-executor.ts
where the tool was renamed), and ensure any agent "tools" arrays in the example
use "searxng_search" instead of "web_search" so the search tool is available at
runtime.

| `get_ceremonies` | `GET /api/ceremonies` | List ceremony definitions |
| `get_cost_summary` | `GET /api/cost-summaries` | Per-agent/skill cost: tokens, duration, dollars |
| `get_confidence_summary` | `GET /api/confidence-summaries` | Per-agent/skill calibration metrics |
| `web_search` | SearXNG `/search` | Quick web search (5 results) |

⚠️ Potential issue | 🟡 Minor

Tool table also references old name web_search.

For consistency with the implementation, this should be searxng_search.

Proposed fix
-| `web_search` | SearXNG `/search` | Quick web search (5 results) |
+| `searxng_search` | SearXNG `/search` | Web search with category routing (default: 10 results) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/integrations/runtimes/deep-agent.md` at line 116, The table entry
currently lists the tool as `web_search` but the implementation uses
`searxng_search`; update the docs table row (the entry containing `web_search`
and the description "SearXNG `/search` | Quick web search (5 results)") to
replace the `web_search` symbol with `searxng_search` so the documentation
matches the runtime/tool name used by the code.

Comment thread src/api/github.ts
Comment on lines +466 to +468
if (!repoList.length || !GITHUB_TOKEN) return Response.json({
  totalOpen: 0, criticalOpen: 0, bugOpen: 0, enhancementOpen: 0, repos: [],
});

⚠️ Potential issue | 🟠 Major

Don’t treat missing GitHub auth as “zero issues.”

When GITHUB_TOKEN is absent, this returns success with all counters at 0. That can falsely indicate healthy issue state instead of “data unavailable.” Return a non-2xx status or add an explicit availability flag so downstream goals can gate on freshness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/github.ts` around lines 466 - 468, The handler currently treats
missing GITHUB_TOKEN as a successful empty-result by returning
Response.json({...}), which hides data-unavailable state; change the
early-return that checks repoList and GITHUB_TOKEN so that when GITHUB_TOKEN is
falsy you return a non-2xx response (e.g., Response.json({ message: "GitHub auth
missing", availability: false }, { status: 503 })) or at minimum include an
explicit availability flag (availability: false) in the returned payload; update
the check around repoList and GITHUB_TOKEN and the Response.json call (symbols:
GITHUB_TOKEN, repoList, Response.json) so consumers can detect and gate on
missing auth.
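One way to implement the comment's suggestion, sketched under assumptions: the 503 status and `available` payload flag come from this review comment, not from existing code, and `issueSummaryResponse` is a hypothetical helper wrapping the early-return logic.

```typescript
// Sketch: distinguish "zero open issues" from "data unavailable".
// GITHUB_TOKEN / repoList mirror the snippet above; the payload shape is illustrative.
function issueSummaryResponse(repoList: string[], token: string | undefined): Response {
  if (!token) {
    // Non-2xx so downstream goals can gate on freshness instead of reading 0 issues.
    return Response.json(
      { available: false, message: "GitHub auth missing" },
      { status: 503 },
    );
  }
  if (!repoList.length) {
    // Genuinely empty repo list: a true zero-issue summary.
    return Response.json({
      available: true, totalOpen: 0, criticalOpen: 0, bugOpen: 0, enhancementOpen: 0, repos: [],
    });
  }
  // ...fetch and aggregate issues here (elided)...
  return Response.json({
    available: true, totalOpen: 0, criticalOpen: 0, bugOpen: 0, enhancementOpen: 0, repos: [],
  });
}
```

Consumers can then branch on `response.status !== 200` (or `available === false`) rather than misreading missing auth as a healthy state.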

Comment thread src/api/github.ts
Comment on lines +489 to +491
const issues = await ghApi(
  `/repos/${repo}/issues?state=open&per_page=100&sort=created&direction=asc`,
) as Array<{

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

GitHub REST API docs for "List repository issues": confirm pagination requirements, `per_page` maximum, and use of `page`/Link headers for full result sets.

💡 Result:

The GitHub REST API endpoint for "List repository issues" (GET /repos/{owner}/{repo}/issues) requires pagination for result sets exceeding the default of 30 items (max per_page=100). Use the per_page parameter (1-100, default 30) and page parameter (default 1) for offset-based pagination. For full result sets, parse the Link response header, which provides rel="next", "prev", "first", "last" URLs (omitted if all results fit on one page). Example Link: https://api.github.com/repos/.../issues?page=2&per_page=100; rel="next". Cursor-based pagination (before/after) may apply for very large datasets, but page/Link is standard.

Implement pagination for issues query — currently only fetches first 100 results.

The issues endpoint at lines 489-491 fetches with per_page=100 but lacks a page parameter and pagination loop. Repos with more than 100 open issues will be undercounted, skewing totalOpen and label totals.

Use the same pagination pattern as the PR pipeline handler (lines 138-160): loop through pages with a page parameter, break when batch.length < 100, and cap at MAX_PAGES to guard against runaway loops.

Suggested fix
-        const issues = await ghApi(
-          `/repos/${repo}/issues?state=open&per_page=100&sort=created&direction=asc`,
-        ) as Array<{
-          number: number;
-          title: string;
-          labels: Array<{ name: string }>;
-          created_at: string;
-          updated_at: string;
-          user: { login: string } | null;
-          pull_request?: unknown;
-        }>;
+        type IssueItem = {
+          number: number;
+          title: string;
+          labels: Array<{ name: string }>;
+          created_at: string;
+          updated_at: string;
+          user: { login: string } | null;
+          pull_request?: unknown;
+        };
+        const issues: IssueItem[] = [];
+        const MAX_PAGES = 5;
+        for (let page = 1; page <= MAX_PAGES; page++) {
+          const batch = await ghApi(
+            `/repos/${repo}/issues?state=open&per_page=100&page=${page}&sort=created&direction=asc`,
+          ) as IssueItem[];
+          if (batch.length === 0) break;
+          issues.push(...batch);
+          if (batch.length < 100) break;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/github.ts` around lines 489 - 491, The issues fetch currently calls
ghApi once with per_page=100 (const issues = await
ghApi(`/repos/${repo}/issues?state=open&per_page=100&sort=created&direction=asc`)
), so implement paginated fetching like the PR pipeline handler: loop over page
numbers adding `&page=${page}` to the request, collect each batch into the
existing issues array, break when the returned batch.length < 100 or when page
>= MAX_PAGES, and then use the accumulated issues to compute totalOpen and label
totals; use the same MAX_PAGES constant and ghApi function to guard against
runaway loops.

Comment on lines +13 to +31
function makeBus() {
  const subs = new Map<string, Array<(msg: BusMessage) => void>>();
  const published: BusMessage[] = [];
  return {
    published,
    subscribe(topic: string, _name: string, handler: (msg: BusMessage) => void) {
      if (!subs.has(topic)) subs.set(topic, []);
      subs.get(topic)!.push(handler);
      return `sub-${topic}-${Math.random()}`;
    },
    unsubscribe(_id: string) {},
    publish(topic: string, msg: BusMessage) {
      published.push(msg);
      const handlers = subs.get(topic) ?? [];
      for (const h of handlers) h(msg);
    },
    topics() { return []; },
  };
}

⚠️ Potential issue | 🟠 Major

Use the real InMemoryEventBus here.

The bespoke makeBus() stub plus as never installs bypasses the actual bus contract, so these tests can go green while install() / publish() behavior drifts from the real runtime. Rebase these cases on InMemoryEventBus (or a typed helper built on top of it) instead of a partial mock.

As per coding guidelines, "Write unit tests with bun test against in-memory bus with no mocks and no LLM calls" and "Use TypeScript with strict mode for all application code".

Also applies to: 42-43, 87-88, 145-146, 163-164

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugins/__tests__/pr-remediator-skill-executor-plugin.test.ts` around
lines 13 - 31, Replace the custom makeBus() stub in the test with the real
InMemoryEventBus implementation: remove makeBus() and any as never casts and
instantiate InMemoryEventBus (or a small typed helper wrapper) where makeBus()
was used so the tests exercise real install()/publish()/subscribe() semantics;
update references in the test file (pr-remediator-skill-executor-plugin.test.ts)
for each occurrence (lines around the makeBus usage, and the other noted
occurrences) to use InMemoryEventBus and adjust any test helper types to match
its API (subscribe, unsubscribe, publish, topics, published inspection) so the
tests run against the actual in-memory bus contract instead of a partial mock.

Comment on lines +56 to +69
install(bus: EventBus): void {
  this.bus = bus;
  for (const entry of PR_REMEDIATOR_SKILL_TOPICS) {
    const executor = new FunctionExecutor(async (req) => this._execute(req, entry));
    this.registry.register(entry.skill, executor, { priority: 5 });
  }
  console.log(
    `[pr-remediator-skill-executor] Registered ${PR_REMEDIATOR_SKILL_TOPICS.length} executor(s): ${PR_REMEDIATOR_SKILL_TOPICS.map(e => e.skill).join(", ")}`,
  );
}

uninstall(): void {
  this.bus = undefined;
}

⚠️ Potential issue | 🟠 Major

uninstall() leaves dead executors registered.

After uninstall(), the registry will still resolve these five skills, but each executor now only returns "not installed". That means dispatch and startup validation can still treat the skills as wired even though the plugin is gone. uninstall() should remove the registrations it added, or install() needs to replace them idempotently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugins/pr-remediator-skill-executor-plugin.ts` around lines 56 - 69,
install registers FunctionExecutor instances for each entry in
PR_REMEDIATOR_SKILL_TOPICS via registry.register but uninstall only clears
this.bus, leaving executors live; fix by tracking registrations created in
install (e.g., store the skills or registration handles from registry.register)
and then during uninstall iterate those tracked identifiers to call
registry.unregister (or registry.register with null/replace) to remove them;
also make install idempotent by checking for existing registrations for each
entry.skill before creating/registering a new FunctionExecutor (or unregistering
first) so repeated install/uninstall cycles do not leave stale executors.
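The track-and-unregister pattern can be sketched as below. This is a simplified stand-in, not the repo's code: `SkillRegistry` here is a minimal in-memory map, and `unregister` is the assumed removal API the comment asks for.

```typescript
// Sketch: install() records what it registered so uninstall() can remove
// exactly those entries, and re-install stays idempotent.
type Executor = { execute: (req: unknown) => Promise<unknown> };

class SkillRegistry {
  private map = new Map<string, Executor>();
  register(skill: string, ex: Executor): void { this.map.set(skill, ex); }
  unregister(skill: string): void { this.map.delete(skill); }
  has(skill: string): boolean { return this.map.has(skill); }
}

class SkillExecutorPlugin {
  private registered: string[] = [];
  constructor(private registry: SkillRegistry, private skills: string[]) {}

  install(): void {
    for (const skill of this.skills) {
      // Replace any stale registration so repeated install() calls are safe.
      if (this.registry.has(skill)) this.registry.unregister(skill);
      this.registry.register(skill, { execute: async () => `ran ${skill}` });
      this.registered.push(skill);
    }
  }

  uninstall(): void {
    // Remove exactly what install() added -- no dead executors linger.
    for (const skill of this.registered) this.registry.unregister(skill);
    this.registered = [];
  }
}
```

With this shape, dispatch and startup validation stop resolving the skills the moment the plugin is uninstalled, instead of hitting "not installed" stubs.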

Comment thread workspace/actions.yaml
Comment on lines 167 to +170
meta:
skillHint: debug_ci_failures
fireAndForget: true
cooldownMs: 1800000 # 30 min — agent skill dispatch (#437)

⚠️ Potential issue | 🟠 Major

Route these skills explicitly to Ava.

These entries describe Ava-only work, but unlike the later investigate_orphaned_skills / issue_triage actions they leave target selection open. That means validation only proves that some executor exists for the skill, not that the dispatch lands on Ava, and it becomes ambiguous if another agent later exposes the same skill.

Suggested config fix
   meta:
     skillHint: debug_ci_failures
+    agentId: ava
     fireAndForget: true
     cooldownMs: 1800000  # 30 min — agent skill dispatch (`#437`)

   meta:
     skillHint: fleet_incident_response
+    agentId: ava
     fireAndForget: true
     cooldownMs: 1800000  # 30 min — agent skill dispatch, expensive (`#437`)

   meta:
     skillHint: downshift_models
+    agentId: ava
     fireAndForget: true
     cooldownMs: 1800000  # 30 min — agent skill dispatch (`#437`)

Also applies to: 572-575, 636-639

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@workspace/actions.yaml` around lines 167 - 170, The meta entries that set
skillHint to debug_ci_failures (and the similar entries at the other locations
mentioned) are missing an explicit target selector, so dispatch may be claimed
by any agent exposing the same skill; edit those action blocks (the ones with
meta.skillHint: debug_ci_failures and the blocks for investigate_orphaned_skills
/ issue_triage) to add an explicit target selector field that pins them to Ava
(for example add target: "ava" or executor: "Ava" following your repo's action
schema) so validation ensures these actions are routed only to Ava.

Comment thread workspace/projects.yaml
Comment on lines +99 to +105
discord:
  dev:
    channelId: ""
    webhook: ""
  release:
    channelId: ""
    webhook: ""

⚠️ Potential issue | 🟡 Minor

Discord configuration is incomplete.

The channelId and webhook fields are empty strings, which will silently skip Discord notifications for this project. If this is intentional (awaiting Discord channel setup), consider adding a comment or tracking issue. Otherwise, ensure these are populated before release.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@workspace/projects.yaml` around lines 99 - 105, The Discord config for the
project defines empty values for the discord.dev.channelId, discord.dev.webhook,
discord.release.channelId, and discord.release.webhook, which will disable
notifications; either populate these four fields with the correct channel IDs
and webhook URLs or, if empty-by-design, add a comment or link to a tracking
issue next to the discord block to signal intentional omission (modify the
`discord` section in projects.yaml, touching the `dev` and `release` entries and
their `channelId`/`webhook` keys).

@protoquinn

protoquinn Bot commented Apr 21, 2026

Auto-remediation verdict: decomposable — closing and proposing a split.

This PR has merge conflicts that cluster into distinct file groups. Rather than attempting a complex merge, the recommended path is to split this PR into smaller, focused PRs — one per file cluster.

Conflicting file clusters:

  • src/services/a2a/agentCard.ts
  • src/services/fleet-health/collector.ts
  • src/services/ceremonies/loader.ts
  • package.json

Evidence: This dev→main promotion PR bundles 3 independent concerns (agent card URL fix, fleet-health synthetic-actor filter, ceremony-loader enabled flag) across 33 commits of drift. The conflicts arise because main diverged with hotfixes touching overlapping files. Splitting into separate PRs per feature area (agent-card, fleet-health, ceremony) would isolate the conflicting files and allow each to merge independently.

Please re-cut this as smaller, focused PRs targeting each cluster independently. This avoids the conflict and makes each change easier to review.

Automated verdict by pr-remediator diagnose_pr_stuck skill.
