
[Epic] Work Board #165

@raykao

Overview

Work Board surfaces a structured, real-time view of task state across all active copilot-bridge channels and agents in a single Mattermost channel (#work-board). Instead of manually visiting each channel, operators see one living board showing what every agent is doing, what is pending, what is done, and what is blocked. The feature is structured as three additive phases -- each independently deployable -- culminating in two new bridge tool contributions (post_to_channel and get_agent_status) that enable event-driven signalling and lightweight session queries.

Spec Reference

https://github.com/raykao/dark-factory/tree/speckit/work-board/specs/work-board

Phases and Tasks

Phase 0 - Coordinator Standup (Zero Bridge Changes)

Deployable today with zero source-code changes. Covers AGENTS.md edits, GitHub label creation, and cron registration using existing bridge tooling.

  • [T001] Verify Phase 0 prerequisites: #work-board channel exists, coordinator bot is a member, all target bots are in the coordinator's ask_agent allowlist, and gh CLI is authenticated
  • [T002] Author coordinator AGENTS.md ## Work Board Coordinator Role section: Standard Status Query protocol, JSON response schema, board rendering instructions, and #work-board target channel name
  • [T003] Author worker agent status response protocol ## Status Reporting section for each worker AGENTS.md: full JSON response schema and instruction to respond with JSON only
  • [T004] Create GitHub label convention setup script in specs/work-board/quickstart.md: gh label create commands for status:active, status:blocked, status:done, and per-agent agent:<name> labels
  • [T005] Register Phase 0 daily standup cron in #work-board via /schedule add and verify it shows as active
  • [T006] Execute Phase 0 smoke test checklist: trigger standup manually, confirm Markdown table format, status icons, footer, and unreachable-agent fallback

Checkpoint: Daily standup fires at 09:00 weekdays, board posts to #work-board, all agents respond with structured JSON.
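
The status protocol from T002/T003 can be sketched as a small parser that turns a raw ask_agent reply into one board row. This is only an illustration: the field names (`status`, `currentTask`, `blockers`) are assumptions taken from the board columns, `toBoardRow` is a hypothetical helper, and the authoritative JSON schema is the one in the spec.

```typescript
// Hypothetical reply shape; the authoritative JSON schema is defined in the spec.
type AgentState = "active" | "blocked" | "done";

interface StatusReply {
  status: AgentState;
  currentTask: string;
  blockers: string[];
}

interface BoardRow {
  agent: string;
  status: AgentState | "error" | "unreachable";
  currentTask: string;
  blockers: string;
}

const KNOWN_STATES: readonly string[] = ["active", "blocked", "done"];

// Type guard: accepts only objects that carry all three expected fields.
function isStatusReply(value: unknown): value is StatusReply {
  if (typeof value !== "object" || value === null) return false;
  const { status, currentTask, blockers } = value as Record<string, unknown>;
  return (
    typeof status === "string" &&
    KNOWN_STATES.indexOf(status) !== -1 &&
    typeof currentTask === "string" &&
    Array.isArray(blockers) &&
    blockers.every((b) => typeof b === "string")
  );
}

// Convert a raw reply into a board row.
// `raw === null` models an agent that did not answer (offline or timed out).
function toBoardRow(agent: string, raw: string | null): BoardRow {
  if (raw === null) {
    return { agent, status: "unreachable", currentTask: "", blockers: "" };
  }
  try {
    const parsed: unknown = JSON.parse(raw);
    if (isStatusReply(parsed)) {
      return {
        agent,
        status: parsed.status,
        currentTask: parsed.currentTask,
        blockers: parsed.blockers.join("; "),
      };
    }
  } catch {
    // Not JSON at all: fall through to the error row below.
  }
  // Invalid JSON or wrong shape: show the first 100 chars of the raw response.
  return { agent, status: "error", currentTask: raw.slice(0, 100), blockers: "" };
}
```

Because an unreachable agent or a malformed reply only produces a degraded row, the board can still post when one agent misbehaves.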


Phase 1 - Pinned Board Post + /board Command

Replaces the ephemeral Phase 0 standup with a persistent pinned post updated in-place on a 30-minute schedule and on demand via /board.

  • [T007] Document rate-limit budget analysis in specs/work-board/plan.md: PUT /api/v4/posts/{id} ceiling, gh issue list call budget, and ask_agent per-agent latency at scale
  • [T008] Create tests/unit/board-state.test.ts: channel_prefs CRUD, idempotency, isRateLimited() behavior, and metadata round-trip for all four board_* keys
  • [T009] Create tests/unit/board-renderer.test.ts: empty agent list, single agent, 15-agent full table, 75-agent compact-mode trigger, all status icon variants, invalid JSON agent response, absent GitHub API
  • [T010] Create tests/integration/board-command.test.ts: all subcommands, --agents flag, channel restriction, concurrent invocation queuing, and all error states
  • [T011] Create src/core/board-state.ts (~80 LOC): getBoardPostId, setBoardPostId, getLastRefreshMetadata, setLastRefreshMetadata, isRateLimited; backed by channel_prefs SQLite key-value store with board_* key prefix
  • [T012] Create src/core/board-renderer.ts (~120 LOC): renderBoard, renderAgentTable, renderCompletionsTable, renderIssuesTable, renderCompact, truncateTask; 14,000-char soft limit auto-switch to compact; 16,000-char hard cap
  • [T013] Create src/core/board.ts (~150 LOC): BoardCommandHandler.handle with subcommand parser for refresh, status, detail <agent> and flags --agents, --quiet, --json; channel restriction guard; per-channel mutex
  • [T014] Add case 'board': dispatch to slash command switch in src/core/session-manager.ts, delegating to BoardCommandHandler.handle using the same pattern as /status and /schedule
  • [T015] Implement pinned post lifecycle in src/core/board.ts: create and pin on first run; detect 404 and re-create; update in-place on subsequent runs
  • [T016] Cancel Phase 0 standup cron and register Phase 1 schedules: */30 9-17 * * 1-5 for business-hours refresh and 0 9 * * 6,0 for weekend daily summary

Checkpoint: /board command live, pinned post persists across restarts, 30-min cron active, all tests passing; PR to ChrisRomp/copilot-bridge ready (T033).
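
As a rough illustration of the subcommand parser described in T013, the sketch below parses the text that follows /board. Several details are assumptions rather than spec: a bare /board is treated as `refresh`, `--agents` is assumed to take a comma-separated list, and `parseBoardCommand` and its result types are hypothetical names.

```typescript
interface BoardFlags {
  agents: string[] | null; // null means "all configured agents"
  quiet: boolean;
  json: boolean;
}

type BoardCommand =
  | ({ kind: "refresh" } & BoardFlags)
  | ({ kind: "status" } & BoardFlags)
  | ({ kind: "detail"; agent: string } & BoardFlags)
  | { kind: "error"; message: string };

// Parse everything the user typed after "/board".
function parseBoardCommand(args: string): BoardCommand {
  const tokens = args.trim().split(/\s+/).filter((t) => t.length > 0);
  const flags: BoardFlags = { agents: null, quiet: false, json: false };
  const positional: string[] = [];

  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (token === "--quiet") {
      flags.quiet = true;
    } else if (token === "--json") {
      flags.json = true;
    } else if (token === "--agents") {
      const value = tokens[i + 1];
      if (value === undefined || value.startsWith("--")) {
        return { kind: "error", message: "--agents requires a comma-separated list" };
      }
      flags.agents = value.split(",").filter((a) => a.length > 0);
      i++; // skip the value we just consumed
    } else if (token.startsWith("--")) {
      return { kind: "error", message: `unknown flag: ${token}` };
    } else {
      positional.push(token);
    }
  }

  const sub = positional[0] ?? "refresh"; // assumption: bare /board means refresh
  switch (sub) {
    case "refresh":
      return { kind: "refresh", ...flags };
    case "status":
      return { kind: "status", ...flags };
    case "detail": {
      const agent = positional[1];
      if (agent === undefined) {
        return { kind: "error", message: "usage: /board detail <agent>" };
      }
      return { kind: "detail", agent, ...flags };
    }
    default:
      return { kind: "error", message: `unknown subcommand: ${sub}` };
  }
}
```

Returning an `error` variant instead of throwing lets the handler reply with an ephemeral usage message without any special exception handling.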


Phase 2 - post_to_channel + get_agent_status Tools

Enables agents to push completion events directly to #work-board and to query peer session state synchronously, without a full ask_agent round-trip.

  • [T017] Create tests/unit/allowlist.test.ts: all deny paths (absent postTargets, channel not in list, wildcard detected at startup, pin with allowPin: false), all allow paths, startup validation
  • [T018] Create tests/unit/rate-limiter.test.ts: first call succeeds, second call within 60s throws RATE_LIMITED, call after 60s refill succeeds, two distinct (botId, channelId) pairs tracked independently
  • [T019] Create tests/integration/post-to-channel.test.ts: all 6 ToolError codes, create-and-pin scenario, update-in-place scenario, audit log entries, SlackAdapter.postToChannel() NOT_IMPLEMENTED stub
  • [T020] Create tests/integration/get-agent-status.test.ts: all currentState values, latency under 500ms, "unknown" for no active session, allowlist deny
  • [T021] Add PostTarget interface and postTargets?: PostTarget[] to BotConfig in src/config/schema.ts; startup validation rejects wildcard "*" and self-targets
  • [T022] Create src/core/allowlist.ts (~60 LOC): checkChannelAllowed(bot, targetChannelId, operation) -- throws CHANNEL_NOT_ALLOWED or PIN_NOT_ALLOWED; MUST be the first operation in every handler
  • [T023] Create src/core/rate-limiter.ts (~40 LOC): token bucket per (botId, targetChannelId) pair, 1 token per 60s refill; check() throws RATE_LIMITED when bucket empty
  • [T024] Extend ChannelAdapter interface in src/types.ts: add postToChannel, updatePostInChannel, pinPost; define PostOptions and PostResult types
  • [T025] Implement postToChannel, updatePostInChannel, pinPost in src/adapters/MattermostAdapter.ts: content-length guard, authorship check for update path, conditional pin after creation
  • [T026] Add lastActivity: Date and lastToolCall: string | null in-memory fields to session objects in src/core/session-manager.ts; derive currentState per plan.md mapping table
  • [T027] Register post_to_channel tool conditionally in buildCustomTools(): present only when botConfig.postTargets?.length > 0; handler order is checkChannelAllowed then rateLimiter.check then adapter.postToChannel then audit log
  • [T028] Register get_agent_status tool for all bots in buildCustomTools(): reuse checkAskAgentAllowed; synchronous in-memory read of sessionRegistry.getStatus(params.target); no async I/O
  • [T029] Add postToChannel() stub to src/adapters/SlackAdapter.ts: throws ToolError('NOT_IMPLEMENTED') with a descriptive message; signature must match ChannelAdapter

Checkpoint: post_to_channel and get_agent_status tools live, allowlist enforced fail-closed, audit log populated, all tests passing; PR to ChrisRomp/copilot-bridge ready (T034).
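
A minimal sketch of the fail-closed allowlist from T021/T022 follows, under stated assumptions: `allowPin` and the error codes come from the task list, but the `channelId` field name, the `Operation` type, and the `ToolError` shape are hypothetical. The self-target check is omitted because this issue does not define what counts as a self-target.

```typescript
// Minimal error type with a machine-readable code (shape is an assumption).
class ToolError extends Error {
  readonly code: string;
  constructor(code: string, message?: string) {
    super(message ?? code);
    this.code = code;
  }
}

// `allowPin` appears in T017; the `channelId` field name is an assumption.
interface PostTarget {
  channelId: string;
  allowPin: boolean;
}

interface BotConfig {
  botId: string;
  postTargets?: PostTarget[];
}

type Operation = "post" | "update" | "pin";

// Startup validation: a wildcard target would defeat the allowlist, so reject it early.
function validatePostTargets(bot: BotConfig): void {
  for (const target of bot.postTargets ?? []) {
    if (target.channelId === "*") {
      throw new Error(`bot ${bot.botId}: wildcard "*" is not a valid post target`);
    }
  }
}

// Fail-closed check: anything not explicitly listed is denied,
// including bots that have no postTargets at all.
function checkChannelAllowed(bot: BotConfig, targetChannelId: string, operation: Operation): void {
  const target = (bot.postTargets ?? []).find((t) => t.channelId === targetChannelId);
  if (target === undefined) {
    throw new ToolError("CHANNEL_NOT_ALLOWED", `${bot.botId} may not post to ${targetChannelId}`);
  }
  if (operation === "pin" && !target.allowPin) {
    throw new ToolError("PIN_NOT_ALLOWED", `${bot.botId} may not pin in ${targetChannelId}`);
  }
}
```

In a handler, this check would run before the rate limiter and before any adapter call, matching the order given in T027.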


Final Phase - Polish and Cross-Cutting Concerns

  • [T030] Add HTML content sanitization to content parameter in MattermostAdapter.ts before Mattermost API calls (defence-in-depth)
  • [T031] Add board refresh observability logging to src/core/board.ts: INFO on refresh, INFO on board creation and pin, WARN on rate-limit rejections
  • [T032] Update specs/work-board/quickstart.md with: single-instance bridge assumption, cold-start get_agent_status behavior, and resolution of ask_agent parallelism question
  • [T033] Open PR to ChrisRomp/copilot-bridge with Phase 1 changes (T008-T016 all passing)
  • [T034] Open PR to ChrisRomp/copilot-bridge with Phase 2 changes (T017-T029 all passing)

Acceptance Criteria

Phase 0 -- Daily Standup

  • Given the coordinator is configured with the standup schedule, when 09:00 weekday arrives, the coordinator posts a Markdown table to #work-board with one row per configured agent (Status, Current Task, Blockers columns).
  • Given an agent is offline, when the coordinator runs ask_agent, the row shows Warning: Unreachable and the board still posts successfully.
  • Given GitHub Issues exist with status:active labels, when the standup runs, those issues appear in the "Open Issues by Agent" section.
  • Given the coordinator posts a standup, the board matches the format defined in spec section 8 (correct columns, status icons, footer).

Phase 1 -- Pinned Board Post

  • Given the coordinator runs for the first time, a new post is created in #work-board and pinned to the channel.
  • Given a pinned board post exists, on scheduled refresh the post is updated in-place (same post ID) with a new timestamp.
  • Given the bridge restarts, the next refresh cycle updates the same pinned post (not a new one) -- the post ID was persisted.
  • Given one ask_agent call fails with a timeout, the failing agent row shows Warning: Unreachable and all other rows show current data.
  • Given a user types /board, the pinned post is updated within 5 minutes and an ephemeral confirmation is shown.
  • Given a user types /board status, an ephemeral message shows: last refresh time, next scheduled time, agent count, unreachable agent count.
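
The pinned post lifecycle behind the first three criteria (create and pin on first run, update in place, re-create after a 404) can be sketched as below. The real adapter calls would be asynchronous HTTP requests; this sketch uses a synchronous stand-in to keep the control flow visible. `PostApi`, `BoardState`, and `refreshPinnedBoard` are hypothetical names; only `getBoardPostId` and `setBoardPostId` come from T011.

```typescript
// Synchronous stand-in for the adapter (assumption; the real calls are async).
interface PostApi {
  createPost(channelId: string, content: string): string; // returns the new post ID
  updatePost(postId: string, content: string): boolean;   // false models an HTTP 404
  pinPost(postId: string): void;
}

// Subset of the board-state accessors named in T011.
interface BoardState {
  getBoardPostId(): string | null;
  setBoardPostId(postId: string): void;
}

interface RefreshResult {
  postId: string;
  created: boolean;
}

function refreshPinnedBoard(
  api: PostApi,
  state: BoardState,
  channelId: string,
  content: string
): RefreshResult {
  const existing = state.getBoardPostId();

  // Normal path: a post ID is persisted and the post still exists.
  if (existing !== null && api.updatePost(existing, content)) {
    return { postId: existing, created: false };
  }

  // First run, or the post was deleted externally: create, pin, persist.
  const postId = api.createPost(channelId, content);
  api.pinPost(postId);
  state.setBoardPostId(postId);
  return { postId, created: true };
}
```

Persisting the ID only after the pin succeeds means a failure partway through leaves the old state in place, so the next refresh simply retries the create path.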

Phase 2 -- post_to_channel and get_agent_status

  • Given a bot has a valid postTargets entry, calling post_to_channel posts the content to the target channel within 5 seconds.
  • Given a bot does NOT have the target channel in postTargets, post_to_channel throws CHANNEL_NOT_ALLOWED and no message is sent.
  • Given post_to_channel is called twice within 60s, the second call returns RATE_LIMITED with no Mattermost API call made.
  • Given an agent's session is active, get_agent_status returns currentState: "active" within 500ms.
  • Given no active session exists for an agent, get_agent_status returns currentState: "unknown" (no exception thrown).
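
The rate-limit criterion above corresponds to the token bucket in T023. A minimal sketch, assuming an injectable clock for testability and a plain `Error` tagged with `code: "RATE_LIMITED"` (the real implementation may use the bridge's own ToolError type); the class and parameter names are hypothetical:

```typescript
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

class RateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly now: () => number;
  private readonly refillMs: number;
  private readonly capacity: number;

  // Defaults follow T023: one token per 60s. The clock is injectable for tests.
  constructor(now: () => number = () => Date.now(), refillMs = 60_000, capacity = 1) {
    this.now = now;
    this.refillMs = refillMs;
    this.capacity = capacity;
  }

  // Consumes one token for the (botId, targetChannelId) pair or throws RATE_LIMITED.
  check(botId: string, targetChannelId: string): void {
    const key = JSON.stringify([botId, targetChannelId]); // unambiguous composite key
    const nowMs = this.now();
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, lastRefillMs: nowMs };

    const refills = Math.floor((nowMs - bucket.lastRefillMs) / this.refillMs);
    if (refills > 0) {
      bucket.tokens = Math.min(this.capacity, bucket.tokens + refills);
      // When the bucket is full again, restart the refill clock from now so that
      // idle time cannot be banked toward an early second token.
      bucket.lastRefillMs =
        bucket.tokens === this.capacity ? nowMs : bucket.lastRefillMs + refills * this.refillMs;
    }
    this.buckets.set(key, bucket);

    if (bucket.tokens < 1) {
      throw Object.assign(new Error("RATE_LIMITED"), { code: "RATE_LIMITED" });
    }
    bucket.tokens -= 1;
  }
}
```

Because the check throws before any adapter call, a rejected request never reaches the Mattermost API.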

Edge Cases Covered

  • Empty agent list: board posts a header-only table with a "No agents configured" message.
  • Board post deleted externally: coordinator creates a new post on next refresh and persists the new ID.
  • Post size exceeded (above 14,000 chars): compact summary-only view rendered; full detail accessible via /board detail <agent>.
  • Concurrent /board commands: second invocation is queued; no duplicate board posts.
  • Agent returns invalid JSON: row shows Error with the first 100 chars of the raw response.
  • GitHub API unavailable: board renders from ask_agent data only; Issues section omitted with a footer note.
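
The empty-list and post-size edge cases can be illustrated with a small renderer. The function names follow T012 and the 14,000/16,000 limits come from this issue, but the table layout, the 80-char task truncation, and the compact summary wording are assumptions made for the sketch.

```typescript
interface Row {
  agent: string;
  status: string;
  currentTask: string;
  blockers: string;
}

const SOFT_LIMIT = 14_000; // above this, switch to compact mode
const HARD_CAP = 16_000;   // never emit more than this

// Shorten long task descriptions; the 80-char default is an assumption.
function truncateTask(task: string, max = 80): string {
  return task.length <= max ? task : task.slice(0, max - 3) + "...";
}

function renderAgentTable(rows: Row[]): string {
  const header = "| Agent | Status | Current Task | Blockers |\n|---|---|---|---|";
  if (rows.length === 0) {
    return header + "\n\nNo agents configured";
  }
  const lines = rows.map(
    (r) => `| ${r.agent} | ${r.status} | ${truncateTask(r.currentTask)} | ${r.blockers} |`
  );
  return [header, ...lines].join("\n");
}

// Summary-only view: one count per status instead of one row per agent.
function renderCompact(rows: Row[]): string {
  const counts = new Map<string, number>();
  for (const r of rows) {
    counts.set(r.status, (counts.get(r.status) ?? 0) + 1);
  }
  const parts: string[] = [];
  counts.forEach((n, status) => parts.push(`${status}: ${n}`));
  return `${rows.length} agents (${parts.join(", ")}). Use /board detail <agent> for full rows.`;
}

function renderBoard(rows: Row[]): string {
  const full = renderAgentTable(rows);
  const body = full.length > SOFT_LIMIT ? renderCompact(rows) : full;
  // Last line of defence; compact output should normally be far below the cap.
  return body.length > HARD_CAP ? body.slice(0, HARD_CAP) : body;
}
```

Measuring the fully rendered table, rather than counting agents, keeps the switch correct even when row lengths vary widely.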

Notes

Key Design Decisions

  • GitHub Issues as sole state source of truth (NFR-007): The board is a view. The bridge's SQLite stores only operational metadata (the pinned post ID and last-refresh metadata under the board_* keys). No new persistent state store is introduced.
  • Fail-closed allowlist (NFR-008): A bot without postTargets in config cannot call post_to_channel at all. The allowlist check is the first operation in every handler -- no Mattermost API call is made before it passes.
  • Post size architecture (NFR-001): Hard cap at 16,000 chars (Mattermost limit is 16,383). Soft limit at 14,000 chars triggers automatic compact mode. At ~200 chars/row the soft limit is reached at roughly 70 agent rows (14,000 / 200), so a 75-agent board (~15,000 chars) already renders in compact mode.
  • No parallel ask_agent in Phase 1 (open question): Sequential polling is O(N x 20s). Confirm with ChrisRomp whether the bridge supports concurrent ask_agent calls from a single session before Phase 1 ships -- this is critical for deployments with 10+ agents (T007).
  • /board command follows existing patterns: Add 'board' to the slash command dispatch table in session-manager.ts, same as /status and /schedule. Estimated at ~150 LOC for the handler.
  • Phase 2 token bucket rate limiter: In-memory Map<string, TokenBucket> keyed by (botId, channelId). Not persisted -- bucket resets on bridge restart. This is intentional to avoid a new store dependency.
  • get_agent_status latency contract (FR-022, NFR): Under 500ms enforced by design -- no async I/O in the hot path. Reads lastActivity and lastToolCall fields added to in-memory session objects in T026.
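
To make the latency contract concrete, here is a minimal sketch of a synchronous, in-memory status read. `getStatus` and the `lastActivity`/`lastToolCall` fields are named in T026/T028; everything else (the registry class, `recordActivity`, the result shape) is hypothetical. The sketch also simplifies the state mapping to "any session means active", whereas the real mapping table lives in plan.md and is not reproduced here.

```typescript
// Only the two states named in this issue; plan.md defines the full set.
type CurrentState = "active" | "unknown";

interface Session {
  lastActivity: Date;
  lastToolCall: string | null;
}

interface AgentStatusResult {
  target: string;
  currentState: CurrentState;
  lastActivity: string | null; // ISO 8601 timestamp, or null when unknown
  lastToolCall: string | null;
}

class SessionRegistry {
  private readonly sessions = new Map<string, Session>();

  // Called by the session manager whenever an agent does something.
  recordActivity(agent: string, toolCall: string | null, at: Date = new Date()): void {
    this.sessions.set(agent, { lastActivity: at, lastToolCall: toolCall });
  }

  // Pure in-memory lookup: no awaits, no I/O, and no exception for missing sessions.
  getStatus(target: string): AgentStatusResult {
    const session = this.sessions.get(target);
    if (session === undefined) {
      return { target, currentState: "unknown", lastActivity: null, lastToolCall: null };
    }
    return {
      target,
      currentState: "active", // simplification: see the plan.md mapping for real rules
      lastActivity: session.lastActivity.toISOString(),
      lastToolCall: session.lastToolCall,
    };
  }
}
```

Since the registry lives in process memory, it starts empty after a bridge restart, which is why every agent reports "unknown" until it becomes active again.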

Risks

  • ask_agent timeout at scale: If parallelism is not supported, a 15-agent deployment takes up to 5 minutes to refresh (15 x 20s). The 30-minute schedule is conservative to accommodate this, but /board on-demand refresh may feel slow for larger fleets.
  • Multi-instance bridge conflict: If two bridge instances both run a bot configured with post_to_channel targeting #work-board, both may post without coordination. Current design assumes a single bridge instance.
  • Cold-start get_agent_status: After a bridge restart, all sessions return currentState: "unknown" until agents re-establish sessions. Document in quickstart.md (T032).
  • Board ownership ambiguity: A dedicated @board identity is architecturally cleaner but adds configuration overhead per deployment; using the existing @copilot admin bot is simpler. This should be decided before Phase 1 ships.
  • Mattermost PUT rate limits: The rate-limit ceiling on PUT /api/v4/posts/{id} has not been confirmed, either from documentation or by observation. The 30-minute refresh interval is conservative; investigate before enabling more frequent event-driven updates in Phase 2 (T007).
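
The refresh-time arithmetic in the first risk can be written as a small helper. The 20s per-agent timeout comes from this issue; the `concurrency` parameter is hypothetical, since whether the bridge supports parallel ask_agent calls is still an open question.

```typescript
// Worst-case board refresh time when every agent takes the full timeout.
// With concurrency = 1 this is the sequential N x 20s case from the risk above.
function worstCaseRefreshSeconds(
  agentCount: number,
  perAgentTimeoutSeconds = 20,
  concurrency = 1
): number {
  if (agentCount < 0 || perAgentTimeoutSeconds < 0 || concurrency < 1) {
    throw new RangeError("agentCount and timeout must be >= 0 and concurrency >= 1");
  }
  // Agents are polled in batches of `concurrency`; each batch costs one timeout.
  return Math.ceil(agentCount / concurrency) * perAgentTimeoutSeconds;
}
```

For 15 agents polled sequentially this gives 300 seconds, the 5-minute figure quoted above. If five concurrent calls were possible, the same fleet would take at most 60 seconds.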

Suggested Child Issues for copilot-bridge

| Issue | Phase | Priority |
|---|---|---|
| feat: Add /board slash command for Work Board refresh | 1 | P1 |
| feat: Persist pinned post ID in channel_prefs | 1 | P1 |
| feat: Add post_to_channel tool with allowlist security | 2 | P1 |
| feat: Add get_agent_status tool for structured session queries | 2 | P2 |
| docs: Phase 0 Work Board runbook for coordinator bot setup | 0 | P0 |
