[Epic] Work Board #165
Description
Overview
Work Board surfaces a structured, real-time view of task state across all active copilot-bridge channels and agents in a single Mattermost channel (#work-board). Instead of manually visiting each channel, operators see one living board showing what every agent is doing, what is pending, what is done, and what is blocked. The feature is structured as three additive phases -- each independently deployable -- culminating in two new bridge tool contributions (post_to_channel and get_agent_status) that enable event-driven signalling and lightweight session queries.
Spec Reference
https://github.com/raykao/dark-factory/tree/speckit/work-board/specs/work-board
Phases and Tasks
Phase 0 - Coordinator Standup (Zero Bridge Changes)
Deployable today with zero source-code changes. Covers AGENTS.md edits, GitHub label creation, and cron registration using existing bridge tooling.
- [T001] Verify Phase 0 prerequisites: `#work-board` channel exists, coordinator bot is a member, all target bots are in the coordinator's `ask_agent` allowlist, and `gh` CLI is authenticated
- [T002] Author coordinator `AGENTS.md` `## Work Board Coordinator Role` section: Standard Status Query protocol, JSON response schema, board rendering instructions, and `#work-board` target channel name
- [T003] Author worker agent status response protocol `## Status Reporting` section for each worker `AGENTS.md`: full JSON response schema and instruction to respond with JSON only
- [T004] Create GitHub label convention setup script in `specs/work-board/quickstart.md`: `gh label create` commands for `status:active`, `status:blocked`, `status:done`, and per-agent `agent:<name>` labels
- [T005] Register Phase 0 daily standup cron in `#work-board` via `/schedule add` and verify it shows as active
- [T006] Execute Phase 0 smoke test checklist: trigger standup manually, confirm Markdown table format, status icons, footer, and unreachable-agent fallback
Checkpoint: Daily standup fires at 09:00 weekdays, board posts to #work-board, all agents respond with structured JSON.
Phase 1 - Pinned Board Post + /board Command
Replaces the ephemeral Phase 0 standup with a persistent pinned post updated in-place on a 30-minute schedule and on demand via `/board`.
- [T007] Document rate-limit budget analysis in `specs/work-board/plan.md`: `PUT /api/v4/posts/{id}` ceiling, `gh issue list` call budget, and `ask_agent` per-agent latency at scale
- [T008] Create `tests/unit/board-state.test.ts`: `channel_prefs` CRUD, idempotency, `isRateLimited()` behavior, and metadata round-trip for all four `board_*` keys
- [T009] Create `tests/unit/board-renderer.test.ts`: empty agent list, single agent, 15-agent full table, 75-agent compact-mode trigger, all status icon variants, invalid JSON agent response, absent GitHub API
- [T010] Create `tests/integration/board-command.test.ts`: all subcommands, `--agents` flag, channel restriction, concurrent invocation queuing, and all error states
- [T011] Create `src/core/board-state.ts` (~80 LOC): `getBoardPostId`, `setBoardPostId`, `getLastRefreshMetadata`, `setLastRefreshMetadata`, `isRateLimited`; backed by `channel_prefs` SQLite key-value store with `board_*` key prefix
- [T012] Create `src/core/board-renderer.ts` (~120 LOC): `renderBoard`, `renderAgentTable`, `renderCompletionsTable`, `renderIssuesTable`, `renderCompact`, `truncateTask`; 14,000-char soft limit auto-switch to compact; 16,000-char hard cap
- [T013] Create `src/core/board.ts` (~150 LOC): `BoardCommandHandler.handle` with subcommand parser for `refresh`, `status`, `detail <agent>` and flags `--agents`, `--quiet`, `--json`; channel restriction guard; per-channel mutex
- [T014] Add `case 'board':` dispatch to slash command switch in `src/core/session-manager.ts`, delegating to `BoardCommandHandler.handle` using the same pattern as `/status` and `/schedule`
- [T015] Implement pinned post lifecycle in `src/core/board.ts`: create and pin on first run; detect 404 and re-create; update in-place on subsequent runs
- [T016] Cancel Phase 0 standup cron and register Phase 1 schedules: `*/30 9-17 * * 1-5` for business-hours refresh and `0 9 * * 6,0` for weekend daily summary
Checkpoint: /board command live, pinned post persists across restarts, 30-min cron active, all tests passing; PR to ChrisRomp/copilot-bridge ready (T033).
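The `board-state.ts` helpers from T011 could be sketched as below. This is a minimal illustration, not the bridge's implementation: `KeyValueStore` and `MemoryStore` are hypothetical stand-ins for the `channel_prefs` store, the key names `board_post_id` and `board_last_refresh` are assumed (the plan only says the keys share the `board_` prefix), and the 60-second minimum interval in `isRateLimited` is an illustrative default.

```typescript
// Hypothetical synchronous stand-in for the channel_prefs key-value store.
interface KeyValueStore {
  get(channelId: string, key: string): string | undefined;
  set(channelId: string, key: string, value: string): void;
}

// In-memory implementation used only for this sketch.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  get(channelId: string, key: string): string | undefined {
    return this.data.get(`${channelId}:${key}`);
  }
  set(channelId: string, key: string, value: string): void {
    this.data.set(`${channelId}:${key}`, value);
  }
}

// Assumed key names; only the board_ prefix comes from the plan.
const POST_ID_KEY = "board_post_id";
const LAST_REFRESH_KEY = "board_last_refresh";

function getBoardPostId(store: KeyValueStore, channelId: string): string | undefined {
  return store.get(channelId, POST_ID_KEY);
}

function setBoardPostId(store: KeyValueStore, channelId: string, postId: string): void {
  // Overwriting the same key keeps repeated calls idempotent.
  store.set(channelId, POST_ID_KEY, postId);
}

function setLastRefresh(store: KeyValueStore, channelId: string, atMs: number): void {
  store.set(channelId, LAST_REFRESH_KEY, String(atMs));
}

// True when the last refresh happened less than minIntervalMs ago.
function isRateLimited(
  store: KeyValueStore,
  channelId: string,
  nowMs: number,
  minIntervalMs: number = 60000
): boolean {
  const last = store.get(channelId, LAST_REFRESH_KEY);
  return last !== undefined && nowMs - Number(last) < minIntervalMs;
}
```

Because the post ID lives in the store rather than in process memory, a refresh after a bridge restart can look up the existing pinned post instead of creating a new one.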
Phase 2 - post_to_channel + get_agent_status Tools
Enables agents to push completion events directly to `#work-board` and to query peer session state synchronously, without a full `ask_agent` round-trip.
- [T017] Create `tests/unit/allowlist.test.ts`: all deny paths (absent `postTargets`, channel not in list, wildcard detected at startup, `pin` with `allowPin: false`), all allow paths, startup validation
- [T018] Create `tests/unit/rate-limiter.test.ts`: first call succeeds, second call within 60s throws `RATE_LIMITED`, call after 60s refill succeeds, two distinct `(botId, channelId)` pairs tracked independently
- [T019] Create `tests/integration/post-to-channel.test.ts`: all 6 ToolError codes, create-and-pin scenario, update-in-place scenario, audit log entries, `SlackAdapter.postToChannel()` NOT_IMPLEMENTED stub
- [T020] Create `tests/integration/get-agent-status.test.ts`: all `currentState` values, latency under 500ms, `"unknown"` for no active session, allowlist deny
- [T021] Add `PostTarget` interface and `postTargets?: PostTarget[]` to `BotConfig` in `src/config/schema.ts`; startup validation rejects wildcard `"*"` and self-targets
- [T022] Create `src/core/allowlist.ts` (~60 LOC): `checkChannelAllowed(bot, targetChannelId, operation)` -- throws `CHANNEL_NOT_ALLOWED` or `PIN_NOT_ALLOWED`; MUST be the first operation in every handler
- [T023] Create `src/core/rate-limiter.ts` (~40 LOC): token bucket per `(botId, targetChannelId)` pair, 1 token per 60s refill; `check()` throws `RATE_LIMITED` when bucket empty
- [T024] Extend `ChannelAdapter` interface in `src/types.ts`: add `postToChannel`, `updatePostInChannel`, `pinPost`; define `PostOptions` and `PostResult` types
- [T025] Implement `postToChannel`, `updatePostInChannel`, `pinPost` in `src/adapters/MattermostAdapter.ts`: content-length guard, authorship check for update path, conditional pin after creation
- [T026] Add `lastActivity: Date` and `lastToolCall: string | null` in-memory fields to session objects in `src/core/session-manager.ts`; derive `currentState` per plan.md mapping table
- [T027] Register `post_to_channel` tool conditionally in `buildCustomTools()`: present only when `botConfig.postTargets?.length > 0`; handler order is `checkChannelAllowed` then `rateLimiter.check` then `adapter.postToChannel` then audit log
- [T028] Register `get_agent_status` tool for all bots in `buildCustomTools()`: reuse `checkAskAgentAllowed`; synchronous in-memory read of `sessionRegistry.getStatus(params.target)`; no async I/O
- [T029] Add `postToChannel()` stub to `src/adapters/SlackAdapter.ts`: throws `ToolError('NOT_IMPLEMENTED')` with a descriptive message; signature must match `ChannelAdapter`
Checkpoint: post_to_channel and get_agent_status tools live, allowlist enforced fail-closed, audit log populated, all tests passing; PR to ChrisRomp/copilot-bridge ready (T034).
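A rough sketch of the fail-closed allowlist from T021 and T022 follows. The shapes here are assumptions for illustration: the `channelId` field on `PostTarget`, the `ownChannelId` field used to detect self-targets, the `Operation` union, and the minimal `ToolError` class are all hypothetical; only `postTargets`, `allowPin`, `checkChannelAllowed`, and the two error codes come from the task list.

```typescript
// Minimal error type for the sketch; the bridge's real ToolError may differ.
class ToolError extends Error {
  constructor(public code: string, message?: string) {
    super(message ?? code);
  }
}

// Field names other than postTargets and allowPin are assumed.
interface PostTarget {
  channelId: string;
  allowPin?: boolean;
}

interface BotConfig {
  id: string;
  ownChannelId: string; // hypothetical: the channel the bot itself serves
  postTargets?: PostTarget[];
}

type Operation = "post" | "update" | "pin";

// Startup validation: reject wildcard and self-targets before any tool is registered.
function validatePostTargets(bot: BotConfig): void {
  for (const target of bot.postTargets ?? []) {
    if (target.channelId === "*") {
      throw new Error(`Bot ${bot.id}: wildcard post target is not allowed`);
    }
    if (target.channelId === bot.ownChannelId) {
      throw new Error(`Bot ${bot.id}: a bot may not target its own channel`);
    }
  }
}

// Runtime check, intended to run before any rate-limit or adapter call.
function checkChannelAllowed(bot: BotConfig, targetChannelId: string, operation: Operation): void {
  const target = (bot.postTargets ?? []).find((t) => t.channelId === targetChannelId);
  if (target === undefined) {
    // Absent postTargets or an unlisted channel both deny (fail closed).
    throw new ToolError("CHANNEL_NOT_ALLOWED");
  }
  if (operation === "pin" && target.allowPin !== true) {
    throw new ToolError("PIN_NOT_ALLOWED");
  }
}
```

Treating a missing `allowPin` the same as `false` keeps the pin path fail-closed as well, which matches the deny paths listed for T017.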
Final Phase - Polish and Cross-Cutting Concerns
- [T030] Add HTML content sanitization to `content` parameter in `MattermostAdapter.ts` before Mattermost API calls (defence-in-depth)
- [T031] Add board refresh observability logging to `src/core/board.ts`: INFO on refresh, INFO on board creation and pin, WARN on rate-limit rejections
- [T032] Update `specs/work-board/quickstart.md` with: single-instance bridge assumption, cold-start `get_agent_status` behavior, and resolution of `ask_agent` parallelism question
- [T033] Open PR to ChrisRomp/copilot-bridge with Phase 1 changes (T008-T016 all passing)
- [T034] Open PR to ChrisRomp/copilot-bridge with Phase 2 changes (T017-T029 all passing)
Acceptance Criteria
Phase 0 -- Daily Standup
- Given the coordinator is configured with the standup schedule, when 09:00 weekday arrives, the coordinator posts a Markdown table to `#work-board` with one row per configured agent (Status, Current Task, Blockers columns).
- Given an agent is offline, when the coordinator runs `ask_agent`, the row shows `Warning: Unreachable` and the board still posts successfully.
- Given GitHub Issues exist with `status:active` labels, when the standup runs, those issues appear in the "Open Issues by Agent" section.
- Given the coordinator posts a standup, the board matches the format defined in spec section 8 (correct columns, status icons, footer).
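The standup table described in these criteria could be rendered roughly as follows. This is a sketch under assumptions: the `AgentReport` field names are hypothetical (the real JSON schema lives in `AGENTS.md`), status icons and the footer from spec section 8 are omitted, and a `null` report stands for an agent that `ask_agent` could not reach.

```typescript
// Hypothetical shape of one worker's status report.
interface AgentReport {
  status: string;
  currentTask: string;
  blockers: string;
}

// Escape pipes so free-text fields cannot break the Markdown table.
function cell(text: string): string {
  return text.replace(/\|/g, "\\|");
}

// Render one row per agent; null means the agent was unreachable.
function renderStandupTable(reports: Map<string, AgentReport | null>): string {
  const lines: string[] = [
    "| Agent | Status | Current Task | Blockers |",
    "|---|---|---|---|",
  ];
  reports.forEach((report, agent) => {
    if (report === null) {
      lines.push(`| ${cell(agent)} | Warning: Unreachable | - | - |`);
    } else {
      lines.push(
        `| ${cell(agent)} | ${cell(report.status)} | ${cell(report.currentTask)} | ${cell(report.blockers)} |`
      );
    }
  });
  return lines.join("\n");
}
```

An unreachable agent only changes its own row, so the board still posts when some agents are offline.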
Phase 1 -- Pinned Board Post
- Given the coordinator runs for the first time, a new post is created in `#work-board` and pinned to the channel.
- Given a pinned board post exists, on scheduled refresh the post is updated in-place (same post ID) with a new timestamp.
- Given the bridge restarts, the next refresh cycle updates the same pinned post (not a new one) -- the post ID was persisted.
- Given one `ask_agent` call fails with a timeout, the failing agent row shows `Warning: Unreachable` and all other rows show current data.
- Given a user types `/board`, the pinned post is updated within 5 minutes and an ephemeral confirmation is shown.
- Given a user types `/board status`, an ephemeral message shows: last refresh time, next scheduled time, agent count, unreachable agent count.
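One way the `/board` argument parsing could work is sketched below. It assumes that a bare `/board` means `refresh` (consistent with the criteria above) and that `--agents`, `--quiet`, and `--json` are plain boolean flags; the issue does not define their exact semantics, and the `BoardCommand` shape and `parseBoardCommand` name are hypothetical.

```typescript
type BoardSubcommand = "refresh" | "status" | "detail";

interface BoardCommand {
  subcommand: BoardSubcommand;
  agent?: string; // only set for "detail <agent>"
  agents: boolean;
  quiet: boolean;
  json: boolean;
}

const KNOWN_FLAGS = ["--agents", "--quiet", "--json"];

function isSubcommand(word: string): word is BoardSubcommand {
  return word === "refresh" || word === "status" || word === "detail";
}

// Parses the text that follows "/board".
function parseBoardCommand(args: string): BoardCommand {
  const tokens = args.trim().split(/\s+/).filter((t) => t.length > 0);
  const flags = tokens.filter((t) => t.indexOf("--") === 0);
  const words = tokens.filter((t) => t.indexOf("--") !== 0);

  for (const flag of flags) {
    if (KNOWN_FLAGS.indexOf(flag) === -1) {
      throw new Error(`Unknown flag: ${flag}`);
    }
  }

  // Assumption: no subcommand means an on-demand refresh.
  const sub = words.length > 0 ? words[0] : "refresh";
  if (!isSubcommand(sub)) {
    throw new Error(`Unknown subcommand: ${sub}`);
  }
  if (sub === "detail" && words.length < 2) {
    throw new Error("Usage: /board detail <agent>");
  }

  return {
    subcommand: sub,
    agent: sub === "detail" ? words[1] : undefined,
    agents: flags.indexOf("--agents") !== -1,
    quiet: flags.indexOf("--quiet") !== -1,
    json: flags.indexOf("--json") !== -1,
  };
}
```

For example, `parseBoardCommand("detail alice --json")` yields the `detail` subcommand for agent `alice` with `json` set, while an unrecognised subcommand or flag throws so the handler can reply with an error.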
Phase 2 -- post_to_channel and get_agent_status
- Given a bot has a valid `postTargets` entry, calling `post_to_channel` posts the content to the target channel within 5 seconds.
- Given a bot does NOT have the target channel in `postTargets`, `post_to_channel` throws `CHANNEL_NOT_ALLOWED` and no message is sent.
- Given `post_to_channel` is called twice within 60s, the second call returns `RATE_LIMITED` with no Mattermost API call made.
- Given an agent's session is active, `get_agent_status` returns `currentState: "active"` within 500ms.
- Given no active session exists for an agent, `get_agent_status` returns `currentState: "unknown"` (no exception thrown).
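The 60-second rate limit in these criteria can be illustrated with a single-token bucket per `(botId, channelId)` pair. This is a sketch, not the planned `rate-limiter.ts`: the `RateLimiter` class name, the injectable clock, and the minimal `ToolError` are assumptions made so the behaviour can be exercised without waiting.

```typescript
// Minimal error type for the sketch; the bridge's real ToolError may differ.
class ToolError extends Error {
  constructor(public code: string, message?: string) {
    super(message ?? code);
  }
}

interface TokenBucket {
  tokens: number;       // 0 or 1 (capacity is one token)
  lastRefillMs: number; // when the bucket last became full
}

class RateLimiter {
  private buckets = new Map<string, TokenBucket>();

  // The clock is injectable (an assumption of this sketch) to keep tests deterministic.
  constructor(
    private refillMs: number = 60000,
    private now: () => number = () => Date.now()
  ) {}

  // Consumes one token or throws RATE_LIMITED when the bucket is empty.
  check(botId: string, channelId: string): void {
    const key = `${botId}:${channelId}`;
    const nowMs = this.now();
    let bucket = this.buckets.get(key);
    if (bucket === undefined) {
      bucket = { tokens: 1, lastRefillMs: nowMs };
      this.buckets.set(key, bucket);
    } else if (nowMs - bucket.lastRefillMs >= this.refillMs) {
      // A full interval has passed: refill lazily on access.
      bucket.tokens = 1;
      bucket.lastRefillMs = nowMs;
    }
    if (bucket.tokens < 1) {
      throw new ToolError("RATE_LIMITED");
    }
    bucket.tokens -= 1;
  }
}
```

Calling `check()` before the adapter means a rejected call never reaches the Mattermost API. The map is in memory only, so every bucket starts full again after a bridge restart.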
Edge Cases Covered
- Empty agent list: board posts a header-only table with a "No agents configured" message.
- Board post deleted externally: coordinator creates a new post on next refresh and persists the new ID.
- Post size exceeded (above 14,000 chars): compact summary-only view rendered; full detail accessible via `/board detail <agent>`.
- Concurrent `/board` commands: second invocation is queued; no duplicate board posts.
- Agent returns invalid JSON: row shows `Error` with the first 100 chars of the raw response.
- GitHub API unavailable: board renders from `ask_agent` data only; Issues section omitted with a footer note.
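The invalid-JSON edge case could be handled along these lines. The field names `status`, `currentTask`, and `blockers` are hypothetical placeholders for the schema defined in `AGENTS.md`, and `parseAgentResponse` is an illustrative name; only the "first 100 chars of the raw response" rule comes from the list above.

```typescript
type ParsedRow =
  | { kind: "ok"; status: string; currentTask: string; blockers: string }
  | { kind: "error"; detail: string };

// Parses one ask_agent reply. Anything that is not a well-formed status object
// becomes an error row carrying the first 100 chars of the raw reply.
function parseAgentResponse(raw: string): ParsedRow {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value === "object" && value !== null) {
      const obj = value as Record<string, unknown>;
      const status = obj["status"];
      const currentTask = obj["currentTask"];
      const blockers = obj["blockers"];
      if (typeof status === "string" && typeof currentTask === "string") {
        return {
          kind: "ok",
          status,
          currentTask,
          blockers: typeof blockers === "string" ? blockers : "",
        };
      }
    }
  } catch {
    // Not JSON at all: fall through to the error row.
  }
  return { kind: "error", detail: raw.slice(0, 100) };
}
```

Returning an error row instead of throwing lets the renderer show `Error` for that one agent while the rest of the board renders normally.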
Notes
Key Design Decisions
- GitHub Issues as sole state source of truth (NFR-007): The board is a view. The bridge's SQLite stores only operational metadata (pinned post ID). No new persistent state store is introduced.
- Fail-closed allowlist (NFR-008): A bot without `postTargets` in config cannot call `post_to_channel` at all. The allowlist check is the first operation in every handler -- no Mattermost API call is made before it passes.
- Post size architecture (NFR-001): Hard cap at 16,000 chars (Mattermost limit is 16,383). Soft limit at 14,000 chars triggers automatic compact mode. At ~200 chars/row this supports ~75 agents before switching.
- No parallel `ask_agent` in Phase 1 (open question): Sequential polling is O(N x 20s). Confirm with ChrisRomp whether the bridge supports concurrent `ask_agent` calls from a single session before Phase 1 ships -- this is critical for deployments with 10+ agents (T007).
- `/board` command follows existing patterns: Add `'board'` to the slash command dispatch table in `session-manager.ts`, same as `/status` and `/schedule`. Estimated at ~150 LOC for the handler.
- Phase 2 token bucket rate limiter: In-memory `Map<string, TokenBucket>` keyed by `(botId, channelId)`. Not persisted -- bucket resets on bridge restart. This is intentional to avoid a new store dependency.
- `get_agent_status` latency contract (FR-022, NFR): Under 500ms enforced by design -- no async I/O in the hot path. Reads `lastActivity` and `lastToolCall` fields added to in-memory session objects in T026.
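The soft-limit and hard-cap decision (NFR-001) can be sketched as a small selection step applied after both views are rendered. `chooseRendering` is a hypothetical helper, and truncating an oversized compact view is an assumed last resort that the issue does not specify.

```typescript
const SOFT_LIMIT = 14000; // above this, switch to the compact view
const HARD_CAP = 16000;   // never send more than this to Mattermost

interface Rendering {
  mode: "full" | "compact";
  body: string;
}

// Picks the full view when it fits under the soft limit, otherwise the compact view.
// Lengths are JavaScript string lengths (UTF-16 code units), a close proxy for chars.
function chooseRendering(full: string, compact: string): Rendering {
  if (full.length <= SOFT_LIMIT) {
    return { mode: "full", body: full };
  }
  // Assumption: if even the compact view is too long, cut it at the hard cap.
  const body = compact.length <= HARD_CAP ? compact : compact.slice(0, HARD_CAP);
  return { mode: "compact", body };
}
```

Keeping the hard cap below Mattermost's 16,383-character limit leaves headroom, so a post that passes this check should not be rejected for size.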
Risks
- `ask_agent` timeout at scale: If parallelism is not supported, a 15-agent deployment takes up to 5 minutes to refresh (15 x 20s). The 30-minute schedule is conservative to accommodate this, but `/board` on-demand refresh may feel slow for larger fleets.
- Multi-instance bridge conflict: If two bridge instances both run a bot configured with `post_to_channel` targeting `#work-board`, both may post without coordination. Current design assumes a single bridge instance.
- Cold-start `get_agent_status`: After a bridge restart, all sessions return `currentState: "unknown"` until agents re-establish sessions. Document in quickstart.md (T032).
- Board ownership ambiguity: Dedicated `@board` identity is architecturally cleaner but adds configuration overhead per deployment. Using the existing `@copilot` admin bot is simpler. Recommended to decide before Phase 1 ships.
- Mattermost PUT rate limits: Observed or documented ceiling on `PUT /api/v4/posts/{id}` is not confirmed. The 30-minute refresh interval is conservative; investigate before enabling more frequent event-driven updates in Phase 2 (T007).
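The refresh-time risk is simple arithmetic, shown here as a worst-case estimate. The `concurrency` parameter is hypothetical: it models what concurrent `ask_agent` calls would buy if the bridge turns out to support them, which is still an open question.

```typescript
// Worst case: every ask_agent call runs into its timeout.
// concurrency = 1 models the sequential polling assumed for Phase 1.
function worstCaseRefreshSeconds(
  agentCount: number,
  timeoutSeconds: number = 20,
  concurrency: number = 1
): number {
  const batches = Math.ceil(agentCount / concurrency);
  return batches * timeoutSeconds;
}

const sequential = worstCaseRefreshSeconds(15);        // 15 x 20s = 300s (5 minutes)
const fiveAtATime = worstCaseRefreshSeconds(15, 20, 5); // 3 batches x 20s = 60s
```

Under these assumptions, the sequential worst case grows linearly with fleet size, while batching caps it at the number of batches times the timeout.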
Suggested Child Issues for copilot-bridge
| Issue | Phase | Priority |
|---|---|---|
| feat: Add `/board` slash command for Work Board refresh | 1 | P1 |
| feat: Persist pinned post ID in `channel_prefs` | 1 | P1 |
| feat: Add `post_to_channel` tool with allowlist security | 2 | P1 |
| feat: Add `get_agent_status` tool for structured session queries | 2 | P2 |
| docs: Phase 0 Work Board runbook for coordinator bot setup | 0 | P0 |