Merged
4 changes: 4 additions & 0 deletions .env.example
@@ -154,3 +154,7 @@ FEATURE_HEALTH_MONITOR=true
FEATURE_SCHEDULER=true
FEATURE_TOOLS_FABRIC=true
FEATURE_RISK_TIERED_GOVERNANCE=true

# Optional federation control plane.
# FEATURE_FEDERATION=false
# FEATURE_FEDERATION_DASHBOARD=false
7 changes: 5 additions & 2 deletions Dockerfile
@@ -66,8 +66,11 @@ RUN mkdir -p /home/lancelot/data && \
# Build War Room React SPA
RUN cd src/warroom && npm ci && npm run build && rm -rf node_modules

# Change ownership of the application directory to the non-root user
RUN chown -R lancelot:lancelot /home/lancelot
# Runtime-writable paths are owned by the non-root user. The application,
# virtualenv, and browser binaries remain root-owned/readable, which avoids
# an expensive recursive chown over the full image during every rebuild.
RUN mkdir -p /home/lancelot/data /home/lancelot/workspace /home/lancelot/.codex && \
chown -R lancelot:lancelot /home/lancelot/data /home/lancelot/workspace /home/lancelot/.codex

# F-001: Docker group no longer needed — socket proxy used instead of direct mount

10 changes: 10 additions & 0 deletions config/federation.yaml
@@ -64,3 +64,13 @@ cost_report_interval_s: 30.0 # How often children report cost to root
# ── Self Address ────────────────────────────────────────────
# This instance's externally-reachable address for peer registration
# self_address: "http://localhost:8000"

# Dashboard settings for the operator-facing fleet view.
dashboard:
enabled: true
poll_interval_s: 10
stream_interval_s: 3
max_recent_activity_items: 50
card_sort_order: "urgency" # urgency | alphabetical | role
show_fleet_activity_feed: true
activity_feed_max_events: 200
28 changes: 28 additions & 0 deletions config/model_profiles.yaml
@@ -106,6 +106,34 @@ profiles:
cost_input_per_1k: 0.00113
cost_output_per_1k: 0.0045

gpt-5.5:
capability_tier: deep
context_window: 1000000
supports_tools: true
cost_input_per_1k: 0.005
cost_output_per_1k: 0.03

gpt-5.4:
capability_tier: deep
context_window: 1000000
supports_tools: true
cost_input_per_1k: 0.0025
cost_output_per_1k: 0.015

gpt-5.4-mini:
capability_tier: fast
context_window: 400000
supports_tools: true
cost_input_per_1k: 0.00075
cost_output_per_1k: 0.0045

gpt-5.4-nano:
capability_tier: fast
context_window: 400000
supports_tools: true
cost_input_per_1k: 0.00005
cost_output_per_1k: 0.0004

# ── Anthropic ──────────────────────────────────────────────────────
claude-3-5-haiku-latest:
capability_tier: fast
4 changes: 2 additions & 2 deletions config/models.yaml
@@ -47,7 +47,7 @@ providers:
max_tokens: 16384
temperature: 0.3
deep:
model: "gpt-5.4"
model: "gpt-5.5"
max_tokens: 128000
temperature: 0.7
cache:
@@ -63,7 +63,7 @@ providers:
max_tokens: 16384
temperature: 0.3
deep:
model: "gpt-5.4"
model: "gpt-5.5"
max_tokens: 128000
temperature: 0.7
cache:
4 changes: 2 additions & 2 deletions docs/INDEX.md
@@ -7,7 +7,7 @@
- [Proof Walkthrough](proof-walkthrough.md)
- [Release Verification](release-verification.md)
- [Configuration Reference](configuration-reference.md)
- [War Room Guide](war-room.md)
- [War Room Guide](war-room.md) - includes Fleet Dashboard operator workflow

## Architecture

@@ -18,7 +18,7 @@
- [Receipts](receipts.md)
- [UAB](uab.md)
- [HIVE](hive.md)
- [Federation](federation.md)
- [Federation](federation.md) - includes Fleet Dashboard API and proxy approval contract
- [MCP Governance](mcp.md)

## Security and Operations
70 changes: 70 additions & 0 deletions docs/federation.md
@@ -39,6 +39,62 @@ Mode is derived automatically from the topology shape — not configured directl

---

## Fleet Dashboard

The Fleet Dashboard is the operator control layer above individual War Rooms. It is exposed in War Room at `/war-room/federation/fleet` when `FEATURE_FEDERATION_DASHBOARD=true` and `FEATURE_FEDERATION=true`.

It is designed to answer one operational question first: which Lancelot instances need human attention right now?

### Snapshot Contract

The dashboard control plane uses `GET /api/federation/dashboard` for the fleet snapshot and `GET /api/federation/dashboard/stream` for live snapshot updates. The root dashboard fetches each reachable peer through the signed `GET /api/federation/dashboard/local` endpoint, so peer detail uses the same Ed25519 federation request-signing and replay protection as the rest of the federation plane.

The snapshot includes:

- fleet counters: total instances, instances needing attention, critical instances, pending approvals, active agents, Soul consistency, and fleet cost utilization
- sorted instance cards with health, heartbeat, Soul hash, budget, active HIVE agents, pending approvals, trust proposals, latest receipt activity, and attention reasons
- unified approval queue entries aggregated across the local instance and peers
- unified trust graduation proposals aggregated across the local instance and peers
- fleet activity sourced from receipt streams, not a separate event log
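
As a rough illustration of how a client might derive headline counters from a snapshot, here is a minimal sketch. The key names `instances`, `attention_reasons`, and `pending_approvals` are assumptions chosen for the example; the actual response schema is not shown in this document.

```python
from typing import Any, Dict


def summarize_fleet(snapshot: Dict[str, Any]) -> Dict[str, int]:
    """Derive headline counters from a snapshot-shaped dict.

    The keys read here are hypothetical; adjust them to the real payload.
    """
    cards = snapshot.get("instances", [])
    return {
        "total_instances": len(cards),
        # A card needs attention when it carries at least one reason.
        "needs_attention": sum(1 for card in cards if card.get("attention_reasons")),
        "pending_approvals": sum(int(card.get("pending_approvals", 0)) for card in cards),
    }
```

An empty or partial snapshot yields zero counters rather than raising, which suits a dashboard that must still render while peers are unreachable.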

### Instance Card Semantics

`health` is the local instance health-monitor/readiness state. Federation notices such as stale cost data, Soul propagation, or peer-detail fetch failures are shown as `Needs Attention` reasons instead of automatically marking the instance health as degraded.

Important card fields:

| Field | Meaning |
|-------|---------|
| `command_center_url` | Deep link to the instance Command Center (`/war-room/command`) |
| `health` | Local health snapshot: healthy, degraded, or error |
| `heartbeat` | Federation heartbeat freshness for that instance |
| `budget` | Latest cost utilization and threshold state |
| `recent_activity` | Latest receipt-derived activity description |
| `attention_reasons` | Operator-facing notices that explain amber/red card state |

The local `SELF` card links inside the current War Room. Remote cards link to the peer War Room address with the Command Center path appended. The War Room auth flow preserves the requested path so a signed-in operator lands back on the deep link after login.
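
The deep-link rule above can be sketched as follows. The helper name `command_center_url` and the convention of passing `None` for the local instance are assumptions made for this example, not the project's actual code.

```python
from typing import Optional

COMMAND_CENTER_PATH = "/war-room/command"


def command_center_url(peer_address: Optional[str]) -> str:
    """Build the Command Center deep link for a card.

    Hypothetical helper: None (or an empty string) stands for the local
    SELF card, which links inside the current War Room.
    """
    if not peer_address:
        return COMMAND_CENTER_PATH
    # Strip any trailing slash so the appended path has exactly one separator.
    return peer_address.rstrip("/") + COMMAND_CENTER_PATH
```

For example, `command_center_url("http://peer-a:8000/")` produces `http://peer-a:8000/war-room/command`.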

### Unified Approval Proxy

Fleet-level approve and deny actions call:

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/dashboard/instances/{instance_id}/approvals/{approval_id}/approve` | Approve a pending local or remote dashboard approval |
| POST | `/dashboard/instances/{instance_id}/approvals/{approval_id}/deny` | Deny a pending local or remote dashboard approval |
| POST | `/dashboard/local/approvals/{approval_id}/approve` | Root-to-peer signed local approval endpoint |
| POST | `/dashboard/local/approvals/{approval_id}/deny` | Root-to-peer signed local denial endpoint |

Operator decisions require both `federation.admin` and `governance.admin` on the root dashboard path. The peer-side `/dashboard/local/approvals/{approval_id}/approve` and `/dashboard/local/approvals/{approval_id}/deny` endpoints accept a request only from an operator or from a signed ROOT peer. Every proxy decision carries operator identity and emits a governance receipt with `federated_proxy=true`.
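
A minimal sketch of how a client might build the proxy path and pre-check the two required permissions. The function names are hypothetical, and the paths are relative to the federation API prefix exactly as they appear in the table above; the server remains the authority on authorization.

```python
from typing import Iterable
from urllib.parse import quote

REQUIRED_PERMISSIONS = frozenset({"federation.admin", "governance.admin"})


def proxy_decision_path(instance_id: str, approval_id: str, decision: str) -> str:
    """Return the fleet-level approve/deny path (hypothetical helper)."""
    if decision not in ("approve", "deny"):
        raise ValueError(f"unsupported decision: {decision!r}")
    # Percent-encode identifiers so unusual characters cannot alter the path.
    instance = quote(instance_id, safe="")
    approval = quote(approval_id, safe="")
    return f"/dashboard/instances/{instance}/approvals/{approval}/{decision}"


def may_decide(operator_permissions: Iterable[str]) -> bool:
    """True only when the operator holds both required permissions."""
    return REQUIRED_PERMISSIONS <= set(operator_permissions)
```

A client-side check like `may_decide` is only a convenience for disabling buttons early; it does not replace the server-side permission check.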

### Activity and Attention Sources

Fleet Activity and card `Latest Activity` are receipt-backed. They surface meaningful receipt descriptions from `action_name`, tool/capability metadata, request text, or receipt action type.

An instance can be `Healthy` while still `Needs Attention`. Examples include pending approvals, pending trust proposals, stale heartbeat, stale budget telemetry for a current peer, Soul mismatch, remote dashboard detail unavailable, or runtime errors reported by the federation control plane.
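
The separation between health and attention can be illustrated with a small sketch. The function name, its parameters, and the reason strings are assumptions for the example; the point is only that attention reasons are collected independently of the `health` value.

```python
from typing import List


def collect_attention_reasons(
    *,
    pending_approvals: int = 0,
    pending_trust_proposals: int = 0,
    heartbeat_stale: bool = False,
    soul_mismatch: bool = False,
) -> List[str]:
    """Collect operator-facing notices without touching the health state."""
    reasons: List[str] = []
    if pending_approvals:
        reasons.append(f"{pending_approvals} pending approval(s)")
    if pending_trust_proposals:
        reasons.append(f"{pending_trust_proposals} pending trust proposal(s)")
    if heartbeat_stale:
        reasons.append("Stale heartbeat")
    if soul_mismatch:
        reasons.append("Soul mismatch")
    return reasons
```

A card with `health` reported as healthy and a non-empty reason list would then render as healthy but flagged for attention.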

---

## Module Reference

| Module | Purpose |
@@ -456,6 +512,9 @@ The audit engine supports complete timeline reconstruction across all instances,
|--------|----------|-------------|
| GET | `/stream` | SSE heartbeat stream + initial snapshot |
| GET | `/health` | Current health summary |
| GET | `/dashboard` | Fleet Dashboard snapshot with local and peer cards |
| GET | `/dashboard/stream` | Fleet Dashboard SSE snapshot stream |
| GET | `/dashboard/local` | Local dashboard detail for an operator or signed ROOT peer |

### Discovery
| Method | Endpoint | Description |
@@ -541,6 +600,10 @@ Pause semantics are now fail-closed and runtime-backed:
|--------|----------|-------------|
| POST | `/killswitch` | Receive kill command |
| POST | `/cost/report` | Report cost data |
| POST | `/dashboard/instances/{instance_id}/approvals/{approval_id}/approve` | Approve a dashboard approval on a local or remote instance |
| POST | `/dashboard/instances/{instance_id}/approvals/{approval_id}/deny` | Deny a dashboard approval on a local or remote instance |
| POST | `/dashboard/local/approvals/{approval_id}/approve` | Signed ROOT-to-peer local approval proxy endpoint |
| POST | `/dashboard/local/approvals/{approval_id}/deny` | Signed ROOT-to-peer local denial proxy endpoint |

Budget-report authority is intentionally narrow:

@@ -604,3 +667,10 @@ Peer state is persisted in SQLite with WAL mode for concurrent reads and thread-
| `auth_timestamp_window_s` | 30.0 | Replay protection window |
| `nonce_cache_size` | 10000 | Max cached nonces |
| `cost_report_interval_s` | 30.0 | Cost reporting interval |
| `dashboard.enabled` | true | Enables dashboard API responses when feature flags are enabled |
| `dashboard.poll_interval_s` | 10.0 | War Room polling interval for dashboard refreshes |
| `dashboard.stream_interval_s` | 3.0 | Dashboard SSE snapshot interval |
| `dashboard.max_recent_activity_items` | 50 | Receipt-backed activity items kept per local snapshot |
| `dashboard.card_sort_order` | urgency | Instance card sort order: urgency, alphabetical, or role |
| `dashboard.show_fleet_activity_feed` | true | Include fleet activity rows in the snapshot |
| `dashboard.activity_feed_max_events` | 200 | Max fleet activity rows returned to War Room |
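
To show how the `dashboard.*` defaults in the table might be combined with operator overrides, here is a minimal sketch. The function name and the validation of `card_sort_order` are assumptions for illustration; the default values are taken from the table.

```python
from typing import Any, Dict, Optional

DASHBOARD_DEFAULTS: Dict[str, Any] = {
    "enabled": True,
    "poll_interval_s": 10.0,
    "stream_interval_s": 3.0,
    "max_recent_activity_items": 50,
    "card_sort_order": "urgency",
    "show_fleet_activity_feed": True,
    "activity_feed_max_events": 200,
}

VALID_SORT_ORDERS = {"urgency", "alphabetical", "role"}


def resolve_dashboard_config(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Overlay the `dashboard:` block from federation.yaml onto the defaults."""
    merged = {**DASHBOARD_DEFAULTS, **(overrides or {})}
    if merged["card_sort_order"] not in VALID_SORT_ORDERS:
        raise ValueError(f"unknown card_sort_order: {merged['card_sort_order']!r}")
    return merged
```

Calling `resolve_dashboard_config({"card_sort_order": "role"})` keeps every other default and changes only the sort order.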
52 changes: 52 additions & 0 deletions docs/war-room.md
@@ -576,6 +576,58 @@ That means the page can now reflect real federation control-plane degradation in

---

## Fleet Dashboard

The Fleet Dashboard is the multi-instance operator view for federated Lancelot deployments. It appears in the Federation navigation when both `FEATURE_FEDERATION=true` and `FEATURE_FEDERATION_DASHBOARD=true`.

**Access:** `/war-room/federation/fleet`

Use it as the first screen when you are operating more than one Lancelot instance. It surfaces health state, heartbeat freshness, budget state, pending approvals, trust proposals, active HIVE agent counts, recent receipt activity, and attention reasons in one place.

### Instance Cards

Each card represents one Lancelot instance and is sorted by urgency by default. The card shows:

- local health monitor state
- heartbeat freshness
- Soul hash/version signal
- active agent count
- pending approval count
- pending trust proposal count
- budget utilization
- latest receipt-backed activity
- specific `Needs Attention` notices

The `Open Command Center` button deep-links to that instance's Command Center, not just the War Room root. For the local instance it opens `/war-room/command`; for remote peers it opens the peer address with `/war-room/command` appended. If the operator must sign in first, the auth flow preserves the destination and returns to the deep link after login.

### Health vs. Needs Attention

`Health` is the instance readiness/health monitor result. It shows degraded or error only when the instance health snapshot says so.

`Needs Attention` is broader. A card can be healthy and still need attention because it has pending approvals, pending trust proposals, stale heartbeat, stale budget telemetry, remote detail unavailable, Soul mismatch, or federation runtime notices.

`Latest Activity` comes from receipts. It should begin populating after governed chat actions, HIVE events, approvals, denials, kills, pauses, or other receipted work runs on that instance.

### Unified Approval Queue

The Unified Approval Queue aggregates pending T2/T3 governance approvals across the local instance and federated peers. Approve and Deny actions are sent through the federation dashboard proxy, carry the operator identity, and emit governance receipts.

Use the instance Governance Dashboard for detailed local review when needed, but the fleet queue is the control point for operating multiple instances without tab hopping.

### Fleet Activity

Fleet Activity is receipt-backed. It is not a standalone event log. If no activity appears after an agent or governed command runs, check that the instance is emitting receipts and that remote dashboard detail is reachable through signed federation traffic.

### Troubleshooting

If the Fleet Dashboard page is missing, verify both feature flags and `dashboard.enabled` in `config/federation.yaml`.

If a peer card says remote detail is unavailable, check peer registration, `self_address`, signed federation auth, and the peer's `/api/federation/dashboard/local` endpoint.

If `Needs Attention` names stale cost peers, confirm those peer IDs are still registered. Stale cost telemetry for unregistered peers should be filtered out.
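
The first troubleshooting check can be sketched as a small diagnostic. The helper names and the set of values treated as true are assumptions; the project's own `_env_bool` parsing is not shown here and may differ.

```python
from typing import Any, List, Mapping

REQUIRED_FLAGS = ("FEATURE_FEDERATION", "FEATURE_FEDERATION_DASHBOARD")


def _truthy(value: Any) -> bool:
    """Assumed boolean parsing; the real flag parser may accept other values."""
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


def fleet_dashboard_blockers(env: Mapping[str, str], dashboard_config: Mapping[str, Any]) -> List[str]:
    """List the reasons the Fleet Dashboard page would be missing."""
    blockers: List[str] = []
    for flag in REQUIRED_FLAGS:
        # Both flags default to false, so an unset variable counts as disabled.
        if not _truthy(env.get(flag, "false")):
            blockers.append(f"{flag} is not enabled")
    if not dashboard_config.get("enabled", True):
        blockers.append("dashboard.enabled is false in config/federation.yaml")
    return blockers
```

Passing `os.environ` and the parsed `dashboard:` block gives a quick list of what to fix; an empty list means none of these three settings is the cause.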

---

## Tips for Daily Operation

1. **Start your session** by checking the Health panel — make sure everything is green
11 changes: 10 additions & 1 deletion src/core/boot.py
@@ -189,7 +189,16 @@ def _validate_boot_environment(api_token: str | None) -> BootEnvironment:
async def _start_core_runtime_services() -> None:
"""Start gateway-owned services that critical subsystems depend on."""
librarian.start()
await antigravity.start()
try:
from feature_flags import FEATURE_TOOLS_ANTIGRAVITY
except Exception as exc:
FEATURE_TOOLS_ANTIGRAVITY = False
logger.warning("Antigravity feature flag lookup failed; skipping browser startup: %s", exc)

if FEATURE_TOOLS_ANTIGRAVITY:
await antigravity.start()
else:
logger.info("Antigravity browser startup skipped (FEATURE_TOOLS_ANTIGRAVITY=false).")


def _wire_orchestrator_runtime_services() -> None:
9 changes: 6 additions & 3 deletions src/core/feature_flags.py
@@ -56,6 +56,7 @@

Federation Environment variables:
FEATURE_FEDERATION — default: false (multi-instance federation layer)
FEATURE_FEDERATION_DASHBOARD — default: false (fleet dashboard UI/API)

MCP (Model Context Protocol) Environment variables:
FEATURE_MCP — default: false (master kill switch for all MCP invocations)
@@ -244,6 +245,7 @@ def _env_bool(key: str, default: bool = True) -> bool:

# Federation — multi-instance coordination
FEATURE_FEDERATION: bool = _env_bool("FEATURE_FEDERATION", default=False) # Master switch for Federation subsystem (Governance API, heartbeat, identity)
FEATURE_FEDERATION_DASHBOARD: bool = _env_bool("FEATURE_FEDERATION_DASHBOARD", default=False) # Operator fleet dashboard above per-instance War Rooms

# MCP (Model Context Protocol) — governed tool proxy
FEATURE_MCP: bool = _env_bool("FEATURE_MCP", default=False) # Master kill switch for all MCP tool invocations
@@ -335,7 +337,7 @@ def reload_flags() -> None:
global FEATURE_TOOL_FLOW_STREAMING, FEATURE_ACTION_CARDS
global FEATURE_HIVE, FEATURE_HIVE_UAB
global FEATURE_VAULT_SECRETS
global FEATURE_FEDERATION
global FEATURE_FEDERATION, FEATURE_FEDERATION_DASHBOARD
global FEATURE_MCP
global FEATURE_TIME_TRAVEL
global FEATURE_A2A
@@ -417,6 +419,7 @@ def reload_flags() -> None:

# Federation
FEATURE_FEDERATION = _env_bool("FEATURE_FEDERATION", default=False)
FEATURE_FEDERATION_DASHBOARD = _env_bool("FEATURE_FEDERATION_DASHBOARD", default=False)

# MCP
FEATURE_MCP = _env_bool("FEATURE_MCP", default=False)
@@ -516,8 +519,8 @@ def log_feature_flags() -> None:
FEATURE_VAULT_SECRETS,
)
logger.info(
"Federation flags: FEDERATION=%s",
FEATURE_FEDERATION,
"Federation flags: FEDERATION=%s, FEDERATION_DASHBOARD=%s",
FEATURE_FEDERATION, FEATURE_FEDERATION_DASHBOARD,
)
logger.info(
"MCP flags: MCP=%s",
7 changes: 7 additions & 0 deletions src/core/flags_api.py
@@ -367,6 +367,13 @@ def init_flags_api(audit_logger=None):
"conflicts": [],
"warning": "Exposes federation API endpoints. Peers authenticate via Ed25519 challenge/response. All federation events are receipt-traced and audit-logged.",
},
"FEATURE_FEDERATION_DASHBOARD": {
"description": "Federation Dashboard - operator fleet view above per-instance War Rooms. Surfaces instance health, pending approvals, trust proposals, budget pressure, and Command Center entry points.",
"category": "Federation",
"requires": ["FEATURE_FEDERATION"],
"conflicts": [],
"warning": "Shows cross-instance operational metadata to authenticated War Room operators. Remote detail retrieval uses signed federation requests.",
},

# ── MCP (Model Context Protocol) ────────────────────────────────
"FEATURE_MCP": {
36 changes: 26 additions & 10 deletions src/core/gateway_boot_support.py
@@ -1359,30 +1359,46 @@ def _bootstrap_model_discovery():
_persisted_config = load_persisted_config()
_persisted_lane_overrides = _persisted_config.get("lane_overrides", {})

_lane_overrides = {}
_fallback_lanes = {}
try:
from provider_profile import ProfileRegistry
_registry = ProfileRegistry()
_prov_name = main_orchestrator.provider.provider_name
if _registry.has_provider(_prov_name):
_profile = _registry.get_profile(_prov_name)
_lane_overrides["fast"] = _profile.fast.model
_lane_overrides["deep"] = _profile.deep.model
_fallback_lanes["fast"] = _profile.fast.model
_fallback_lanes["deep"] = _profile.deep.model
if _profile.cache:
_lane_overrides["cache"] = _profile.cache.model
_fallback_lanes["cache"] = _profile.cache.model
except Exception as exc:
logger.warning("Model discovery profile lookup failed; using persisted/env overrides only: %s", exc)

_lane_overrides.update(_persisted_lane_overrides)
_lane_overrides = dict(_persisted_lane_overrides)

discovery = ModelDiscovery(
provider=main_orchestrator.provider,
profiles_path="config/model_profiles.yaml",
lane_overrides=_lane_overrides,
)
try:
discovery = ModelDiscovery(
provider=main_orchestrator.provider,
profiles_path="config/model_profiles.yaml",
lane_overrides=_lane_overrides,
fallback_lanes=_fallback_lanes,
)
except TypeError:
discovery = ModelDiscovery(
provider=main_orchestrator.provider,
profiles_path="config/model_profiles.yaml",
lane_overrides=_lane_overrides,
)
discovery.refresh()

for _lane, _model_id in discovery.lane_assignments.items():
try:
main_orchestrator.set_lane_model(_lane, _model_id)
except Exception as _e:
logger.warning("Failed to apply lane assignment %s=%s: %s", _lane, _model_id, _e)

for _lane, _model_id in _persisted_lane_overrides.items():
if discovery.lane_assignments.get(_lane) == _model_id:
continue
try:
main_orchestrator.set_lane_model(_lane, _model_id)
except Exception as _e: