Integrate compat monitor into resource observability split by shuxueshuxue · Pull Request #210 · OpenDCAI/Mycel

shuxueshuxue · 2026-04-06T09:23:14Z

Summary

transplant the richer compat monitor from #182 onto the current resource-observability branch instead of continuing from the stripped-down monitor shell
keep the product resource split from #205 / #209 intact while restoring threads / traces / leases / evaluation monitor flows
move the compat monitor shell to a light-theme ops UI and fix the graft regression where /api/monitor/threads crashed when evaluation_jobs was absent

Why This PR Exists

#182 has the right monitor product surface, but it is hundreds of commits behind current dev
#209 has the right resource split and Supabase-aware seams, but it was built on the wrong monitor baseline
this branch combines the two honestly: compat monitor surface from #182, resource split + wiring work from #209

What Changed In This Cut

cherry-picked the compat monitor backend/frontend and SWE-bench runner from #182
kept the existing /api/resources/* product split from #209
kept /api/monitor/resources as the global monitor/admin surface
reshaped the compat monitor shell to a lighter ops UI:
- default landing route is now /threads
- primary nav is now Threads / Traces / Leases / Eval
- Diverged is folded into Leases as a filtered view (/leases?diverged=1)
- Events remains reachable contextually from leases
added a compat regression guard so monitor threads still render when the local DB has never created evaluation_jobs
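
A minimal sketch of the kind of guard described above, assuming the compat monitor reads a local SQLite database. The `threads` and `evaluation_jobs` column names and the `list_threads` helper shown here are hypothetical stand-ins, not the code in backend/web/monitor.py:

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if a table with this name exists in the SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def list_threads(conn: sqlite3.Connection) -> list[dict]:
    """List threads, attaching evaluation ids only when evaluation_jobs exists."""
    eval_by_thread: dict[str, str] = {}
    if table_exists(conn, "evaluation_jobs"):
        # Hypothetical columns; the real schema may differ.
        eval_by_thread = dict(
            conn.execute("SELECT thread_id, evaluation_id FROM evaluation_jobs")
        )
    rows = conn.execute("SELECT thread_id FROM threads ORDER BY thread_id")
    return [
        {"thread_id": thread_id, "evaluation_id": eval_by_thread.get(thread_id)}
        for (thread_id,) in rows
    ]
```

The point of the guard is that a database which has never created evaluation_jobs yields threads with no evaluation attached, rather than an exception from the missing table.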

Real Evidence

Playwright against local compat monitor branch instance (8013 backend / 5175 monitor)
- /threads rendered the light monitor shell and hit GET /api/monitor/threads?offset=0&limit=50 => 200
- /evaluation rendered the eval shell and hit GET /api/monitor/evaluations?limit=30&offset=0 => 200
- /leases?diverged=1 rendered the filtered lease view with contextual links back to all leases / events

Verification

env -u ALL_PROXY -u all_proxy -u HTTPS_PROXY -u https_proxy -u HTTP_PROXY -u http_proxy uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py
cd frontend/monitor && npm run build
env -u ALL_PROXY -u all_proxy -u HTTPS_PROXY -u https_proxy -u HTTP_PROXY -u http_proxy uv run ruff format --check backend/web/monitor.py tests/Unit/monitor/test_monitor_compat.py

Links

Refs Split user-visible Resources from global monitor overview #205
Follow-up to feat(monitor): transplant compat monitor and swebench runner #182
Supersedes Design resource observability split #209

shuxueshuxue · 2026-04-06T09:31:50Z

Latest light-theme compat monitor pass is now on branch issue-205-monitor-transplant at dfb6e80.

Fresh proof from the active integration worktree:

frontend/monitor build: green
uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py: 5 passed
Playwright caller proof on local monitor (8013 backend / 5175 monitor):
- /threads renders the light shell and hits GET /api/monitor/threads?offset=0&limit=50 -> 200
- /evaluation renders the cleaned evaluation surface and hits GET /api/monitor/evaluations?limit=30&offset=0 -> 200
- /evaluation?new=1 renders the config modal as a real overlay with structured scope/profile/runtime sections
- /leases?diverged=1 still renders the filtered lease view
Browser console noise on those passes is only missing favicon.ico; no route/runtime error surfaced

Scope of this pass:

closed the missing CSS seams for evaluation flow cards, progress bars, pagination, buttons, composer modal, and responsive layout
kept the compat monitor surface intact (Threads / Traces / Leases / Eval) instead of collapsing back to the reduced console
updated the local spec/plan docs to record that Integrate compat monitor into resource observability split #210 is the real continuation branch on top of PR feat(monitor): transplant compat monitor and swebench runner #182

shuxueshuxue · 2026-04-06T09:39:00Z

Follow-up trace/detail pass is now on branch issue-205-monitor-transplant at c598cf4.

This cut stays inside the compat monitor frontend and does not widen backend/storage scope.

Fresh proof:

frontend/monitor build: green
Playwright on local monitor (8013 backend / 5175 monitor):
- /traces now renders the denser ops-table variant instead of roomy detail-page spacing
- /thread/steer-cancel-poison-thread?run=e7922ab2-20c2-472a-93d1-d9f166584075 now shows explicit empty states for missing sessions / related leases
- the same thread detail promotes Live Trace into its own panel instead of leaving it visually flat
- trace detail no longer defaults raw payload open on every tool event; payloads are now drill-down only
- conversation cards now carry actor-colored borders and trace toolbar/timeline spacing is tighter
browser console noise is still only the missing favicon.ico 404

Touched files:

frontend/monitor/src/App.tsx
frontend/monitor/src/styles.css

shuxueshuxue · 2026-04-06T09:44:06Z

Detail-page follow-up is now on branch issue-205-monitor-transplant at ceaac92.

This pass stayed frontend-only and targeted the remaining deep monitor pages.

Fresh proof:

frontend/monitor build: green
real local evaluation created for UI proof: eval-20260406-174046-72ea94
- payload was the minimal backend-accepted local run: count=1, run_eval=false, local sandbox
- backend detail polling showed status=provisional with threads count still 0, which is enough to exercise the provisional detail surface honestly
Playwright deep-page proof on local monitor (8013 backend / 5175 monitor):
- /session/sess-8aa0018fc6ae now keeps compact facts but adds forward links to thread trace and lease detail
- /evaluation/eval-20260406-174046-72ea94 now groups Config vs Score, wraps progress in its own panel, and uses a chip summary bar instead of one overloaded muted line
- empty-thread table state on evaluation detail now reads as intentional empty state instead of a raw table row
- loading/error states for session/evaluation detail now fail visibly inside the page shell instead of spinning or dropping to a bare div
browser console noise remains only missing favicon.ico 404

Touched files:

frontend/monitor/src/App.tsx
frontend/monitor/src/styles.css

shuxueshuxue · 2026-04-06T09:47:15Z

One more frontend-only correctness pass is now on branch issue-205-monitor-transplant at cc8a156.

This closes a real drill-down bug discovered via Playwright, not just polish.

Fresh facts:

real session detail proof on /session/sess-8aa0018fc6ae showed a lease link that drills into /lease/lease-c8fdd1c9f7f6
backend probe on /api/monitor/lease/lease-c8fdd1c9f7f6 returns 404 Lease not found
before this commit, LeaseDetailPage stayed on a forever-loading shell because it had no error handling

What changed:

LeaseDetailPage and EventDetailPage now catch fetch failures and render explicit page-scoped errors instead of spinning on bare Loading...
LeaseDetailPage also gets explicit empty states for related threads and lease events
this keeps the final monitor drill-downs failing loudly and diagnosable inside the shared shell

Verification:

frontend/monitor build: green
Playwright on /lease/lease-c8fdd1c9f7f6 now shows a visible in-shell error instead of infinite loading
/event/evt-1eb2992cd49942e8b85b5335d935adb5 still renders normally

shuxueshuxue · 2026-04-06T09:54:27Z

Latest closeout on top of cc8a1566 is now pushed in 286df418.

What changed:

EvaluationDetailPage status chip now carries semantic state styling (provisional / completed_with_errors -> warning, error -> danger, completed -> success) instead of leaving the primary status flatter than the secondary publishable chip.
The Score grid block was re-indented to match the real JSX/DOM hierarchy and reduce future edit risk.
Design spec/checkpoint updated to reflect the final frontend review boundary.

Fresh verification:

cd frontend/monitor && npm run build
env -u ALL_PROXY -u all_proxy uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py
Playwright real-page proof on http://127.0.0.1:5175/evaluation/eval-20260406-174046-72ea94 now shows the leading status chip as eval-summary-chip chip-warning for the provisional run.

No new blocking findings from the final Claude review; remaining notes were non-blocking debt on list pages.

shuxueshuxue · 2026-04-06T10:10:36Z

Pushed 6bbb29e1 to close nu-61 (monitor threads pagination honesty).

What changed:

backend/web/monitor.py::list_threads() now builds one combined thread fact list, sorts it once, and paginates once.
Removed the double-pagination bug where SQL already applied LIMIT/OFFSET and Python then sliced items[offset:offset+limit] again.
Added a regression test in tests/Unit/monitor/test_monitor_compat.py proving page 2 is not sliced empty.
Updated the local spec/plan to mark thread pagination as a first-order honesty seam.
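
The combine, sort once, paginate once shape can be sketched roughly as follows. This is an illustration, not the actual list_threads() body: the `paginate_once` helper, the `updated_at` sort key, and the exact response field names are assumptions chosen to mirror the fields reported in the verification output.

```python
def paginate_once(items: list[dict], offset: int, limit: int) -> dict:
    """Sort the combined list once, then slice exactly once."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Assumed sort key: most recently updated first.
    ordered = sorted(items, key=lambda item: item["updated_at"], reverse=True)
    total = len(ordered)
    page_items = ordered[offset : offset + limit]
    end = offset + len(page_items)
    has_next = end < total
    return {
        "items": page_items,
        "count": len(page_items),
        "total": total,
        "page": offset // limit + 1,
        "has_next": has_next,
        "next_offset": end if has_next else None,
    }


def double_paginated(items: list[dict], offset: int, limit: int) -> list[dict]:
    """Reproduce the old bug: slice in 'SQL', then slice again in Python."""
    sql_page = items[offset : offset + limit]  # stands in for LIMIT/OFFSET
    return sql_page[offset : offset + limit]  # second slice empties page 2
```

With 74 items, offset=50 and limit=50, the single slice returns 24 rows, while the double slice returns an empty list because the second slice starts past the end of the 24-row page.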

Fresh verification:

env -u ALL_PROXY -u all_proxy uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py -> 6 passed
env -u ALL_PROXY -u all_proxy uv run ruff check backend/web/monitor.py tests/Unit/monitor/test_monitor_compat.py
env -u ALL_PROXY -u all_proxy uv run ruff format --check backend/web/monitor.py tests/Unit/monitor/test_monitor_compat.py
live API: GET /api/monitor/threads?offset=50&limit=50 now returns count=24, page=2, has_next=false, next_offset=null
live Playwright: /threads -> click Next -> UI now shows Showing 51-74 of 74 | page 2 with 24 rows instead of an empty table

This does not yet solve the larger IA questions (dashboard/resources/orphan lease semantics/provisional eval operator surface), but the thread list contract is no longer lying.

shuxueshuxue · 2026-04-06T10:29:20Z

D4 phase-1 landed on 6dea2adb.

What changed:

added /api/monitor/dashboard and a dashboard landing page for compat monitor
changed top nav to Dashboard / Threads / Resources / Eval
removed the redundant nav caption
added a first-class global Resources page backed by /api/monitor/resources + /api/monitor/leases
grouped lease health into Diverged, Orphans, and collapsed All leases
collapsed eval tutorial/reference blocks by default so the eval table is the first-screen operator surface

Fresh proof:

cd frontend/monitor && npm run build
env -u ALL_PROXY -u all_proxy uv run pytest -q tests/Integration/test_monitor_resources_route.py tests/Unit/monitor/test_monitor_compat.py -> 7 passed
Playwright real route proof:
- / resolves to /dashboard
- /dashboard requests /api/monitor/dashboard -> 200
- /resources requests /api/monitor/resources -> 200 and /api/monitor/leases -> 200
browser console noise remains only missing favicon.ico

Honest boundary:

this is not the final monitor UX pass yet
provider detail is stronger now but still lighter than the app resource surface family
lease regrouping is now readable, but deeper semantics still belong to nu-63
provisional eval detail still needs a stronger operator-facing artifact/log story in nu-62

shuxueshuxue · 2026-04-06T10:41:47Z

Lease semantics follow-up landed on 0e3fc3a7.

What changed:

moved lease semantic projection out of backend/web/monitor.py into backend/web/services/monitor_service.py
compat /api/monitor/leases route now delegates instead of owning new lease business logic
kept the current lease contract (items + summary + groups + semantics) but lifted it onto the monitor service layer

Fresh proof:

env -u ALL_PROXY -u all_proxy uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py -> 9 passed
env -u ALL_PROXY -u all_proxy uv run ruff check backend/web/services/monitor_service.py backend/web/monitor.py backend/web/routers/monitor.py tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py
cd frontend/monitor && npm run build
live route proof still holds: /api/monitor/leases returns summary/groups, and /resources still calls /api/monitor/resources + /api/monitor/leases

Why this matters:

the original compat monitor still uses SQLite in places, but this keeps the new lease semantics off the SQLite route file itself
the new behavior now sits on a more database-agnostic monitor service seam instead of expanding raw compat SQL further

Honest boundary:

this does not remove all SQLite from compat monitor
it only shrinks the change surface and lifts the new lease semantics to the existing service abstraction

shuxueshuxue · 2026-04-06T10:51:20Z

Latest cut on 257b8383 closes the first honest pass on D2 provisional eval UX. Backend now exposes info.operator_surface from backend/web/services/monitor_service.py, so the compat route still owns fact retrieval but not the new operator interpretation. On the real provisional run eval-20260406-174046-72ea94, the detail page now opens with Operator Status, explicit artifact/log paths, and actionable next steps; the old sparse provisional score grid is folded behind Score artifacts (provisional) instead of occupying the first screen.

Fresh proof on this head:

uv run pytest -q tests/Unit/monitor/test_monitor_compat.py
uv run ruff check backend/web/services/monitor_service.py backend/web/monitor.py tests/Unit/monitor/test_monitor_compat.py
uv run pyright backend/web/services/monitor_service.py
cd frontend/monitor && npm run build
real Playwright on http://127.0.0.1:5175/evaluation/eval-20260406-174046-72ea94

That browser pass confirmed the page now surfaces Runner exited before evaluation threads materialized, Run manifest, STDOUT log, STDERR log, and the next-step checklist above the fold.

shuxueshuxue · 2026-04-06T10:57:37Z

Latest cut on a99357b8 lands D3 phase-2. I kept the original summary/groups/items lease contract intact for compatibility, but added backend-owned triage.summary and triage.groups in backend/web/services/monitor_service.py. The reason for that extra layer is live data: the local monitor no longer meaningfully reads as just 29 diverged; the real split is 3 active drift + 26 detached residue, and frontend-only regrouping over the old coarse buckets cannot express that difference honestly.

Claude/CCM suggested a bounded frontend-only regrouping that reused the existing groups[] surface. I kept the useful part of that advice (present lifecycle groups in the monitor Resources page, not one flat alarming blob) but rejected the frontend-only version because the backend contract was too coarse for the real rows.
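
As a rough illustration of a backend-owned triage layer, the sketch below groups lease rows into the four attention buckets named in this thread. It assumes each row already carries a precomputed `category` field; that field, the `build_triage` name, and the exact shape of the returned summary and groups are hypothetical, and the real classification rules in backend/web/services/monitor_service.py are not shown here.

```python
TRIAGE_ORDER = (
    "active_drift",
    "detached_residue",
    "orphan_cleanup",
    "healthy_capacity",
)


def build_triage(leases: list[dict]) -> dict:
    """Group lease rows into fixed attention buckets and count each bucket."""
    buckets: dict[str, list[dict]] = {key: [] for key in TRIAGE_ORDER}
    for lease in leases:
        category = lease["category"]  # assumed to be computed upstream
        if category not in buckets:
            # Fail loudly instead of hiding rows in a catch-all bucket.
            raise ValueError(f"unknown lease category: {category!r}")
        buckets[category].append(lease)
    return {
        "summary": {key: len(rows) for key, rows in buckets.items()},
        "groups": [
            {"key": key, "count": len(rows), "items": rows}
            for key, rows in buckets.items()
        ],
    }
```

Keeping every bucket in the output, including empty ones, lets the frontend decide which groups to suppress without guessing whether a missing key means zero.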

Fresh proof on this head:

uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py
uv run ruff check backend/web/services/monitor_service.py backend/web/monitor.py tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py
uv run pyright backend/web/services/monitor_service.py
cd frontend/monitor && npm run build
real Playwright on http://127.0.0.1:5175/resources

That browser pass now reads the live local dataset as ACTIVE DRIFT 3 / DETACHED RESIDUE 26 / ORPHAN CLEANUP 0 / HEALTHY 0, which is much closer to what an operator actually needs to know.

shuxueshuxue · 2026-04-06T11:03:13Z

Latest cut on 87ca560 lands D4 phase-2 polish on the monitor resources surface. This does not change the contract split: monitor still reads the global /api/monitor/resources and /api/monitor/leases paths, while the product page stays on /api/resources/*.

What changed is the UI quality of the monitor provider surface itself:

provider cards now carry a product-like status light, compact metric cells, capability strip, and session-dot strip
selected provider detail now reads like a real panel instead of a loose stats stack
null telemetry no longer renders as fake 0.0 on unavailable providers

One deliberate monitor-specific deviation from the product page remains: unavailable providers stay clickable here, because ops needs to inspect bad providers rather than having the card disabled.

Fresh proof on this head:

cd frontend/monitor && npm run build
real Playwright on http://127.0.0.1:5175/resources

The live browser pass confirmed the global resources page still reads the real local dataset while presenting the provider surface with much higher scan quality than the previous plain metric-card version.

shuxueshuxue · 2026-04-06T11:06:19Z

Latest cut on 8e10d90 lands D4 phase-3 on the monitor resources page. The selected-provider pane no longer jumps straight from summary pills to the raw session table; it now adds a lease-card drill-down layer first.

Why this matters: on the monitor side, the raw session table is still the truth surface, but it is also the noisiest surface. The new lease card grid gives operators a product-like intermediate layer without importing product components or violating the contract split.

Current flow on /resources is now:

global provider cards
selected provider panel
lease card grid grouped by lease
raw session table
global lease-health triage

Fresh proof on this head:

cd frontend/monitor && npm run build
real Playwright on http://127.0.0.1:5175/resources

That browser pass confirmed the selected provider detail now shows Leases (26) with lease cards before the raw Sessions (26) table.

shuxueshuxue · 2026-04-06T11:09:15Z

Latest cut on 24d09b6 cleans up the legacy /leases entry. Even though the top nav now pushes operators toward /resources, the old route still exists and was too raw.

This pass keeps the route alive but changes the first screen to:

triage summary pills
backend-owned attention buckets (active_drift, detached_residue, orphan_cleanup, healthy_capacity)
collapsed full raw table afterward

That keeps the truth surface intact while removing the old “single alarming dump” first impression.

shuxueshuxue · 2026-04-06T11:15:53Z

D2 follow-up is now on branch at 857adb7.

This round hardens the eval operator contract instead of adding more UI polish:

operator_surface now carries a typed kind so eval detail can distinguish bootstrap_failure, running_waiting_for_threads, running_active, completed_with_errors, completed_publishable, and provisional_waiting_for_summary
artifact handling is now honest: all six artifact slots stay visible and each one is marked present or missing instead of silently dropping missing paths
added artifact_summary so the page can say what exists vs what is still absent without making the frontend infer from free-text
added focused unit coverage for bootstrap failure, running-without-thread-rows-yet, and completed-with-errors
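
One way to keep every artifact slot visible is sketched below. The slot names, the `summarize_artifacts` helper, and the rule that a non-empty path counts as present are all assumptions for illustration; only the present / missing / total summary shape comes from this comment.

```python
from typing import Optional


def summarize_artifacts(slots: dict[str, Optional[str]]) -> dict:
    """Mark each slot present or missing and count both, dropping nothing."""
    artifacts = [
        {
            "slot": slot,
            "path": path,
            # Assumption: a non-empty path means the artifact is present.
            "status": "present" if path else "missing",
        }
        for slot, path in slots.items()
    ]
    present = sum(1 for artifact in artifacts if artifact["status"] == "present")
    return {
        "artifacts": artifacts,
        "artifact_summary": {
            "present": present,
            "missing": len(artifacts) - present,
            "total": len(artifacts),
        },
    }
```

Because missing slots stay in the list with an explicit status, the page can report what is absent without inferring it from free text.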

Fresh verification on this commit:

uv run pytest -q tests/Unit/monitor/test_monitor_compat.py
uv run ruff check backend/web/services/monitor_service.py tests/Unit/monitor/test_monitor_compat.py
uv run ruff format --check backend/web/services/monitor_service.py tests/Unit/monitor/test_monitor_compat.py
uv run pyright backend/web/services/monitor_service.py
real local eval detail at http://127.0.0.1:8013/api/monitor/evaluation/eval-20260406-174046-72ea94 now returns kind=bootstrap_failure and artifact_summary={present:4,missing:2,total:6}

This keeps the change backend-first and database-agnostic: the compat monitor route still owns fact retrieval, while operator interpretation stays in backend/web/services/monitor_service.py.

shuxueshuxue · 2026-04-06T11:24:06Z

Monitor frontend follow-up is now on branch at b6c1d35.

This round stayed deliberately small and frontend-only:

dashboard Infra Health metrics for Diverged leases and Orphans now deep-link directly to /resources#lease-health instead of dropping operators at the top of the resources page
monitor provider cards are tighter: duplicated paused/stopped footer counts are gone, and provider error/unavailable reason now sits in the header block instead of stretching card height
monitor lease-health now defaults to non-empty attention buckets only; Healthy Capacity moved behind a collapsed details shell so passive healthy inventory stops competing with active drift

This is the same contract surface as before. No backend API changes in this cut.

Fresh verification on this commit:

cd frontend/monitor && npm run build
Playwright on http://127.0.0.1:5175/dashboard
Playwright on http://127.0.0.1:5175/resources#lease-health
snapshot proof shows dashboard inline links point at /resources#lease-health
snapshot proof shows current lease-health first screen renders only Active Drift and Detached Residue for the local dataset, with empty groups suppressed

This is a density/attention cut, not a structural rewrite. The next honest gap remains the deeper sandbox-sheet / drill-down family if we want true parity with the product resources experience.

shuxueshuxue · 2026-04-06T11:31:50Z

Monitor resources drill-down follow-up is now on branch at 2014e01.

This round adds the smallest local deep-drill layer I could land without touching backend contracts:

selected lease cards on the monitor resources page now drive a dedicated Lease Detail panel
the panel surfaces lease/thread quick links, member, started time, grouped session rows, and keeps the full provider session table below as the truth surface
this stays monitor-local and contract-preserving: no new API fields, no product component imports, just better use of existing provider/session/lease payload data

Fresh verification on this commit:

cd frontend/monitor && npm run build
npx prettier --check frontend/monitor/src/App.tsx frontend/monitor/src/styles.css
Playwright on http://127.0.0.1:5175/resources
snapshot proof in monitor-resources-lease-detail-3.yaml now contains Lease Detail plus Open lease / Open thread links
browser console is clean except for the pre-existing favicon.ico 404

This is the next small step toward product-level resources parity without violating the monitor/product contract split.

shuxueshuxue · 2026-04-06T11:36:21Z

Monitor resources follow-up is now on branch at b42d9e5.

This round keeps the new lease drill-down but makes the truth table below obey it:

provider session table now defaults to Selected lease scope instead of always dumping every provider session row
operators can explicitly switch back to All provider sessions when they want the full noisy table
this turns lease card -> Lease Detail -> session rows into one coherent drill-down path while still preserving the provider-wide truth surface

Still no backend changes. Existing monitor payload only.

Fresh verification on this commit:

cd frontend/monitor && npm run build
npx prettier --check frontend/monitor/src/App.tsx frontend/monitor/src/styles.css
Playwright on http://127.0.0.1:5175/resources
snapshot proof in monitor-resources-session-scope.yaml now shows Lease Detail plus the Selected lease / All provider sessions toggle
same snapshot shows default count narrowed to the selected lease for the current local dataset: Sessions (1)
browser console is clean except for the pre-existing favicon.ico 404

This is still a contract-preserving information-ordering cut, not a backend expansion.

shuxueshuxue · 2026-04-06T14:18:50Z

UI modernization / hierarchy slice pushed in 1e486e59.

What changed:

monitor shell now leans harder into console hierarchy instead of flat cards
dashboard now has a primary infra-health hero plus secondary workload/eval stack
resources now keeps summary metrics in sticky context, lightens provider selection cards, and pushes lease truth surfaces into clearer primary vs recessed layers
evaluation now uses a split layout: recessed current-submission aside + primary evaluation table, with Open Config moved to shell header

Fresh verification:

cd frontend/monitor && npm run build
real-page Playwright checks on /dashboard, /resources, /evaluation
browser console still only shows the old missing favicon.ico 404

shuxueshuxue · 2026-04-06T14:24:55Z

Resources console split pushed in 3b661116.

What changed:

/resources now has a real split-console structure instead of one long vertical stack
left side is a lighter provider selection rail
right side is the selected-provider work surface (telemetry, lease groups, scoped sessions)
global lease health remains below as a separate truth section
responsive collapse keeps the rail flowing back to one column on narrow widths

Fresh verification:

cd frontend/monitor && npm run build
real-page Playwright check on /resources

shuxueshuxue · 2026-04-06T14:28:00Z

Lease-detail density slice pushed in a935bf08.

What changed:

selected lease panel now reads as a tighter context bar instead of a generic detail card
member/thread/started/status moved into compact context tiles
scoped sessions table is denser and explicitly labels when it is scoped to the selected lease
no backend/API changes

Fresh verification:

cd frontend/monitor && npm run build
real-page Playwright check on /resources
confirmed live DOM contains Lease Detail, Open lease, Open thread, and scoped to selected lease

shuxueshuxue · 2026-04-06T16:37:49Z

Landed nu-74 first slice: bounded monitor cleanup is now live.

What landed

backend contract: POST /api/monitor/resources/cleanup
allowed first action: cleanup_residue
allowed first categories only: detached_residue, orphan_cleanup
response contract: attempted / cleaned / skipped / errors / refreshed_summary
monitor UI: Resources -> Lease Health now exposes per-row Cleanup only for backlog rows
no product /resources cleanup controls
no optimistic disappearance; page re-fetches after action and shows explicit feedback
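
The bounded contract above can be sketched as a pure function. This is an assumption-heavy illustration: the `plan_cleanup` name, the use of lease-id lists for attempted / cleaned, and the choice to report ineligible categories under errors and unknown ids under skipped are hypothetical; only the action name, the two allowed categories, and the five response keys come from this comment.

```python
from collections import Counter

ALLOWED_ACTIONS = {"cleanup_residue"}
ALLOWED_CATEGORIES = {"detached_residue", "orphan_cleanup"}


def plan_cleanup(
    action: str, lease_ids: list[str], category_by_lease: dict[str, str]
) -> dict:
    """Apply the bounded cleanup rules to explicit lease ids only."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported cleanup action: {action!r}")
    cleaned: list[str] = []
    skipped: list[dict] = []
    errors: list[dict] = []
    for lease_id in lease_ids:
        category = category_by_lease.get(lease_id)
        if category is None:
            skipped.append({"lease_id": lease_id, "reason": "lease not found"})
        elif category not in ALLOWED_CATEGORIES:
            # Live/healthy leases are excluded and reported, never cleaned.
            errors.append(
                {"lease_id": lease_id, "error": f"category {category} not eligible"}
            )
        else:
            cleaned.append(lease_id)
    remaining = {
        lease_id: category
        for lease_id, category in category_by_lease.items()
        if lease_id not in cleaned
    }
    return {
        "attempted": list(lease_ids),
        "cleaned": cleaned,
        "skipped": skipped,
        "errors": errors,
        # Recomputed after the action rather than optimistically adjusted.
        "refreshed_summary": dict(Counter(remaining.values())),
    }
```

Taking explicit lease_ids and returning a recomputed summary matches the stated intent: no hidden bulk mode and no optimistic disappearance.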

Proof

env -u ALL_PROXY -u all_proxy uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py -> 17 passed
uv run ruff check backend/web/services/monitor_service.py backend/web/routers/monitor.py tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py -> green
uv run ruff format --check backend/web/services/monitor_service.py backend/web/routers/monitor.py tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py -> green
uv run pyright backend/web/services/monitor_service.py backend/web/routers/monitor.py -> 0 errors, 0 warnings, 0 informations
cd frontend/monitor && npm run build -> green
Playwright snapshot proof on http://127.0.0.1:5175/resources after a real click:
- .playwright-cli/cleanup-sweep-after.yaml contains Cleanup applied: 1 lease cleaned from detached_residue.
- same snapshot shows Detached Residue (24) after the click

Honest boundary

first slice is per-row only
bulk cleanup ergonomics and broader cleanup controls are still not implemented
live/healthy leases still fail loudly and are intentionally excluded from this slice

shuxueshuxue · 2026-04-06T16:45:32Z

Landed another bounded cleanup UX slice on top of nu-74:

monitor Resources -> Lease Health now exposes a Cleanup visible action that targets only the currently rendered backlog rows in Detached Residue and Cleanup Backlog
this still uses the same backend contract (POST /api/monitor/resources/cleanup) with explicit visible lease_ids; it does not add a hidden bulk backend mode
no cleanup controls were added to product /resources

Fresh proof:

cd frontend/monitor && npm run build -> green
Playwright caller-proof on http://127.0.0.1:5175/resources clicked Cleanup visible
resulting snapshot cleanup-bulk-verify-after.yaml shows:
- Cleanup applied: 8 leases cleaned from detached_residue.
- Detached Residue (8)

This keeps the first bulk affordance honest: explicit visible rows only, re-fetch-backed state change only.

shuxueshuxue · 2026-04-06T16:49:58Z

Added the first misoperation guardrail for monitor cleanup on top of nu-74:

Cleanup visible no longer mutates immediately on first click
group cleanup now stages an inline Confirm cleanup / Cancel row inside Resources -> Lease Health
single-row Cleanup stays one-click; only multi-lease actions get the extra fence
backend contract is unchanged: still POST /api/monitor/resources/cleanup with explicit lease_ids

Fresh proof:

cd frontend/monitor && npm run build -> green
red-state snapshot before click: cleanup-confirm-before.yaml shows Cleanup visible and no confirm row
pending snapshot after first click: cleanup-confirm-pending.yaml shows:
- Confirm cleanup
- Remove 8 visible leases from Detached Residue.
final snapshot after confirm: cleanup-confirm-after.yaml shows:
- Cleanup applied: 8 leases cleaned from detached_residue.

This keeps the first bulk cleanup affordance honest and harder to fat-finger without widening the backend or polluting product /resources.

shuxueshuxue · 2026-04-07T02:32:33Z

Brutal integration update after rebasing #210 onto latest dev and landing compatibility fix 0e183cf.

Fresh controlled runtime:

backend: :8014 with full Supabase + Postgres contract
monitor frontend: :5176
app frontend: :5187 (started with LEON_BACKEND_PORT=8014)

What is green on this branch:

direct POST /api/auth/login -> 200
/api/monitor/{health,resources,leases,dashboard} -> 200
/api/resources/overview + /api/resources/overview/refresh -> 200
app login through the dev proxy is green once the app dev server is started with LEON_BACKEND_PORT=8014; browser lands on /chat
rebase follow-up compatibility pack is back to green:
- pytest targeted pack: 19 passed
- pyright on touched files: 0 errors
- touched ruff / ruff format --check: green

What is NOT being counted as #210 regression:

the earlier app login failure was local bringup noise: the dev server had been started without LEON_BACKEND_PORT, so Vite kept proxying to stale :8012
app /resources -> /marketplace is intentional latest-dev behavior
browser /marketplace still fails locally because the frontend store reads VITE_MYCEL_HUB_URL or falls back to direct http://localhost:8090; without a live hub this yields ERR_CONNECTION_REFUSED
daytona_selfhost thread routes still fail if daytona_sdk is unavailable; on a fresh sweep, /permissions, /session, and /lease fail loudly with 503, while /detail, /runtime, and /tasks still escalate through the same provider-init seam and return 500
discriminator proof: the same authenticated brutal sweep against local thread m_50tMO7PmFp7f-18 is healthy (detail=200, permissions=200, runtime=200, tasks=200, lease=200, session=404 no session)

So the current branch verdict is: #210's monitor/resource facade surface is healthy on latest dev; remaining browser failures found in this sweep are either local bringup mistakes or existing latest-dev marketplace/provider dependency debt, not monitor-transplant regressions.
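
The discriminator reasoning above can be sketched as a comparison of two sweeps. The status codes are the ones reported in this comment; the bucket names and the classify, summarize, and provider_scoped_failures helpers are hypothetical and only illustrate the argument.

```python
def classify(status):
    """Map an HTTP status from the sweep to a coarse bucket (names are illustrative)."""
    if 200 <= status < 300:
        return "healthy"
    if status == 404:
        return "absent"      # e.g. a thread with no session
    if status == 503:
        return "fail_loud"   # dependency reported as unavailable
    if 500 <= status < 600:
        return "escalated"   # failure surfaced as a generic server error
    return "other"


def summarize(sweep):
    """Group routes by bucket."""
    buckets = {}
    for route, status in sweep.items():
        buckets.setdefault(classify(status), []).append(route)
    return {name: sorted(routes) for name, routes in buckets.items()}


def provider_scoped_failures(failing, control):
    """Routes that return 5xx in one sweep but not in the control sweep."""
    return sorted(
        route
        for route, status in failing.items()
        if status >= 500 and control.get(route, 0) < 500
    )


# Status codes as reported above for the two thread sweeps.
daytona = {"/permissions": 503, "/session": 503, "/lease": 503,
           "/detail": 500, "/runtime": 500, "/tasks": 500}
local = {"/detail": 200, "/permissions": 200, "/runtime": 200,
         "/tasks": 200, "/lease": 200, "/session": 404}

print(summarize(daytona))
print(summarize(local))
print(provider_scoped_failures(daytona, local))
```

Because every failing route in the daytona_selfhost sweep is non-5xx on the local thread, the failures track the provider dependency rather than the route handlers themselves.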

This was referenced Apr 6, 2026

feat(monitor): transplant compat monitor and swebench runner #182

Closed

Design resource observability split #209

Closed

shuxueshuxue added 3 commits April 7, 2026 09:34

chore: open resource observability split workstream

8eba762

docs: require playwright trace proofs for resources split

4a5835d

feat: split product resources from monitor routes

3e17be8

shuxueshuxue added 13 commits April 7, 2026 09:49

fix: serve historical lease detail

2fcc358

fix: guide empty run traces to events

e4c64e2

fix: classify lease detail under leases shell

51e8226

refactor: hide redundant thread lease links

f57f24c

fix: keep monitor resources honest without lease groups

3f3c8f6

style: format monitor app shell

4ec2fd1

test: mark eval composer modal for sweep proofs

aa71306

fix: honor monitor deep links after async load

3cb8b13

feat: add bounded monitor cleanup contract

72ca872

feat: add monitor cleanup controls

8990ea9

docs: dedupe cleanup slice spec

f97d7a5

feat: add visible cleanup controls

69d6e87

feat: confirm visible cleanup actions

0d2f997

shuxueshuxue force-pushed the issue-205-monitor-transplant branch from bcce157 to 0d2f997 April 7, 2026 01:49

fix: restore resource compatibility after dev rebase

0e183cf

shuxueshuxue added 13 commits April 7, 2026 10:34

fix: add dialog semantics to operator guide

412622b

fix: add dialog semantics to eval composer

1c225f1

fix: restore modal escape focus return

524a8ee

fix: trap modal tab focus

ff3aed7

fix: restore monitor resource triage snapshot

8ef107d

fix: stop repeated conversation error polling

df82778

fix: restore compat lease deep links

a592f6c

fix: surface monitor page load errors

d1a6569

fix: stabilize app permissions polling and resources build

18b8cf9

fix: surface app thread load failures

a91a909

fix: hide fake task status on failed threads

94418cf

fix: disable app input on failed threads

5dea600

fix: defer failed thread side-effects

f769e97
