Integrate compat monitor into resource observability split #210

Draft
shuxueshuxue wants to merge 74 commits into dev from issue-205-monitor-transplant

@shuxueshuxue
Collaborator

Summary

  • transplant the richer compat monitor from #182 onto the current resource-observability branch instead of continuing from the stripped-down monitor shell
  • keep the product resource split from #205 / #209 intact while restoring threads / traces / leases / evaluation monitor flows
  • move the compat monitor shell to a light-theme ops UI and fix the graft regression where /api/monitor/threads crashed when evaluation_jobs was absent

Why This PR Exists

  • #182 has the right monitor product surface, but it is hundreds of commits behind current dev
  • #209 has the right resource split and Supabase-aware seams, but it was built on the wrong monitor baseline
  • this branch combines the two honestly: compat monitor surface from #182, resource split + wiring work from #209

What Changed In This Cut

  • cherry-picked the compat monitor backend/frontend and SWE-bench runner from #182
  • kept the existing /api/resources/* product split from #209
  • kept /api/monitor/resources as the global monitor/admin surface
  • reshaped the compat monitor shell to a lighter ops UI:
    • default landing route is now /threads
    • primary nav is now Threads / Traces / Leases / Eval
    • Diverged is folded into Leases as a filtered view (/leases?diverged=1)
    • Events remains reachable contextually from leases
  • added a compat regression guard so monitor threads still render when the local DB has never created evaluation_jobs
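
The regression guard in the last bullet can be sketched roughly as below. This is a minimal illustration that assumes a SQLite-backed compat monitor; `table_exists` and `load_eval_jobs_by_thread` are hypothetical helper names, and the `thread_id` / `status` columns are assumptions, not the actual code in backend/web/monitor.py.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if a table with this name exists in the SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def load_eval_jobs_by_thread(conn: sqlite3.Connection) -> dict[str, str]:
    """Map thread_id -> eval status; return {} if evaluation_jobs was never created."""
    # Hypothetical guard: a missing table means "no eval data", not a crash.
    if not table_exists(conn, "evaluation_jobs"):
        return {}
    rows = conn.execute("SELECT thread_id, status FROM evaluation_jobs").fetchall()
    return {thread_id: status for thread_id, status in rows}
```

The point of the guard is that the threads list treats an absent table as an empty result, so a fresh local DB still renders.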

Real Evidence

  • Playwright against local compat monitor branch instance (8013 backend / 5175 monitor)
    • /threads rendered the light monitor shell and hit GET /api/monitor/threads?offset=0&limit=50 => 200
    • /evaluation rendered the eval shell and hit GET /api/monitor/evaluations?limit=30&offset=0 => 200
    • /leases?diverged=1 rendered the filtered lease view with contextual links back to all leases / events

Verification

  • env -u ALL_PROXY -u all_proxy -u HTTPS_PROXY -u https_proxy -u HTTP_PROXY -u http_proxy uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py
  • cd frontend/monitor && npm run build
  • env -u ALL_PROXY -u all_proxy -u HTTPS_PROXY -u https_proxy -u HTTP_PROXY -u http_proxy uv run ruff format --check backend/web/monitor.py tests/Unit/monitor/test_monitor_compat.py

@shuxueshuxue
Collaborator Author

Latest light-theme compat monitor pass is now on branch at .

Fresh proof from the active integration worktree:

  • build: green

==================================== ERRORS ====================================
______ ERROR collecting tests/Integration/test_monitor_resources_route.py ______
ImportError while importing test module '/Users/lexicalmathical/worktrees/leonai--issue-205-monitor-transplant/tests/Integration/test_monitor_resources_route.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../.local/share/uv/python/cpython-3.12.11-macos-aarch64-none/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tests/Integration/test_monitor_resources_route.py:3: in <module>
from backend.web.main import app
backend/web/main.py:83: in <module>
from backend.web.routers import ( # noqa: E402
backend/web/routers/marketplace.py:15: in <module>
from backend.web.services import marketplace_client
backend/web/services/marketplace_client.py:20: in <module>
_hub_client = httpx.Client(timeout=30.0)
^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/httpx/_client.py:700: in __init__
else self._init_proxy_transport(
.venv/lib/python3.12/site-packages/httpx/_client.py:750: in _init_proxy_transport
return HTTPTransport(
.venv/lib/python3.12/site-packages/httpx/_transports/default.py:191: in __init__
raise ImportError(
E ImportError: Using SOCKS proxy, but the 'socksio' package is not installed. Make sure to install httpx using pip install httpx[socks].
=========================== short test summary info ============================
ERROR tests/Integration/test_monitor_resources_route.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
1 error in 0.55s:

  • Playwright caller proof on local monitor ( backend / monitor):
    • renders the light shell and hits
    • renders the cleaned evaluation surface and hits
    • renders the config modal as a real overlay with structured scope/profile/runtime sections
    • still renders the filtered lease view
  • Browser console noise on those passes is only missing ; no route/runtime error surfaced

Scope of this pass:

  • closed the missing CSS seams for evaluation flow cards, progress bars, pagination, buttons, composer modal, and responsive layout
  • kept the compat monitor surface intact () instead of collapsing back to the reduced console
  • updated the local spec/plan docs to record that is the real continuation branch on top of

@shuxueshuxue
Collaborator Author

Latest light-theme compat monitor pass is now on branch issue-205-monitor-transplant at dfb6e80.

Fresh proof from the active integration worktree:

  • frontend/monitor build: green
  • uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py: 5 passed
  • Playwright caller proof on local monitor (8013 backend / 5175 monitor):
    • /threads renders the light shell and hits GET /api/monitor/threads?offset=0&limit=50 -> 200
    • /evaluation renders the cleaned evaluation surface and hits GET /api/monitor/evaluations?limit=30&offset=0 -> 200
    • /evaluation?new=1 renders the config modal as a real overlay with structured scope/profile/runtime sections
    • /leases?diverged=1 still renders the filtered lease view
  • Browser console noise on those passes is only missing favicon.ico; no route/runtime error surfaced

Scope of this pass:

@shuxueshuxue
Collaborator Author

Follow-up trace/detail pass is now on branch issue-205-monitor-transplant at c598cf4.

This cut stays inside the compat monitor frontend and does not widen backend/storage scope.

Fresh proof:

  • frontend/monitor build: green
  • Playwright on local monitor (8013 backend / 5175 monitor):
    • /traces now renders the denser ops-table variant instead of roomy detail-page spacing
    • /thread/steer-cancel-poison-thread?run=e7922ab2-20c2-472a-93d1-d9f166584075 now shows explicit empty states for missing sessions / related leases
    • the same thread detail promotes Live Trace into its own panel instead of leaving it visually flat
    • trace detail no longer defaults raw payload open on every tool event; payloads are now drill-down only
    • conversation cards now carry actor-colored borders and trace toolbar/timeline spacing is tighter
  • browser console noise is still only the missing favicon.ico 404

Touched files:

  • frontend/monitor/src/App.tsx
  • frontend/monitor/src/styles.css

@shuxueshuxue
Collaborator Author

Detail-page follow-up is now on branch issue-205-monitor-transplant at ceaac92.

This pass stayed frontend-only and targeted the remaining deep monitor pages.

Fresh proof:

  • frontend/monitor build: green
  • real local evaluation created for UI proof: eval-20260406-174046-72ea94
    • payload was the minimal backend-accepted local run: count=1, run_eval=false, local sandbox
    • backend detail polling showed status=provisional with threads count still 0, which is enough to exercise the provisional detail surface honestly
  • Playwright deep-page proof on local monitor (8013 backend / 5175 monitor):
    • /session/sess-8aa0018fc6ae now keeps compact facts but adds forward links to thread trace and lease detail
    • /evaluation/eval-20260406-174046-72ea94 now groups Config vs Score, wraps progress in its own panel, and uses a chip summary bar instead of one overloaded muted line
    • empty-thread table state on evaluation detail now reads as intentional empty state instead of a raw table row
    • loading/error handling for session/evaluation detail now fails visibly inside the page shell instead of spinning or dropping to a bare div
  • browser console noise remains only missing favicon.ico 404

Touched files:

  • frontend/monitor/src/App.tsx
  • frontend/monitor/src/styles.css

@shuxueshuxue
Collaborator Author

One more frontend-only correctness pass is now on branch issue-205-monitor-transplant at cc8a156.

This closes a real drill-down bug discovered via Playwright, not just polish.

Fresh facts:

  • real session detail proof on /session/sess-8aa0018fc6ae showed a lease link that drills into /lease/lease-c8fdd1c9f7f6
  • backend probe on /api/monitor/lease/lease-c8fdd1c9f7f6 returns 404 Lease not found
  • before this commit, LeaseDetailPage stayed on a forever-loading shell because it had no error handling

What changed:

  • LeaseDetailPage and EventDetailPage now catch fetch failures and render explicit page-scoped errors instead of spinning on bare Loading...
  • LeaseDetailPage also gets explicit empty states for related threads and lease events
  • this keeps the final monitor drill-downs failing loudly and diagnosable inside the shared shell

Verification:

  • frontend/monitor build: green
  • Playwright on /lease/lease-c8fdd1c9f7f6 now shows a visible in-shell error instead of infinite loading
  • /event/evt-1eb2992cd49942e8b85b5335d935adb5 still renders normally

@shuxueshuxue
Collaborator Author

Latest closeout on top of cc8a1566 is now pushed in 286df418.

What changed:

  • EvaluationDetailPage status chip now carries semantic state styling (provisional / completed_with_errors -> warning, error -> danger, completed -> success) instead of leaving the primary status flatter than the secondary publishable chip.
  • The Score grid block was re-indented to match the real JSX/DOM hierarchy and reduce future edit risk.
  • Design spec/checkpoint updated to reflect the final frontend review boundary.

Fresh verification:

  • cd frontend/monitor && npm run build
  • env -u ALL_PROXY -u all_proxy uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py
  • Playwright real-page proof on http://127.0.0.1:5175/evaluation/eval-20260406-174046-72ea94 now shows the leading status chip as eval-summary-chip chip-warning for the provisional run.

No new blocking findings from the final Claude review; remaining notes were non-blocking debt on list pages.

@shuxueshuxue
Collaborator Author

Pushed 6bbb29e1 to close nu-61 (monitor threads pagination honesty).

What changed:

  • backend/web/monitor.py::list_threads() now builds one combined thread fact list, sorts it once, and paginates once.
  • Removed the double-pagination bug where SQL already applied LIMIT/OFFSET and Python then sliced items[offset:offset+limit] again.
  • Added a regression test in tests/Unit/monitor/test_monitor_compat.py proving page 2 is not sliced empty.
  • Updated the local spec/plan to mark thread pagination as a first-order honesty seam.
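
A minimal sketch of the "sort once, paginate once" shape described above. The function name `paginate_once`, the `updated_at` sort key, and the `items` / `total` fields are assumptions for illustration; only `count`, `page`, `has_next`, and `next_offset` are taken from the live API output reported in this comment.

```python
from typing import Any


def paginate_once(items: list[dict[str, Any]], offset: int, limit: int) -> dict[str, Any]:
    """Sort the combined thread list once, then slice exactly once."""
    # Assumed sort key: newest first. The real list_threads() may order differently.
    ordered = sorted(items, key=lambda item: item["updated_at"], reverse=True)
    page_items = ordered[offset : offset + limit]
    total = len(ordered)
    next_offset = offset + limit if offset + limit < total else None
    return {
        "items": page_items,
        "count": len(page_items),
        "total": total,
        "page": offset // limit + 1,
        "has_next": next_offset is not None,
        "next_offset": next_offset,
    }
```

With 74 threads, `offset=50` and `limit=50` yield 24 rows on page 2. The old bug is the same call applied to a list that SQL had already cut down to those 24 rows: `items[50:100]` on a 24-element list is empty.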

Fresh verification:

  • env -u ALL_PROXY -u all_proxy uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py -> 6 passed
  • env -u ALL_PROXY -u all_proxy uv run ruff check backend/web/monitor.py tests/Unit/monitor/test_monitor_compat.py
  • env -u ALL_PROXY -u all_proxy uv run ruff format --check backend/web/monitor.py tests/Unit/monitor/test_monitor_compat.py
  • live API: GET /api/monitor/threads?offset=50&limit=50 now returns count=24, page=2, has_next=false, next_offset=null
  • live Playwright: /threads -> click Next -> UI now shows Showing 51-74 of 74 | page 2 with 24 rows instead of an empty table

This does not yet solve the larger IA questions (dashboard/resources/orphan lease semantics/provisional eval operator surface), but the thread list contract is no longer lying.

@shuxueshuxue
Collaborator Author

D4 phase-1 landed on 6dea2adb.

What changed:

  • added /api/monitor/dashboard and a dashboard landing page for compat monitor
  • changed top nav to Dashboard / Threads / Resources / Eval
  • removed the redundant nav caption
  • added a first-class global Resources page backed by /api/monitor/resources + /api/monitor/leases
  • grouped lease health into Diverged, Orphans, and collapsed All leases
  • collapsed eval tutorial/reference blocks by default so the eval table is the first-screen operator surface

Fresh proof:

  • cd frontend/monitor && npm run build
  • env -u ALL_PROXY -u all_proxy uv run pytest -q tests/Integration/test_monitor_resources_route.py tests/Unit/monitor/test_monitor_compat.py -> 7 passed
  • Playwright real route proof:
    • / resolves to /dashboard
    • /dashboard requests /api/monitor/dashboard -> 200
    • /resources requests /api/monitor/resources -> 200 and /api/monitor/leases -> 200
  • browser console noise remains only missing favicon.ico

Honest boundary:

  • this is not the final monitor UX pass yet
  • provider detail is stronger now but still lighter than the app resource surface family
  • lease regrouping is now readable, but deeper semantics still belong to nu-63
  • provisional eval detail still needs a stronger operator-facing artifact/log story in nu-62

@shuxueshuxue
Collaborator Author

Lease semantics follow-up landed on 0e3fc3a7.

What changed:

  • moved lease semantic projection out of backend/web/monitor.py into backend/web/services/monitor_service.py
  • compat /api/monitor/leases route now delegates instead of owning new lease business logic
  • kept the current lease contract (items + summary + groups + semantics) but lifted it onto the monitor service layer

Fresh proof:

  • env -u ALL_PROXY -u all_proxy uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py -> 9 passed
  • env -u ALL_PROXY -u all_proxy uv run ruff check backend/web/services/monitor_service.py backend/web/monitor.py backend/web/routers/monitor.py tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py
  • cd frontend/monitor && npm run build
  • live route proof still holds: /api/monitor/leases returns summary/groups, and /resources still calls /api/monitor/resources + /api/monitor/leases

Why this matters:

  • the original compat monitor still uses SQLite in places, but this keeps the new lease semantics off the SQLite route file itself
  • the new behavior now sits on a more database-agnostic monitor service seam instead of expanding raw compat SQL further

Honest boundary:

  • this does not remove all SQLite from compat monitor
  • it only shrinks the change surface and lifts the new lease semantics to the existing service abstraction

@shuxueshuxue
Collaborator Author

Latest cut on closes the first honest pass on D2 provisional eval UX. Backend now exposes from , so the compat route still owns fact retrieval but not the new operator interpretation. On the real provisional run , the detail page now opens with , explicit artifact/log paths, and actionable next steps; the old sparse provisional score grid is folded behind instead of occupying the first screen.

Fresh proof on this head:

.... [100%]
4 passed in 0.08s
All checks passed!
0 errors, 0 warnings, 0 informations

leon-monitor@0.0.0 build
tsc --noEmit && vite build

vite v7.3.1 building client environment for production...
transforming...
✓ 41 modules transformed.
rendering chunks...
computing gzip size...
dist/index.html 0.41 kB │ gzip: 0.28 kB
dist/assets/index-b40waDQn.css 18.46 kB │ gzip: 4.04 kB
dist/assets/index-CQkRGwYx.js 302.98 kB │ gzip: 88.35 kB
✓ built in 446ms

Real Playwright on confirmed the page now surfaces , , , , and the next-step checklist above the fold.

@shuxueshuxue
Collaborator Author

Latest cut on 257b8383 closes the first honest pass on D2 provisional eval UX. Backend now exposes info.operator_surface from backend/web/services/monitor_service.py, so the compat route still owns fact retrieval but not the new operator interpretation. On the real provisional run eval-20260406-174046-72ea94, the detail page now opens with Operator Status, explicit artifact/log paths, and actionable next steps; the old sparse provisional score grid is folded behind Score artifacts (provisional) instead of occupying the first screen.

Fresh proof on this head:

  • uv run pytest -q tests/Unit/monitor/test_monitor_compat.py
  • uv run ruff check backend/web/services/monitor_service.py backend/web/monitor.py tests/Unit/monitor/test_monitor_compat.py
  • uv run pyright backend/web/services/monitor_service.py
  • cd frontend/monitor && npm run build
  • real Playwright on http://127.0.0.1:5175/evaluation/eval-20260406-174046-72ea94

That browser pass confirmed the page now surfaces Runner exited before evaluation threads materialized, Run manifest, STDOUT log, STDERR log, and the next-step checklist above the fold.

@shuxueshuxue
Collaborator Author

Latest cut on a99357b8 lands D3 phase-2. I kept the original summary/groups/items lease contract intact for compatibility, but added backend-owned triage.summary and triage.groups in backend/web/services/monitor_service.py. The reason for that extra layer is live data: the local monitor no longer meaningfully reads as just 29 diverged; the real split is 3 active drift + 26 detached residue, and frontend-only regrouping over the old coarse buckets cannot express that difference honestly.

Claude/CCM suggested a bounded frontend-only regrouping that reused the existing groups[] surface. I kept the useful part of that advice (present lifecycle groups in the monitor Resources page, not one flat alarming blob) but rejected the frontend-only version because the backend contract was too coarse for the real rows.

Fresh proof on this head:

  • uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py
  • uv run ruff check backend/web/services/monitor_service.py backend/web/monitor.py tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py
  • uv run pyright backend/web/services/monitor_service.py
  • cd frontend/monitor && npm run build
  • real Playwright on http://127.0.0.1:5175/resources

That browser pass now reads the live local dataset as ACTIVE DRIFT 3 / DETACHED RESIDUE 26 / ORPHAN CLEANUP 0 / HEALTHY 0, which is much closer to what an operator actually needs to know.
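
As a rough sketch of what a backend-owned triage.summary / triage.groups projection could look like: the bucket names come from this thread, but the `bucket` and `lease_id` row fields, the `build_triage` name, and the group shape are assumptions. How a lease gets classified into a bucket is deliberately not shown, since those rules live in backend/web/services/monitor_service.py.

```python
from collections import Counter
from typing import Any

# Bucket names as used by the monitor triage surface in this PR.
TRIAGE_BUCKETS = ("active_drift", "detached_residue", "orphan_cleanup", "healthy_capacity")


def build_triage(leases: list[dict[str, Any]]) -> dict[str, Any]:
    """Project already-classified lease rows into a summary plus ordered groups."""
    counts = Counter(lease["bucket"] for lease in leases)
    # Zero-fill every bucket so the UI never has to infer a missing key.
    summary = {bucket: counts.get(bucket, 0) for bucket in TRIAGE_BUCKETS}
    groups = [
        {
            "key": bucket,
            "count": summary[bucket],
            "lease_ids": [l["lease_id"] for l in leases if l["bucket"] == bucket],
        }
        for bucket in TRIAGE_BUCKETS
    ]
    return {"summary": summary, "groups": groups}
```

Fed the local split reported above (3 active drift, 26 detached residue), the summary keeps the two empty buckets as explicit zeros.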

@shuxueshuxue
Collaborator Author

Latest cut on 87ca560 lands D4 phase-2 polish on the monitor resources surface. This does not change the contract split: monitor still reads the global /api/monitor/resources and /api/monitor/leases paths, while the product page stays on /api/resources/*.

What changed is the UI quality of the monitor provider surface itself:

  • provider cards now carry a product-like status light, compact metric cells, capability strip, and session-dot strip
  • selected provider detail now reads like a real panel instead of a loose stats stack
  • null telemetry no longer renders as fake 0.0 on unavailable providers

One deliberate monitor-specific deviation from the product page remains: unavailable providers stay clickable here, because ops needs to inspect bad providers rather than having the card disabled.

Fresh proof on this head:

  • cd frontend/monitor && npm run build
  • real Playwright on http://127.0.0.1:5175/resources

The live browser pass confirmed the global resources page still reads the real local dataset while presenting the provider surface with much higher scan quality than the previous plain metric-card version.

@shuxueshuxue
Collaborator Author

Latest cut on 8e10d90 lands D4 phase-3 on the monitor resources page. The selected-provider pane no longer jumps straight from summary pills to the raw session table; it now adds a lease-card drill-down layer first.

Why this matters: on the monitor side, the raw session table is still the truth surface, but it is also the noisiest surface. The new lease card grid gives operators a product-like intermediate layer without importing product components or violating the contract split.

Current flow on /resources is now:

  1. global provider cards
  2. selected provider panel
  3. lease card grid grouped by lease
  4. raw session table
  5. global lease-health triage

Fresh proof on this head:

  • cd frontend/monitor && npm run build
  • real Playwright on http://127.0.0.1:5175/resources

That browser pass confirmed the selected provider detail now shows Leases (26) with lease cards before the raw Sessions (26) table.

@shuxueshuxue
Collaborator Author

Latest cut on 24d09b6 cleans up the legacy /leases entry. Even though the top nav now pushes operators toward /resources, the old route still exists and was too raw.

This pass keeps the route alive but changes the first screen to:

  • triage summary pills
  • backend-owned attention buckets (active_drift, detached_residue, orphan_cleanup, healthy_capacity)
  • collapsed full raw table afterward

That keeps the truth surface intact while removing the old “single alarming dump” first impression.

@shuxueshuxue
Collaborator Author

D2 follow-up is now on branch at 857adb7.

This round hardens the eval operator contract instead of adding more UI polish:

  • operator_surface now carries a typed kind so eval detail can distinguish bootstrap_failure, running_waiting_for_threads, running_active, completed_with_errors, completed_publishable, and provisional_waiting_for_summary
  • artifact handling is now honest: all six artifact slots stay visible and each one is marked present or missing instead of silently dropping missing paths
  • added artifact_summary so the page can say what exists vs what is still absent without making the frontend infer from free-text
  • added focused unit coverage for bootstrap failure, running-without-thread-rows-yet, and completed-with-errors
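
The artifact handling above can be sketched as follows. This is an illustration only: `mark_artifacts` and `summarize_artifacts` are hypothetical names, the slot keys are whatever the caller passes in, and "present" here simply means a non-empty path was recorded, whereas the real service may check the filesystem instead.

```python
from typing import Literal, Optional, TypedDict

ArtifactStatus = Literal["present", "missing"]


class ArtifactSummary(TypedDict):
    present: int
    missing: int
    total: int


def mark_artifacts(paths: dict[str, Optional[str]]) -> dict[str, ArtifactStatus]:
    """Keep every slot visible; a slot without a path is 'missing', not dropped."""
    return {slot: ("present" if path else "missing") for slot, path in paths.items()}


def summarize_artifacts(statuses: dict[str, ArtifactStatus]) -> ArtifactSummary:
    """Count present vs missing slots so the frontend need not parse free text."""
    present = sum(1 for status in statuses.values() if status == "present")
    return {"present": present, "missing": len(statuses) - present, "total": len(statuses)}
```

Six slots with two unset paths produce the same shape as the live response quoted in this comment: present 4, missing 2, total 6.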

Fresh verification on this commit:

  • uv run pytest -q tests/Unit/monitor/test_monitor_compat.py
  • uv run ruff check backend/web/services/monitor_service.py tests/Unit/monitor/test_monitor_compat.py
  • uv run ruff format --check backend/web/services/monitor_service.py tests/Unit/monitor/test_monitor_compat.py
  • uv run pyright backend/web/services/monitor_service.py
  • real local eval detail at http://127.0.0.1:8013/api/monitor/evaluation/eval-20260406-174046-72ea94 now returns kind=bootstrap_failure and artifact_summary={present:4,missing:2,total:6}

This keeps the change backend-first and database-agnostic: the compat monitor route still owns fact retrieval, while operator interpretation stays in backend/web/services/monitor_service.py.

@shuxueshuxue
Collaborator Author

Monitor frontend follow-up is now on branch at b6c1d35.

This round stayed deliberately small and frontend-only:

  • dashboard Infra Health metrics for Diverged leases and Orphans now deep-link directly to /resources#lease-health instead of dropping operators at the top of the resources page
  • monitor provider cards are tighter: duplicated paused/stopped footer counts are gone, and provider error/unavailable reason now sits in the header block instead of stretching card height
  • monitor lease-health now defaults to non-empty attention buckets only; Healthy Capacity moved behind a collapsed details shell so passive healthy inventory stops competing with active drift

This is the same contract surface as before. No backend API changes in this cut.

Fresh verification on this commit:

This is a density/attention cut, not a structural rewrite. The next honest gap remains the deeper sandbox-sheet / drill-down family if we want true parity with the product resources experience.

@shuxueshuxue
Collaborator Author

Monitor resources drill-down follow-up is now on branch at 2014e01.

This round adds the smallest local deep-drill layer I could land without touching backend contracts:

  • selected lease cards on the monitor resources page now drive a dedicated Lease Detail panel
  • the panel surfaces lease/thread quick links, member, started time, grouped session rows, and keeps the full provider session table below as the truth surface
  • this stays monitor-local and contract-preserving: no new API fields, no product component imports, just better use of existing provider/session/lease payload data

Fresh verification on this commit:

  • cd frontend/monitor && npm run build
  • npx prettier --check frontend/monitor/src/App.tsx frontend/monitor/src/styles.css
  • Playwright on http://127.0.0.1:5175/resources
  • snapshot proof in monitor-resources-lease-detail-3.yaml now contains Lease Detail plus Open lease / Open thread links
  • browser console is clean except for the pre-existing favicon.ico 404

This is the next small step toward product-level resources parity without violating the monitor/product contract split.

@shuxueshuxue
Collaborator Author

Monitor resources follow-up is now on branch at b42d9e5.

This round keeps the new lease drill-down but makes the truth table below obey it:

  • provider session table now defaults to Selected lease scope instead of always dumping every provider session row
  • operators can explicitly switch back to All provider sessions when they want the full noisy table
  • this turns lease card -> Lease Detail -> session rows into one coherent drill-down path while still preserving the provider-wide truth surface

Still no backend changes. Existing monitor payload only.

Fresh verification on this commit:

  • cd frontend/monitor && npm run build
  • npx prettier --check frontend/monitor/src/App.tsx frontend/monitor/src/styles.css
  • Playwright on http://127.0.0.1:5175/resources
  • snapshot proof in monitor-resources-session-scope.yaml now shows Lease Detail plus the Selected lease / All provider sessions toggle
  • same snapshot shows default count narrowed to the selected lease for the current local dataset: Sessions (1)
  • browser console is clean except for the pre-existing favicon.ico 404

This is still a contract-preserving information-ordering cut, not a backend expansion.

@shuxueshuxue
Collaborator Author

UI modernization / hierarchy slice pushed in 1e486e59.

What changed:

  • monitor shell now leans harder into console hierarchy instead of flat cards
  • dashboard now has a primary infra-health hero plus secondary workload/eval stack
  • resources now keeps summary metrics in sticky context, lightens provider selection cards, and pushes lease truth surfaces into clearer primary vs recessed layers
  • evaluation now uses a split layout: recessed current-submission aside + primary evaluation table, with Open Config moved to shell header

Fresh verification:

  • cd frontend/monitor && npm run build
  • real-page Playwright checks on /dashboard, /resources, /evaluation
  • browser console still only shows the old missing favicon.ico 404

@shuxueshuxue
Collaborator Author

Resources console split pushed in 3b661116.

What changed:

  • /resources now has a real split-console structure instead of one long vertical stack
  • left side is a lighter provider selection rail
  • right side is the selected-provider work surface (telemetry, lease groups, scoped sessions)
  • global lease health remains below as a separate truth section
  • responsive collapse keeps the rail flowing back to one column on narrow widths

Fresh verification:

  • cd frontend/monitor && npm run build
  • real-page Playwright check on /resources

@shuxueshuxue
Collaborator Author

Lease-detail density slice pushed in a935bf08.

What changed:

  • selected lease panel now reads as a tighter context bar instead of a generic detail card
  • member/thread/started/status moved into compact context tiles
  • scoped sessions table is denser and explicitly labels when it is scoped to the selected lease
  • no backend/API changes

Fresh verification:

  • cd frontend/monitor && npm run build
  • real-page Playwright check on /resources
  • confirmed live DOM contains Lease Detail, Open lease, Open thread, and scoped to selected lease

@shuxueshuxue
Collaborator Author

Landed nu-74 first slice: bounded monitor cleanup is now live.

What landed

  • backend contract: POST /api/monitor/resources/cleanup
  • allowed first action: cleanup_residue
  • allowed first categories only: detached_residue, orphan_cleanup
  • response contract: attempted / cleaned / skipped / errors / refreshed_summary
  • monitor UI: Resources -> Lease Health now exposes per-row Cleanup only for backlog rows
  • no product /resources cleanup controls
  • no optimistic disappearance; page re-fetches after action and shows explicit feedback
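
A hedged sketch of the bounded cleanup contract listed above. The response keys and the allowed action/categories come from this comment; everything else is assumed for illustration: the `cleanup_leases` name, the `lease_id` / `category` row fields, the injected `destroy` and `refresh_summary` callables, and the choice to report excluded leases under `skipped` rather than `errors`.

```python
from typing import Any, Callable

ALLOWED_ACTIONS = {"cleanup_residue"}
ALLOWED_CATEGORIES = {"detached_residue", "orphan_cleanup"}


def cleanup_leases(
    action: str,
    leases: list[dict[str, Any]],
    destroy: Callable[[str], None],
    refresh_summary: Callable[[], dict[str, int]],
) -> dict[str, Any]:
    """Attempt cleanup for explicit leases only and report every outcome."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported cleanup action: {action}")
    result: dict[str, Any] = {"attempted": [], "cleaned": [], "skipped": [], "errors": []}
    for lease in leases:
        lease_id = lease["lease_id"]
        result["attempted"].append(lease_id)
        if lease["category"] not in ALLOWED_CATEGORIES:
            # Assumption: out-of-scope leases are reported, never silently dropped.
            result["skipped"].append({"lease_id": lease_id, "reason": "category not allowed"})
            continue
        try:
            destroy(lease_id)
        except Exception as exc:
            result["errors"].append({"lease_id": lease_id, "error": str(exc)})
        else:
            result["cleaned"].append(lease_id)
    # Re-read state after mutation instead of predicting it optimistically.
    result["refreshed_summary"] = refresh_summary()
    return result
```

Recomputing the summary after the loop mirrors the "no optimistic disappearance" rule: the caller displays what the backend sees after the action, not what it expected to happen.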

Proof

  • env -u ALL_PROXY -u all_proxy uv run pytest -q tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py -> 17 passed
  • uv run ruff check backend/web/services/monitor_service.py backend/web/routers/monitor.py tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py -> green
  • uv run ruff format --check backend/web/services/monitor_service.py backend/web/routers/monitor.py tests/Unit/monitor/test_monitor_compat.py tests/Integration/test_monitor_resources_route.py -> green
  • uv run pyright backend/web/services/monitor_service.py backend/web/routers/monitor.py -> 0 errors, 0 warnings, 0 informations
  • cd frontend/monitor && npm run build -> green
  • Playwright snapshot proof on http://127.0.0.1:5175/resources after a real click:
    • .playwright-cli/cleanup-sweep-after.yaml contains Cleanup applied: 1 lease cleaned from detached_residue.
    • same snapshot shows Detached Residue (24) after the click

Honest boundary

  • first slice is per-row only
  • bulk cleanup ergonomics and broader cleanup controls are still not implemented
  • live/healthy leases still fail loudly and are intentionally excluded from this slice

@shuxueshuxue
Collaborator Author

Landed another bounded cleanup UX slice on top of nu-74:

  • monitor Resources -> Lease Health now exposes Cleanup visible for the currently rendered backlog rows in Detached Residue and Cleanup Backlog
  • this still uses the same backend contract (POST /api/monitor/resources/cleanup) with explicit visible lease_ids; it does not add a hidden bulk backend mode
  • no cleanup controls were added to product /resources

Fresh proof:

  • cd frontend/monitor && npm run build -> green
  • Playwright caller-proof on http://127.0.0.1:5175/resources clicked Cleanup visible
  • resulting snapshot cleanup-bulk-verify-after.yaml shows:
    • Cleanup applied: 8 leases cleaned from detached_residue.
    • Detached Residue (8)

This keeps the first bulk affordance honest: explicit visible rows only, re-fetch-backed state change only.

@shuxueshuxue
Collaborator Author

Added the first misoperation guardrail for monitor cleanup on top of nu-74:

  • Cleanup visible no longer mutates immediately on first click
  • group cleanup now stages an inline Confirm cleanup / Cancel row inside Resources -> Lease Health
  • single-row Cleanup stays one-click; only multi-lease actions get the extra fence
  • backend contract is unchanged: still POST /api/monitor/resources/cleanup with explicit lease_ids

Fresh proof:

  • cd frontend/monitor && npm run build -> green
  • red-state snapshot before click: cleanup-confirm-before.yaml shows Cleanup visible and no confirm row
  • pending snapshot after first click: cleanup-confirm-pending.yaml shows:
    • Confirm cleanup
    • Remove 8 visible leases from Detached Residue.
  • final snapshot after confirm: cleanup-confirm-after.yaml shows:
    • Cleanup applied: 8 leases cleaned from detached_residue.

This keeps the first bulk cleanup affordance honest and harder to fat-finger without widening the backend or polluting product /resources.
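
The guardrail can be modeled as a small state machine: single-lease clicks send immediately, multi-lease clicks only stage a pending confirmation. The sketch below is written in Python for illustration even though the real logic lives in the monitor frontend; `CleanupState`, `click_cleanup`, `confirm`, and `cancel` are hypothetical names, not identifiers from this branch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class CleanupState:
    # Lease ids staged behind the inline "Confirm cleanup / Cancel" row, if any.
    pending_ids: Optional[Tuple[str, ...]] = None


def click_cleanup(state: CleanupState, lease_ids: Iterable[str]):
    """Handle a click; return (new_state, ids_to_send_now)."""
    ids = tuple(lease_ids)
    if len(ids) <= 1:
        # Single-row cleanup stays one-click: send right away, stage nothing.
        return state, ids
    # Multi-lease cleanup stages a confirmation and sends nothing yet.
    return CleanupState(pending_ids=ids), ()


def confirm(state: CleanupState):
    """Confirm the staged cleanup; return (cleared_state, ids_to_send_now)."""
    return CleanupState(), state.pending_ids or ()


def cancel(state: CleanupState) -> CleanupState:
    """Drop any staged cleanup without sending anything."""
    return CleanupState()
```

Whatever ids come back from `click_cleanup` or `confirm` would then go into the same `POST /api/monitor/resources/cleanup` call with explicit `lease_ids`, so the backend contract stays untouched.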

@shuxueshuxue force-pushed the issue-205-monitor-transplant branch from bcce157 to 0d2f997 on April 7, 2026 01:49
@shuxueshuxue
Collaborator Author

Brutal integration update after rebasing #210 onto latest dev and landing compatibility fix 0e183cf.

Fresh controlled runtime:

  • backend: :8014 with full Supabase + Postgres contract
  • monitor frontend: :5176
  • app frontend: :5187 (started with LEON_BACKEND_PORT=8014)

What is green on this branch:

  • direct POST /api/auth/login -> 200
  • /api/monitor/{health,resources,leases,dashboard} -> 200
  • /api/resources/overview + /api/resources/overview/refresh -> 200
  • app login through the dev proxy is green once the app dev server is started with LEON_BACKEND_PORT=8014; browser lands on /chat
  • rebase follow-up compatibility pack is back to green:
    • pytest targeted pack: 19 passed
    • pyright on touched files: 0 errors
    • touched ruff / ruff format --check: green

What is NOT being counted as #210 regression:

  • the earlier app login failure was local bringup noise: the dev server had been started without LEON_BACKEND_PORT, so Vite kept proxying to stale :8012
  • app /resources -> /marketplace is intentional latest-dev behavior
  • browser /marketplace still fails locally because the frontend store reads VITE_MYCEL_HUB_URL or falls back to direct http://localhost:8090; without a live hub this yields ERR_CONNECTION_REFUSED
  • daytona_selfhost thread routes still fail if daytona_sdk is unavailable; on fresh sweep /permissions, /session, and /lease fail loudly with 503, while /detail, /runtime, and /tasks still escalate through the same provider-init seam and return 500
  • discriminator proof: the same authenticated brutal sweep against local thread m_50tMO7PmFp7f-18 is healthy (detail=200, permissions=200, runtime=200, tasks=200, lease=200, session=404 no session)

So the current branch verdict is: #210's monitor/resource facade surface is healthy on latest dev; remaining browser failures found in this sweep are either local bringup mistakes or existing latest-dev marketplace/provider dependency debt, not monitor-transplant regressions.
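
For reference, the discriminator logic behind that verdict can be sketched as a small classifier over the per-route status codes quoted above. The status values are copied from this comment; the category labels and the `classify` / `summarize` / `is_healthy` helpers are hypothetical and only illustrate how the two sweeps were told apart.

```python
from typing import Dict


def classify(route: str, status: int) -> str:
    """Map one thread-route status code to a coarse category (labels are illustrative)."""
    if 200 <= status < 300:
        return "ok"
    if route == "session" and status == 404:
        # A thread with no session is an expected empty result, not a failure.
        return "ok-empty"
    if status == 503:
        # Dependency unavailable and reported as such: fails loudly.
        return "loud-dependency-failure"
    if status >= 500:
        # Same root cause, but surfaced as a generic server error.
        return "unhandled-escalation"
    return "unexpected"


def summarize(sweep: Dict[str, int]) -> Dict[str, str]:
    """Classify every route in a sweep."""
    return {route: classify(route, status) for route, status in sweep.items()}


def is_healthy(sweep: Dict[str, int]) -> bool:
    """A sweep is healthy when every route is ok or an expected empty result."""
    return all(label.startswith("ok") for label in summarize(sweep).values())


# Status codes as reported in the comment above.
LOCAL_THREAD = {"detail": 200, "permissions": 200, "runtime": 200,
                "tasks": 200, "lease": 200, "session": 404}
DAYTONA_THREAD = {"permissions": 503, "session": 503, "lease": 503,
                  "detail": 500, "runtime": 500, "tasks": 500}
```

Under these assumed labels, `is_healthy(LOCAL_THREAD)` holds while `DAYTONA_THREAD` splits into three loud 503s and three escalated 500s, which is the provider-init debt described above rather than a monitor-transplant regression.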
