feat(monitor): transplant compat monitor and swebench runner #182
shaluoyan523 wants to merge 4 commits into OpenDCAI:main
Follow-up branch is now open as #210. This continues the resource-observability work on top of the compat monitor surface restored here, while keeping the newer `/api/resources/*` split and Supabase-aware wiring from the current dev line. I am using #182 as the monitor surface source, but transplanting it into a current branch rather than rebasing this PR forward hundreds of commits.
Closing this in favor of #210.

Reason: #182 is the original compat monitor transplant baseline, but #210 is now the active integration branch that carries this monitor surface forward onto current dev together with the resource observability split and subsequent monitor UX/drill-down work. Please review #210 as the canonical follow-up.
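Monitor compat transplant notes
Background
The monitor on the current `main` branch has regressed to an earlier sandbox console that retains only a reduced feature set, whereas the monitor in `/home/dataset-local/data1/Mycel-compat-monitor-pr93` had been extended with additional capabilities. The goal of this branch is to transplant that compat monitor back onto the latest `main` and to add the runtime-environment adaptations the current mainline needs.
Changes in this PR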
1. Transplant the monitor frontend and backend
Transplanted and restored the following monitor capabilities:
- `EvaluationPage`
- `EvaluationDetailPage`
- `SessionDetailPage`
- Thread Trace with Conversation / Events / Steps views
- `/api/monitor/evaluations`
- `/api/monitor/evaluation/{evaluation_id}`
- `/api/monitor/evaluation/runs`
- `/api/monitor/session/{session_id}`
- `/api/monitor/thread/{thread_id}/trace`

Corresponding files:
- `backend/web/monitor.py`
- `backend/web/routers/monitor.py`
- `frontend/monitor/src/App.tsx`
- `frontend/monitor/src/styles.css`
- `frontend/monitor/vite.config.ts`

2. Adapt to the backend structure of the latest main
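To stay compatible with the current mainline's storage split and routing structure, added the following adaptations:
- `backend.web.monitor`
- `/api/monitor/health`
- `/api/monitor/resources`
- `/api/monitor/resources/refresh`
- `/api/monitor/sandbox/{lease_id}/browse`
- `/api/monitor/sandbox/{lease_id}/read`
- `SQLiteDBRole.RUN_EVENT`
- `SQLiteDBRole.SANDBOX`
- `DB_PATH`

3. Fix monitor display issues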
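Fixed several issues that made the monitor "look wrong":
- The Threads page previously read only `chat_sessions`, so a running SWE-Bench thread that had written only a checkpoint was not shown.
- Evaluation detail did not render thread rows during the stage where a thread has a checkpoint but no session yet.
- Loading a conversation through `/api/threads/{thread_id}` failed for lack of a Bearer token, with `Conversation load failed: Missing or invalid Authorization header`; it now goes through `/api/monitor/thread/{thread_id}/conversation`.

4. Restore the SWE-Bench run entry point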
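The monitor UI on the current mainline still has a SWE-Bench entry point, but the script that runs it is no longer in the repository. To make the monitor's evaluation feature actually runnable, this branch restores `eval/swebench/run_slice.py`, adapted to the current environment:
- `--eval-timeout-sec`
- `--git-timeout-sec`
- reading `OPENAI_API_KEY` from `~/.leon/models.json`
- `LEON_SANDBOX_DB_PATH`

5. Declare the evaluation dependencies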
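Added the monitor's SWE-Bench runtime dependencies to the project's dependency declarations:
- `datasets`
- `swebench`
- `socksio`

Corresponding files:
- `pyproject.toml`
- `uv.lock`

Verification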
Compile/build checks
Completed:
- `python3 -m py_compile backend/web/monitor.py`
- `python3 -m py_compile backend/web/routers/monitor.py`
- `python3 -m py_compile eval/swebench/run_slice.py`
- `cd frontend/monitor && npm run build`

API checks
Confirmed that the following endpoints work:
- `/api/monitor/evaluations`
- `/api/monitor/evaluation/{evaluation_id}`
- `/api/monitor/evaluation/runs`
- `/api/monitor/session/{session_id}`
- `/api/monitor/thread/{thread_id}/trace`
- `/api/monitor/thread/{thread_id}/conversation`
- `/api/monitor/resources`

Runtime checks
Launched one minimal SWE-Bench test task from the monitor and verified the run.
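The API checks in this section could be rerun with a small standard-library script. This is a sketch, not part of the PR: `smoke_check`, `http_get_status`, and `failures` are hypothetical helpers, and the script assumes each monitor route answers a plain GET (no Authorization header) with HTTP 200. It only looks at status codes, not response bodies.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Optional, Tuple

# Endpoint templates copied from the "API checks" list in this PR.
MONITOR_ENDPOINTS: List[str] = [
    "/api/monitor/evaluations",
    "/api/monitor/evaluation/{evaluation_id}",
    "/api/monitor/evaluation/runs",
    "/api/monitor/session/{session_id}",
    "/api/monitor/thread/{thread_id}/trace",
    "/api/monitor/thread/{thread_id}/conversation",
    "/api/monitor/resources",
]


def http_get_status(url: str, timeout: float = 10.0) -> int:
    """Hypothetical default fetcher: GET the URL and return its HTTP status code."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; report the code instead of aborting the check.
        return err.code


def smoke_check(
    base_url: str,
    ids: Dict[str, str],
    fetch: Callable[[str], int] = http_get_status,
) -> List[Tuple[str, Optional[int]]]:
    """Return (path, status) per endpoint; status None means the request failed."""
    results: List[Tuple[str, Optional[int]]] = []
    for template in MONITOR_ENDPOINTS:
        # Fill {evaluation_id}, {session_id}, {thread_id}; templates without
        # placeholders are returned unchanged by str.format.
        path = template.format(**ids)
        try:
            status: Optional[int] = fetch(base_url.rstrip("/") + path)
        except OSError:
            # Connection refused, DNS failure, timeout (URLError is an OSError).
            status = None
        results.append((path, status))
    return results


def failures(results: List[Tuple[str, Optional[int]]]) -> List[str]:
    """Paths whose status was anything other than 200."""
    return [path for path, status in results if status != 200]
```

To run it against a local backend, call `smoke_check` with that backend's base URL and real `evaluation_id`, `session_id`, and `thread_id` values taken from an existing run, then print `failures(...)`. Injecting `fetch` keeps the check testable without a running server.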
Branch notes
This branch is:
`monitor-compat-transplant`
Purpose: transplant the compat monitor back onto the latest `main`.
Suggested follow-up, in two steps:
- Move the logic in `backend/web/monitor.py` that is tightly coupled to the SWE-Bench runner into a separate service, to shrink the monitor file.
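One possible shape for that extraction, sketched under assumptions. `SweBenchRunRequest` and `SweBenchRunnerService` are hypothetical names. The only details taken from this PR are the script path `eval/swebench/run_slice.py`, the `--eval-timeout-sec` / `--git-timeout-sec` flags, and the `LEON_SANDBOX_DB_PATH` variable; passing each flag's value as the next argument is an assumption about the script's CLI. `OPENAI_API_KEY` is left out on the assumption that the restored script reads it from `~/.leon/models.json` itself.

```python
import os
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Mapping, Optional, Sequence

# Script path restored by this PR; everything else in this block is hypothetical.
RUN_SLICE = Path("eval/swebench/run_slice.py")


@dataclass(frozen=True)
class SweBenchRunRequest:
    """Hypothetical request object for one SWE-Bench slice run."""
    eval_timeout_sec: int
    git_timeout_sec: int
    sandbox_db_path: Optional[str] = None  # exported as LEON_SANDBOX_DB_PATH
    extra_args: Sequence[str] = ()  # passed through to run_slice.py unchanged


class SweBenchRunnerService:
    """Hypothetical service that keeps runner-specific logic out of monitor.py."""

    def __init__(self, repo_root: Path, python: str = sys.executable) -> None:
        self.repo_root = repo_root
        self.python = python

    def build_command(self, req: SweBenchRunRequest) -> List[str]:
        # Assumes each timeout flag takes its value as the following argument.
        return [
            self.python,
            str(self.repo_root / RUN_SLICE),
            "--eval-timeout-sec", str(req.eval_timeout_sec),
            "--git-timeout-sec", str(req.git_timeout_sec),
            *req.extra_args,
        ]

    def build_env(
        self, req: SweBenchRunRequest, base_env: Optional[Mapping[str, str]] = None
    ) -> Dict[str, str]:
        # Copy the base environment so the caller's mapping is never mutated.
        env = dict(os.environ if base_env is None else base_env)
        if req.sandbox_db_path:
            env["LEON_SANDBOX_DB_PATH"] = req.sandbox_db_path
        return env

    def start(self, req: SweBenchRunRequest) -> subprocess.Popen:
        # Launch run_slice.py from the repository root without blocking the caller.
        return subprocess.Popen(
            self.build_command(req), cwd=self.repo_root, env=self.build_env(req)
        )
```

With this split, `monitor.py` would only build a request and call `start()`, while the command line and environment can be unit-tested without launching a process.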