
Add MAS visual debugger activity stack #38

Merged: cm2435 merged 7 commits into main from feature/mas-run-visual-debugger-plan on Apr 26, 2026


@cm2435 commented on Apr 26, 2026

Summary

  • Add a frontend-derived MAS activity domain and activity stack timeline that shows overlapping executions, graph mutations, messages, artifacts, evaluations, context events, and sandbox spans without backend DTO changes.
  • Wire the run workspace so timeline activity selection highlights graph tasks and opens a time-aware task workspace.
  • Add deterministic concurrent MAS fixtures, semantic layout tests, browser geometry checks, and local-only PNG screenshot dumping for visual review.
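The activity stack described above has to place overlapping spans (executions, messages, sandbox spans) on separate rows. One common way to do that is greedy interval partitioning, sketched below. This is an illustrative sketch only: the `Activity` shape, the `assignLanes` name, and millisecond timestamps are assumptions, not the contents of `stackLayout.ts`.

```typescript
// Hypothetical activity shape: one span on the run timeline.
interface Activity {
  id: string;
  kind: string; // e.g. "execution", "message", "sandbox"
  startMs: number; // inclusive start
  endMs: number; // exclusive end
}

interface PlacedActivity extends Activity {
  lane: number; // row index in the stack, 0 = top
}

// Greedy interval partitioning: visit spans in start order and put each one in
// the first lane whose previous span has already ended; otherwise open a lane.
// Visiting in start order means the lane count equals the maximum overlap.
function assignLanes(activities: readonly Activity[]): PlacedActivity[] {
  // Tie-break on end time and id so the layout is deterministic for fixtures.
  const sorted = [...activities].sort(
    (a, b) => a.startMs - b.startMs || a.endMs - b.endMs || a.id.localeCompare(b.id),
  );
  const laneEnds: number[] = []; // laneEnds[i] = end time of the last span in lane i
  const placed: PlacedActivity[] = [];
  for (const act of sorted) {
    let lane = laneEnds.findIndex((end) => end <= act.startMs);
    if (lane === -1) {
      lane = laneEnds.length;
      laneEnds.push(act.endMs);
    } else {
      laneEnds[lane] = act.endMs;
    }
    placed.push({ ...act, lane });
  }
  return placed;
}
```

With spans a = [0, 10), b = [5, 15) and c = [10, 12), `a` and `c` share lane 0 because `a` ends exactly when `c` starts, while `b` overlaps `a` and goes to lane 1. Sorting with explicit tie-breaks is what makes layout assertions stable across runs.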

Test plan

  • pnpm exec tsx --test src/features/activity/buildRunActivities.test.ts src/features/activity/stackLayout.test.ts src/features/activity/goldenFixture.test.ts src/features/graph/layout/goldenLayout.test.ts src/components/workspace/filterTaskEvidenceForTime.test.ts && pnpm run test:contracts
  • pnpm run typecheck && pnpm run lint
  • VISUAL_DEBUGGER_SCREENSHOTS=1 pnpm exec playwright test tests/e2e/activity-stack.spec.ts --project=chromium
  • pnpm exec playwright test tests/e2e/activity-stack.spec.ts tests/e2e/run.snapshot.spec.ts --project=chromium
  • rm -rf .next && pnpm run build

Visual review artifacts

Local PNGs were generated at:

  • ergon-dashboard/tmp/visual-debugger/run-full.png
  • ergon-dashboard/tmp/visual-debugger/graph-canvas.png
  • ergon-dashboard/tmp/visual-debugger/activity-stack.png
  • ergon-dashboard/tmp/visual-debugger/workspace-open.png

Made with Cursor

Derive run activity from existing dashboard state and mutations so MAS runs can be replayed as a full graph plus overlapping activity stack without backend DTO changes.
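Replaying a run and opening a time-aware task workspace comes down to showing only the evidence that existed at the selected moment. Below is a minimal sketch of that filter, assuming a hypothetical `TaskEvidence` record with an ISO-8601 `createdAt` field. The `evidenceAsOf` name is illustrative; the `filterTaskEvidenceForTime` helper exercised in the test plan may have a different shape and semantics.

```typescript
// Hypothetical evidence record attached to a task (assumed shape).
interface TaskEvidence {
  id: string;
  taskId: string;
  createdAt: string; // ISO-8601 timestamp
}

// Return the evidence for one task that existed at or before `cursor`,
// oldest first. Records whose timestamps cannot be parsed are dropped.
function evidenceAsOf<T extends TaskEvidence>(
  evidence: readonly T[],
  taskId: string,
  cursor: Date,
): T[] {
  const cutoff = cursor.getTime();
  return evidence
    .filter((e) => e.taskId === taskId)
    .filter((e) => {
      const t = Date.parse(e.createdAt);
      return !Number.isNaN(t) && t <= cutoff;
    })
    .sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
}
```

Keeping the filter as a pure function of the existing state and a cursor is consistent with the commit's goal of deriving activity on the frontend without backend DTO changes, and it makes the behavior easy to unit-test.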

Made-with: Cursor
@github-actions

E2E smoke — researchrubrics

Screenshots pushed to screenshots/pr-38.

researchrubrics/27960731-6488-4f41-a4f9-a70996402082-activity-stack.png
researchrubrics/27960731-6488-4f41-a4f9-a70996402082-sad.png
researchrubrics/27960731-6488-4f41-a4f9-a70996402082-visual-debugger-full.png
researchrubrics/48bfdc43-dfee-4216-9417-e6f585c3ceb1-activity-stack.png
researchrubrics/48bfdc43-dfee-4216-9417-e6f585c3ceb1-happy.png
researchrubrics/48bfdc43-dfee-4216-9417-e6f585c3ceb1-visual-debugger-full.png
researchrubrics/6cf2af2f-1b0a-434c-bb0b-befb90dd918a-activity-stack.png
researchrubrics/6cf2af2f-1b0a-434c-bb0b-befb90dd918a-happy.png
researchrubrics/6cf2af2f-1b0a-434c-bb0b-befb90dd918a-visual-debugger-full.png
researchrubrics/cohort-ci-smoke-researchrubrics-20260426T124516.png

@github-actions

E2E smoke — minif2f

Screenshots pushed to screenshots/pr-38.

minif2f/328d78d1-f564-4e74-84f1-6e36ca8de772-activity-stack.png
minif2f/328d78d1-f564-4e74-84f1-6e36ca8de772-happy.png
minif2f/328d78d1-f564-4e74-84f1-6e36ca8de772-visual-debugger-full.png
minif2f/cc5df3b1-68f0-4e9b-8460-ed9614c8f203-activity-stack.png
minif2f/cc5df3b1-68f0-4e9b-8460-ed9614c8f203-happy.png
minif2f/cc5df3b1-68f0-4e9b-8460-ed9614c8f203-visual-debugger-full.png
minif2f/cohort-ci-smoke-minif2f-20260426T125138.png
minif2f/d4db0b1e-76de-4502-9787-bbcf60d90ca5-activity-stack.png
minif2f/d4db0b1e-76de-4502-9787-bbcf60d90ca5-happy.png
minif2f/d4db0b1e-76de-4502-9787-bbcf60d90ca5-visual-debugger-full.png

@github-actions

E2E smoke — researchrubrics

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

@github-actions

E2E smoke — minif2f

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

@github-actions

E2E smoke — swebench-verified

Screenshots pushed to screenshots/pr-38.

swebench-verified/414190aa-d152-4cf5-825f-d80a4901226a-activity-stack.png
swebench-verified/414190aa-d152-4cf5-825f-d80a4901226a-happy.png
swebench-verified/414190aa-d152-4cf5-825f-d80a4901226a-visual-debugger-full.png
swebench-verified/74cdb60a-7bf0-467f-986f-9c31c515dc51-activity-stack.png
swebench-verified/74cdb60a-7bf0-467f-986f-9c31c515dc51-happy.png
swebench-verified/74cdb60a-7bf0-467f-986f-9c31c515dc51-visual-debugger-full.png
swebench-verified/7e7a3d36-20ba-4bbd-a536-352056b6ad5a-activity-stack.png
swebench-verified/7e7a3d36-20ba-4bbd-a536-352056b6ad5a-happy.png
swebench-verified/7e7a3d36-20ba-4bbd-a536-352056b6ad5a-visual-debugger-full.png
swebench-verified/cohort-ci-smoke-swebench-verified-20260426T125750.png

@github-actions

E2E smoke — swebench-verified

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

cm2435 added 3 commits April 26, 2026 22:28
Tighten the dashboard trace/debugger experience, make smoke e2e runs exercise the canonical sad path, and move sandbox test doubles behind explicit test-support boundaries so production sandbox setup fails loudly when E2B is not configured.

Made-with: Cursor
Resolve the execute_task comment conflict while preserving the skipped-task contract violation path from the sandbox boundary cleanup.

Made-with: Cursor
Apply Ruff formatting and regenerate dashboard contracts so the Python and frontend drift checks agree with the committed sources.

Made-with: Cursor
@github-actions

E2E smoke — swebench-verified

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

@github-actions

E2E smoke — minif2f

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

@github-actions

E2E smoke — researchrubrics

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

@github-actions

E2E smoke — swebench-verified

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

@github-actions

E2E smoke — researchrubrics

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

@github-actions

E2E smoke — minif2f

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

Regenerate REST OpenAPI contracts, carry cancelled task counts through dashboard state, and clean up Python suppression/type-check issues from the sandbox boundary refactor.

Made-with: Cursor
@github-actions

E2E smoke — researchrubrics

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

@github-actions

E2E smoke — swebench-verified

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

@github-actions

E2E smoke — minif2f

No PNG screenshots were uploaded for this leg. See screenshots/pr-38 for the uploaded placeholder.

Keep generated REST contracts lint-clean, update the e2e workflow guard for parallel smoke jobs, and rebase the thread-summary migration onto the latest main migration head.

Made-with: Cursor
@github-actions

E2E smoke — minif2f

Screenshots pushed to screenshots/pr-38.

minif2f/6268c1ce-29d4-44c8-8c12-aef297e2636c-activity-stack.png
minif2f/6268c1ce-29d4-44c8-8c12-aef297e2636c-sad.png
minif2f/6268c1ce-29d4-44c8-8c12-aef297e2636c-visual-debugger-full.png
minif2f/cohort-ci-smoke-minif2f-20260426T214846.png

@github-actions

E2E smoke — researchrubrics

Screenshots pushed to screenshots/pr-38.

researchrubrics/372dd6ef-272d-48bb-af0d-29548ba3211f-activity-stack.png
researchrubrics/372dd6ef-272d-48bb-af0d-29548ba3211f-sad.png
researchrubrics/372dd6ef-272d-48bb-af0d-29548ba3211f-visual-debugger-full.png
researchrubrics/cohort-ci-smoke-researchrubrics-20260426T214908.png

@github-actions

E2E smoke — swebench-verified

Screenshots pushed to screenshots/pr-38.

swebench-verified/cohort-ci-smoke-swebench-verified-20260426T214956.png
swebench-verified/f2c01297-9236-4a5b-99c3-9948f4f32606-activity-stack.png
swebench-verified/f2c01297-9236-4a5b-99c3-9948f4f32606-sad.png
swebench-verified/f2c01297-9236-4a5b-99c3-9948f4f32606-visual-debugger-full.png

@cm2435 merged commit 1e640c6 into main on Apr 26, 2026
7 checks passed
1 participant