Add experiment-run domain model #40

Merged
cm2435 merged 6 commits into main from feature/experiment-run-domain-model on Apr 27, 2026

cm2435 commented Apr 27, 2026

Summary

  • Adds experiments as the launch-level aggregate and migrates runs to reference experiment_id plus workflow_definition_id.
  • Introduces explicit ergon experiment define / ergon experiment run lifecycle services, API routes, and CLI commands.
  • Updates cohort views to list experiments, updates real-LLM harness launch flow, and refreshes test seeding paths.
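
The aggregate relationship in the summary can be pictured with a minimal sketch. Only `experiment_id` and `workflow_definition_id` come from this PR; the class names, the `launch` method, and the `name` and `sample_index` fields are hypothetical illustrations, not the actual Ergon models.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Run:
    """One workflow run; it references its experiment and workflow definition."""

    experiment_id: UUID
    workflow_definition_id: UUID
    sample_index: int  # hypothetical field, used here to tell samples apart
    id: UUID = field(default_factory=uuid4)


@dataclass
class Experiment:
    """Launch-level aggregate that groups every run created by one launch."""

    name: str  # hypothetical field
    workflow_definition_id: UUID
    id: UUID = field(default_factory=uuid4)
    runs: list[Run] = field(default_factory=list)

    def launch(self, num_samples: int) -> list[Run]:
        """Create one separate run per sample under this experiment."""
        if num_samples < 1:
            raise ValueError("num_samples must be at least 1")
        start = len(self.runs)
        new_runs = [
            Run(self.id, self.workflow_definition_id, start + i)
            for i in range(num_samples)
        ]
        self.runs.extend(new_runs)
        return new_runs
```

Under these assumptions, `Experiment(name="demo", workflow_definition_id=uuid4()).launch(3)` yields three runs with distinct ids that all share the same `experiment_id`.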

Test plan

  • uv run pytest tests/unit -q
  • uv run pytest tests/integration/propagation tests/integration/swebench_verified -q
  • pnpm --dir ergon-dashboard typecheck
  • uv run python -m compileall ergon_core/ergon_core ergon_cli/ergon_cli tests/unit/runtime tests/real_llm/benchmarks/test_researchrubrics.py
  • uv run ruff check <new Python files> --select F,E,W,I --output-format=concise

Made with Cursor

cm2435 added 5 commits April 26, 2026 23:16
Document the overnight RQ1 hillclimb and fix the runtime paths needed for workflow-CLI agents to spawn and schedule specialist subtasks during real-LLM rollouts.

Made-with: Cursor
Centralize model resolution in core so provider-prefixed cloud targets use OpenRouter consistently, while preserving typed logprob payloads without the previous API import cycle.

Made-with: Cursor
Introduce experiments as the launch-level aggregate so multi-sample jobs create separate workflow runs under one experiment, with CLI/API/dashboard surfaces using the new lifecycle.

Made-with: Cursor
@github-actions

E2E smoke — researchrubrics

No PNG screenshots were uploaded for this leg. See screenshots/pr-40 for the uploaded placeholder.

@github-actions

E2E smoke — minif2f

No PNG screenshots were uploaded for this leg. See screenshots/pr-40 for the uploaded placeholder.

@github-actions

E2E smoke — swebench-verified

No PNG screenshots were uploaded for this leg. See screenshots/pr-40 for the uploaded placeholder.

Adds experiment analytics surfaces, stricter generated contracts, CLI/domain cleanup, and an application-level health probe so smoke CI waits for completed API startup before submitting cohorts.

Made-with: Cursor
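
A health-probe wait of the kind this commit describes might look like the sketch below. It is not the code added in this PR: the function names, the timeout defaults, and the idea that the probe is a plain HTTP 200 check are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with HTTP 200 (assumed health contract)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all mean "not ready yet".
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A CI step could call `wait_until_healthy(lambda: http_probe("http://localhost:8000/health"))` before submitting cohorts; the URL here is a placeholder, not the project's real endpoint. Passing the probe as a callable keeps the polling loop testable without a running server.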
@github-actions

E2E smoke — minif2f

Screenshots pushed to screenshots/pr-40.

minif2f/cohort-ci-smoke-minif2f-20260427T145630.png
minif2f/d131398b-65ec-4eb9-834c-0d4ce46a6a68-activity-stack.png
minif2f/d131398b-65ec-4eb9-834c-0d4ce46a6a68-sad.png
minif2f/d131398b-65ec-4eb9-834c-0d4ce46a6a68-visual-debugger-full.png

@github-actions

E2E smoke — swebench-verified

Screenshots pushed to screenshots/pr-40.

swebench-verified/0f173b80-36b3-4bcf-a27c-85a78a755bb4-activity-stack.png
swebench-verified/0f173b80-36b3-4bcf-a27c-85a78a755bb4-sad.png
swebench-verified/0f173b80-36b3-4bcf-a27c-85a78a755bb4-visual-debugger-full.png
swebench-verified/cohort-ci-smoke-swebench-verified-20260427T145632.png

@github-actions

E2E smoke — researchrubrics

Screenshots pushed to screenshots/pr-40.

researchrubrics/266eca07-2aef-4c69-9704-c2625d498982-activity-stack.png
researchrubrics/266eca07-2aef-4c69-9704-c2625d498982-sad.png
researchrubrics/266eca07-2aef-4c69-9704-c2625d498982-visual-debugger-full.png
researchrubrics/cohort-ci-smoke-researchrubrics-20260427T145645.png

cm2435 changed the base branch from feature/finish-agent-workflow-cli to main on April 27, 2026 15:00
cm2435 merged commit 4592bb1 into main on Apr 27, 2026
7 checks passed