feat: MLSys 2026 DAG Scheduler — Track A (Rust) + Track B (Gemini Agent) #13
melroyanthony merged 20 commits into main from
- Move PROBLEM.md, example_problem.json, mlsys.h, and benchmarks/ into problem/
- Create solution/ directory scaffold (requirements, checkpoints, scripts, docs)
- Match the expected structure: problem/ for inputs, solution/ for outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Added comprehensive .gitignore
- Created solution/ directory structure
- Stage 0 checkpoint validated

Stage: 0/5

- 29 functional requirements, 6 non-functional requirements
- RICE scores for 15 features with prioritization rationale
- MoSCoW: 7 Must-Have, 5 Should-Have, 3 Won't-Have
- MVP scope with acceptance criteria and dependency graph
- Baseline-first strategy: correct schedule before optimization

Stage: 1/5

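The RICE scores presumably follow the standard formula, (Reach × Impact × Confidence) / Effort. Below is a minimal sketch of scoring and ranking features that way. The feature names and numbers are hypothetical and are not taken from rice-scores.md.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    reach: float       # how many runs/users the feature touches
    impact: float      # relative impact, e.g. 0.25..3
    confidence: float  # 0.0..1.0
    effort: float      # e.g. person-days; must be > 0

    def rice(self) -> float:
        # Standard RICE score: (Reach * Impact * Confidence) / Effort
        return self.reach * self.impact * self.confidence / self.effort


def rank(features: list[Feature]) -> list[Feature]:
    # Highest RICE score first
    return sorted(features, key=lambda f: f.rice(), reverse=True)


# Hypothetical features, for illustration only
features = [
    Feature("baseline schedule", reach=5, impact=3, confidence=1.0, effort=1),
    Feature("greedy fusion", reach=5, impact=2, confidence=0.8, effort=2),
    Feature("split-K", reach=2, impact=2, confidence=0.5, effort=2),
]

for f in rank(features):
    print(f"{f.name}: {f.rice():.2f}")
```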
- Module decomposition: 13 Python modules in a pipeline architecture
- Complete latency model spec matching the C++ Evaluate() function
- Data model: Python dataclasses mirroring mlsys.h structs
- Input/output JSON schema documentation
- C4 workspace with context and container views
- Pipeline: Parse → Topo sort → Baseline → Fusion → Retention → Split-K → Granularity → Serialize
- ADR-001: Python language choice
- ADR-002: Baseline-first development strategy
- ADR-003: Greedy fusion over DP/beam search

Stage: 2/5

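The "Topo sort" stage can be sketched with Kahn's algorithm. The same algorithm also catches the cyclic-DAG edge case that the tests cover. This sketch assumes a hypothetical op format: each op is a dict listing the tensor indices it reads (`inputs`) and writes (`outputs`). That format is an illustration, not the repository's actual data model.

```python
from collections import deque


def topo_sort(ops: list[dict]) -> list[int]:
    """Return op indices in a dependency-respecting order (Kahn's algorithm).

    Each op is a dict with 'inputs' and 'outputs' lists of tensor indices
    (hypothetical format). Raises ValueError if the graph has a cycle.
    """
    # Map each tensor to the op that produces it
    producer = {}
    for i, op in enumerate(ops):
        for t in op["outputs"]:
            producer[t] = i

    # Build producer-op -> consumer-op edges and in-degrees
    succs = [set() for _ in ops]
    indeg = [0] * len(ops)
    for i, op in enumerate(ops):
        for t in op["inputs"]:
            p = producer.get(t)  # graph inputs have no producer
            if p is not None and i not in succs[p]:
                succs[p].add(i)
                indeg[i] += 1

    ready = deque(i for i, d in enumerate(indeg) if d == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(i)
        for j in sorted(succs[i]):  # sorted for deterministic output
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)

    # Ops stuck with nonzero in-degree sit on a cycle
    if len(order) != len(ops):
        raise ValueError("graph contains a cycle")
    return order


example = [
    {"inputs": [0], "outputs": [1]},     # op 0 reads graph input 0
    {"inputs": [1], "outputs": [2]},     # op 1 depends on op 0
    {"inputs": [0, 2], "outputs": [3]},  # op 2 depends on op 1
]
print(topo_sort(example))  # [0, 1, 2]
```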
- Created GitHub issues yarongmu-google#39 through yarongmu-google#50 (12 MVP features)
- Feature branch: feat/issue-39-mlsys-scheduler

Stage: 2.5/5

Track A - Rust scheduler binary:
- Full optimizer pipeline: baseline → fusion → retention → split-K → granularity search
- All 5 PROBLEM.md worked examples verified (9 unit tests passing)
- All 5 benchmarks produce valid solutions within the timeout

Track B - Python Gemini agent:
- Local evaluator matching C++ Evaluate()
- Multi-stage optimizer: fusion, granularity, traversal, retention
- Gemini 2.5 Flash integration with iterative refinement
- Baseline fallback guarantees valid output without API access

Updated:
- ADR-001: Rust for Track A, Python for Track B
- Stage 2.5 checkpoint: issues on fork (melroyanthony/MLSys #1-#12)

Stage: 3/5

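Track B's baseline-fallback guarantee can be sketched in two steps. First, build a trivially valid schedule: one op per subgraph, in topological order. Second, accept a model-refined schedule only if it passes local validation. The following names are hypothetical stand-ins, not the agent's real interfaces:

- `refine` stands in for the Gemini call.
- `is_valid` stands in for the local evaluator.
- The `Schedule` format is assumed.

```python
from typing import Callable, Optional

Schedule = list[list[int]]  # hypothetical: each subgraph is a list of op indices


def baseline(topo_order: list[int]) -> Schedule:
    # One op per subgraph, in topological order: covers every op exactly once
    return [[op] for op in topo_order]


def solve(
    topo_order: list[int],
    refine: Optional[Callable[[Schedule], Schedule]],
    is_valid: Callable[[Schedule], bool],
) -> Schedule:
    best = baseline(topo_order)
    if refine is None:  # e.g. no API key configured
        return best
    try:
        candidate = refine(best)
    except Exception:
        # Any API, network, or parsing failure falls back to the baseline
        return best
    return candidate if is_valid(candidate) else best
```

Keeping the baseline as the starting point means a refinement step can only replace it, never leave the caller without an answer.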
- Track A (Rust): 15 unit tests passing (latency model, edge cases, benchmark integration)
- Track B (Python): 29 pytest tests passing (all 5 PROBLEM.md examples, OOM detection, fusion correctness, serialization roundtrip)
- E2E happy-path script: 13/13 checks passing across both tracks and all 5 benchmarks
- Bug fixes: 0 (all tests passed on first run)

Stage: 4/5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Track A (Rust): 15 tests passing
- 9 worked-example latency model tests
- 6 edge-case tests (OOM, cyclic DAG, tiny tensors, roundtrip)

Track B (Python): 29 tests passing
- 13 PROBLEM.md example tests
- 11 edge-case tests
- 5 benchmark integration tests

E2E: 13/13 checks passing
- Both tracks validated against all 5 benchmarks
- JSON validity, coverage, and granularity checks

No bugs found.

Stage: 4/5

- Added target/ and Cargo.lock to .gitignore
- Removed accidentally committed Rust build artifacts from tracking

- README.md with quick start, architecture overview, and benchmark results
- CHANGELOG.md documenting all features
- GitHub Actions CI: Rust build/test, Python validation, E2E benchmarks
- All 15 Rust tests passing, E2E validated

Stage: 5/5 - Complete

Pull request overview
Implements the MLSys 2026 contest DAG scheduler in two tracks (Rust binary + Python agent) and adds supporting SDLC/Claude Code configuration plus CI to build/test both tracks and run E2E benchmark validation.
Changes:
- Add CI workflow to build/test Rust, smoke-test Python agent, and run E2E checks on 5 benchmarks.
- Add solution/CHANGELOG.md documenting the end-to-end optimizer pipeline, tests, and artifacts.
- Add/expand .claude/ configuration (agents, commands, skills, hooks, rules) and repo-level SDLC docs.
Reviewed changes
Copilot reviewed 97 out of 157 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| solution/CHANGELOG.md | Adds release notes and high-level inventory of Track A/Track B components and test coverage. |
| solution/.github/workflows/ci.yml | Introduces GitHub Actions pipeline for Rust + Python + E2E benchmark validation. |
| CLAUDE.md | Documents SDLC pipeline and available Claude Code commands/agents for this repo. |
| .claude/skills/meta/skill-generator/templates/testing.template.md | Adds skill template for generating backend/frontend test scaffolds. |
| .claude/skills/meta/skill-generator/templates/product-manager.template.md | Adds skill template for requirement analysis and prioritization artifacts. |
| .claude/skills/meta/skill-generator/templates/frontend.template.md | Adds skill template for Next.js frontend scaffolding guidance. |
| .claude/skills/meta/skill-generator/templates/devops.template.md | Adds skill template for Docker/docker-compose deployment guidance. |
| .claude/skills/meta/skill-generator/templates/database.template.md | Adds skill template for schema/migrations/query-pattern documentation. |
| .claude/skills/meta/skill-generator/templates/backend.template.md | Adds skill template for FastAPI/SQLModel backend scaffolding guidance. |
| .claude/skills/meta/skill-generator/templates/architect.template.md | Adds skill template for architecture docs (C4, data model, API). |
| .claude/skills/meta/skill-generator/SKILL.md | Defines the “skill-generator” meta skill and its artifact outputs. |
| .claude/skills/meta/session-persistence/SKILL.md | Adds a structured process for saving/resuming long-running sessions. |
| .claude/skills/meta/self-improvement/SKILL.md | Adds a workflow for iterating on the SDLC config based on gaps/learnings. |
| .claude/skills/meta/orchestrator/SOLUTION-STRUCTURE.md | Documents expected repository + solution/ structure and root-level constraints. |
| .claude/skills/meta/orchestrator/SKILL.md | Defines orchestrator stage flow, checkpoints, and git/commit protocol guidance. |
| .claude/skills/meta/orchestrator/HANDOFF-PROTOCOL.md | Defines artifacts + validation criteria when handing off between stages. |
| .claude/skills/meta/model-routing/SKILL.md | Adds model-selection guidance for different task types. |
| .claude/skills/meta/judge/SKILL.md | Adds stage validation methodology and report format for “judge”. |
| .claude/skills/meta/judge/RUBRICS.md | Adds stage scoring rubrics with weights and pass/fail thresholds. |
| .claude/skills/meta/judge/CRITIQUE-PROMPTS.md | Adds qualitative critique prompts to accompany rubric scoring. |
| .claude/skills/meta/config-validation/SKILL.md | Adds lint/validation rules for .claude/ assets and shell portability. |
| .claude/skills/foundation/verification-loop/SKILL.md | Adds a 6-phase verification loop (build/type/lint/test/security/diff). |
| .claude/skills/foundation/tdd/SKILL.md | Adds TDD guidance and coverage targets. |
| .claude/skills/foundation/system-design/APPROACH.md | Adds a time-boxed system design methodology reference. |
| .claude/skills/foundation/search-first/SKILL.md | Adds “research-before-coding” workflow and evaluation matrix. |
| .claude/skills/foundation/product-manager/SKILL.md | Adds product analysis workflow and artifact templates. |
| .claude/skills/foundation/product-manager/PRIORITIZATION.md | Adds deeper prioritization guidance (RICE/MoSCoW/Kano/etc.). |
| .claude/skills/foundation/handoff-protocol/SKILL.md | Adds generic inter-agent handoff format + validation rules. |
| .claude/skills/foundation/debugging/SKILL.md | Adds RCA/debugging/triage workflow guidance and regression-prevention patterns. |
| .claude/skills/foundation/database/MONGODB.md | Adds MongoDB async/Beanie patterns and testing notes. |
| .claude/skills/foundation/autonomous-loops/SKILL.md | Adds patterns for sequential/parallel autonomous loops and exit conditions. |
| .claude/skills/foundation/ai-integration/MODELS.md | Adds model integration notes and sample patterns for AI features. |
| .claude/settings.json | Defines tool permissions and hook execution wiring for Claude Code. |
| .claude/rules/security.md | Adds global security rules for codegen/review. |
| .claude/rules/git.md | Adds commit/PR/branching rules for the SDLC workflow. |
| .claude/rules/coding-standards.md | Adds general coding standards across languages. |
| .claude/hooks/scripts/quality-gate.sh | Adds pre-edit formatting/lint warnings per language. |
| .claude/hooks/scripts/debug-check.sh | Adds post-edit debug-artifact scanner (console.log/print/TODO/FIXME). |
| .claude/hooks/scripts/commit-reminder.sh | Adds session-stop reminder for uncommitted changes + debug statement scan. |
| .claude/commands/verify.md | Adds a slash command to run the verification loop. |
| .claude/commands/validate-e2e.md | Adds a slash command to validate Docker compose + a happy-path E2E flow. |
| .claude/commands/validate-config.md | Adds a slash command to lint .claude/ configuration consistency. |
| .claude/commands/upgrade.md | Adds a slash command to plan/execute dependency upgrades safely. |
| .claude/commands/tester.md | Adds a slash command describing Stage 4 QA/testing flow and artifacts. |
| .claude/commands/tdd.md | Adds a slash command to run a RED/GREEN/REFACTOR loop for a scope. |
| .claude/commands/security-review.md | Adds a slash command to run a security-focused review workflow. |
| .claude/commands/search-first.md | Adds a slash command to research existing solutions before coding. |
| .claude/commands/save-session.md | Adds a slash command to capture session state to a checkpoint file. |
| .claude/commands/review.md | Adds a slash command to run a staff-level code review workflow. |
| .claude/commands/resume-session.md | Adds a slash command to resume from a saved session checkpoint. |
| .claude/commands/resolve-review.md | Adds a slash command to fetch/apply/respond to PR review comments. |
| .claude/commands/readme-generator.md | Adds a slash command to generate solution/README.md and solution/CHANGELOG.md. |
| .claude/commands/product-manager.md | Adds a slash command for Stage 1 requirements work. |
| .claude/commands/postman.md | Adds a slash command to generate a Postman collection from OpenAPI. |
| .claude/commands/plan.md | Adds a slash command to create an implementation plan via planner agent. |
| .claude/commands/judge.md | Adds a slash command to validate a stage against rubrics and write checkpoints. |
| .claude/commands/investigate.md | Adds a slash command for RCA → fix → test → PR workflow. |
| .claude/commands/git-deploy.md | Adds a slash command to push generated output to GitHub via gh/git. |
| .claude/commands/generate-skill.md | Adds a slash command to scaffold a new skill directory/file. |
| .claude/commands/generate-command.md | Adds a slash command to scaffold a new command definition. |
| .claude/commands/generate-agent.md | Adds a slash command to scaffold a new agent definition. |
| .claude/commands/frontend-dev.md | Adds a slash command describing Stage 3 frontend implementation flow. |
| .claude/commands/create-pr.md | Adds a slash command describing an automated PR creation workflow. |
| .claude/commands/create-issue.md | Adds a slash command describing an automated issue creation workflow. |
| .claude/commands/commit.md | Adds a slash command describing conventional commit creation workflow. |
| .claude/commands/cloud-deploy.md | Adds a slash command for IaC generation/deploy workflows. |
| .claude/commands/backend-dev.md | Adds a slash command describing Stage 3 backend implementation flow. |
| .claude/commands/audit.md | Adds a slash command to audit .claude/ inventory vs documentation. |
| .claude/commands/architect.md | Adds a slash command describing Stage 2 architecture deliverables. |
| .claude/agents/refactor.md | Adds an agent definition for safe refactoring with verification. |
| .claude/agents/planner.md | Adds an agent definition for decomposing work into dependency-ordered units. |
| .claude/agents/judge.md | Adds an agent definition for stage validation using rubrics. |
| .claude/agents/git-deployer.md | Adds an agent definition for pushing content to GitHub (via git/gh). |
| .claude/agents/debugger.md | Adds an agent definition for RCA and debugging workflows. |
| .claude/agents/code-reviewer.md | Adds an agent definition for staff-level code review workflows. |
- Removed .claude/ directory from git tracking
- Added .claude/ to .gitignore to prevent future commits

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot review feedback: let Track A and Track B commands fail naturally so CI logs show actionable error output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Pull request overview
Adds a complete MLSys 2026 contest submission scaffold: two schedulers (Rust Track A + Python Gemini agent Track B), supporting benchmarks/problem definition, CI, E2E validation, and extensive planning/architecture documentation.
Changes:
- Introduces a Rust-based optimizer pipeline (baseline → fusion → retention → split‑K → granularity → traversal) with JSON I/O.
- Adds a Python agent that generates a local schedule and optionally refines it via Gemini, plus prompts.
- Adds CI workflows, E2E scripts, benchmark/problem files, and requirements/architecture/decision docs.
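The snake (zig-zag) traversal reverses every other row of the tile grid, so consecutive tiles stay adjacent. The likely motivation is that neighbouring tiles share an input strip, which can then be reused. Here is a minimal sketch of generating such an order. It assumes tiles are numbered row-major over a `rows` × `cols` grid; the tile numbering traversal.rs actually uses is not shown in this PR.

```python
def snake_order(rows: int, cols: int) -> list[int]:
    """Row-major tile indices with every odd row reversed (boustrophedon)."""
    order = []
    for r in range(rows):
        row = [r * cols + c for c in range(cols)]
        if r % 2 == 1:
            row.reverse()  # odd rows run right-to-left
        order.extend(row)
    return order


print(snake_order(2, 3))  # [0, 1, 2, 5, 4, 3]
```

At each row boundary, the last tile of one row sits directly above the first tile of the next. That adjacency is what a plain row-major order loses.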
Reviewed changes
Copilot reviewed 56 out of 65 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| solution/scripts/test-e2e.sh | Local E2E runner for both tracks + JSON validation. |
| solution/requirements/rice-scores.md | Feature prioritization (RICE) for the scheduler work. |
| solution/requirements/requirements.md | Functional/NFR requirements and assumptions for the scheduler. |
| solution/requirements/mvp-scope.md | MVP scope, acceptance criteria, and risk register. |
| solution/requirements/moscow.md | MoSCoW prioritization of features. |
| solution/README.md | Project overview, quick start, architecture, and testing instructions. |
| solution/docs/decisions/ADR-003-greedy-fusion.md | Rationale for greedy fusion approach. |
| solution/docs/decisions/ADR-002-baseline-first.md | Rationale for baseline-first implementation. |
| solution/docs/decisions/ADR-001-language-python.md | Rationale for Rust (A) + Python (B) split. |
| solution/docs/architecture/workspace.dsl | Structurizr/C4 workspace describing module relationships. |
| solution/docs/architecture/user-journeys.md | User journeys for solve/evaluate/batch workflows. |
| solution/docs/architecture/security-model.md | Threat model and validation notes. |
| solution/docs/architecture/deployment-topology.md | Local CLI deployment + run/test instructions. |
| solution/docs/architecture/database-schema.md | Input/output JSON schema + internal model mapping. |
| solution/docs/architecture/data-flow.md | Pipeline data-flow diagrams for scheduling/latency/memory. |
| solution/docs/architecture/api-error-catalog.md | Error/exit-code catalog for CLI-style failures. |
| solution/checkpoints/stage-4-validation.md | Stage 4 validation write-up and reported results. |
| solution/checkpoints/stage-2.5-validation.md | Issue/branch setup checkpoint. |
| solution/checkpoints/stage-2-validation.md | Architecture checkpoint summary. |
| solution/checkpoints/stage-1-validation.md | Requirements checkpoint summary. |
| solution/checkpoints/stage-0-validation.md | Project setup checkpoint summary. |
| solution/CHANGELOG.md | Changelog describing initial 1.0.0 deliverables. |
| solution/backend/rust/src/serializer.rs | Rust solution JSON serialization. |
| solution/backend/rust/src/parser.rs | Rust problem JSON parsing + helper granularity utilities. |
| solution/backend/rust/src/optimizer/traversal.rs | Rust traversal-order (snake) optimizer. |
| solution/backend/rust/src/optimizer/splitk.rs | Rust split‑K optimization stage. |
| solution/backend/rust/src/optimizer/retention.rs | Rust tensor retention optimization stage. |
| solution/backend/rust/src/optimizer/pipeline.rs | Rust optimizer pipeline orchestration. |
| solution/backend/rust/src/optimizer/mod.rs | Rust optimizer module exports. |
| solution/backend/rust/src/optimizer/granularity.rs | Rust per-subgraph granularity search. |
| solution/backend/rust/src/optimizer/fusion.rs | Rust greedy fusion stage. |
| solution/backend/rust/src/models.rs | Rust core data model structs. |
| solution/backend/rust/src/memory.rs | Rust working-set computation + OOM checks. |
| solution/backend/rust/src/latency.rs | Rust latency model implementation. |
| solution/backend/rust/src/evaluate.rs | Rust evaluator for validating solutions. |
| solution/backend/rust/src/dag.rs | Rust DAG construction + topo sort + boundary tensor helpers. |
| solution/backend/rust/src/baseline.rs | Rust baseline schedule generator. |
| solution/backend/rust/Cargo.toml | Rust crate config + release profile settings. |
| solution/backend/pyproject.toml | Python backend project metadata + dev deps. |
| solution/backend/mlsys_scheduler/serializer.py | Python backend solution serializer. |
| solution/backend/mlsys_scheduler/parser.py | Python backend problem parser. |
| solution/backend/mlsys_scheduler/models.py | Python backend dataclasses for model types. |
| solution/backend/mlsys_scheduler/dag.py | Python backend DAG analysis utilities. |
| solution/backend/mlsys_scheduler/__init__.py | Python backend package init/version. |
| solution/agent/scheduler.py | Track B local optimizer pipeline (baseline/fusion/granularity/retention/traversal). |
| solution/agent/requirements.txt | Track B dependency list. |
| solution/agent/prompts/system.md | Gemini system prompt (rules/model/spec). |
| solution/agent/prompts/strategies.md | Gemini strategy guidance prompt. |
| solution/agent/prompts/examples.md | Few-shot examples from PROBLEM.md. |
| solution/agent/agent.py | Gemini-powered agent entrypoint with local validation loop. |
| solution/.github/workflows/ci.yml | CI for Rust, Python agent smoke tests, and E2E benchmark runs. |
| problem/mlsys.h | C++ reference structs + Evaluate/ReadProblem/ReadSolution signatures. |
| problem/example_problem.json | Minimal example problem JSON. |
| problem/benchmarks/mlsys-2026-9.json | Benchmark input graph #9. |
| problem/benchmarks/mlsys-2026-5.json | Benchmark input graph #5. |
| problem/benchmarks/mlsys-2026-13.json | Benchmark input graph #13. |
| problem/benchmarks/mlsys-2026-1.json | Benchmark input graph #1. |
| CLAUDE.md | Repo-level SDLC/Claude Code workflow documentation. |
| .gitignore | Ignore rules for Python/Node/Rust/tooling artifacts. |
- test-e2e.sh: derive PROJECT_ROOT dynamically instead of using a hardcoded path
- ci.yml: hash Cargo.toml instead of Cargo.lock for the cache key
- README.md: fix Python version requirement to 3.12+
- stage-4-validation.md: correct Rust edition to 2021 and Python to 3.12
- serializer.rs: replace unwrap() with proper error propagation via ?
- fusion.rs: add structural validity check for consistent boundary output dimensions before merging subgraphs
- dag.rs: add bounds checking on tensor indices from input JSON

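The dag.rs bounds check guards against malformed input JSON: every tensor index an op references must fall inside the tensor table. Here is a Python sketch of the same idea. It assumes a hypothetical problem dict with a `tensors` list and `ops` entries carrying `inputs`/`outputs` index lists; these are not necessarily the contest's exact field names.

```python
def check_tensor_indices(problem: dict) -> None:
    """Raise ValueError if any op references a tensor index out of range."""
    n = len(problem["tensors"])
    for op_idx, op in enumerate(problem["ops"]):
        for field in ("inputs", "outputs"):
            for t in op[field]:
                # bool is a subclass of int in Python, so reject it explicitly
                if not isinstance(t, int) or isinstance(t, bool) or not 0 <= t < n:
                    raise ValueError(
                        f"op {op_idx}: {field} index {t!r} out of range [0, {n})"
                    )
```

Failing fast at parse time turns a bad index into a clear error message instead of a panic or a silent wrong answer deep inside the optimizer.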
- Removed stale Python package (mlsys_scheduler/, pyproject.toml, tests/) from solution/backend/, left over from the pre-Rust implementation plan
- Fixed solution/README.md:
  - Corrected latency model formula (per-step roofline, not aggregate)
  - Added traversal.rs to the project structure and key files table
  - Fixed Track B commands to use uv run python consistently
  - Filled benchmark results with actual latency values
- Updated CI to use uv run python for validation scripts
- Added .pytest_cache/ to .gitignore
- Removed stale .venv, .pytest_cache, and uv.lock from backend/

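The README correction matters because a per-step roofline and an aggregate roofline give different answers. The sum over steps of max(compute, memory) is never less than max(Σcompute, Σmemory). The aggregate form implicitly lets compute in one step overlap memory traffic in another. Here is a minimal sketch contrasting the two. It assumes each execution step has a known compute time and memory-transfer time; the values are hypothetical.

```python
def per_step_roofline(steps: list[tuple[float, float]]) -> float:
    # Each step costs max(compute, memory); steps run back to back
    return sum(max(compute, memory) for compute, memory in steps)


def aggregate_roofline(steps: list[tuple[float, float]]) -> float:
    # Aggregate variant: overlaps compute and memory across the whole run
    total_compute = sum(c for c, _ in steps)
    total_memory = sum(m for _, m in steps)
    return max(total_compute, total_memory)


# Hypothetical steps: one compute-bound, one memory-bound
steps = [(10.0, 2.0), (1.0, 8.0)]
print(per_step_roofline(steps))   # 18.0
print(aggregate_roofline(steps))  # 11.0
```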
Pull request overview
Adds a full MLSys 2026 contest “DAG scheduler” submission, including a Track A Rust CLI optimizer, a Track B Python agent/evaluator, end-to-end validation scripts, CI, benchmark fixtures, and extensive requirements/architecture documentation.
Changes:
- Implement Track A Rust scheduler pipeline (parse → DAG → optimize → serialize) with optimizer stages (fusion/retention/split‑K/granularity/traversal).
- Implement Track B Python local optimizer + Gemini agent loop, plus a Python evaluator for local validation.
- Add CI workflow, E2E script, benchmark/problem fixtures, and comprehensive docs (requirements, ADRs, architecture, checkpoints, README/CHANGELOG).
Reviewed changes
Copilot reviewed 49 out of 58 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| solution/scripts/test-e2e.sh | Adds a local E2E happy-path runner validating both tracks across all benchmarks. |
| solution/requirements/rice-scores.md | RICE scoring/prioritization for planned features. |
| solution/requirements/requirements.md | Detailed functional + non-functional requirements for the scheduler. |
| solution/requirements/mvp-scope.md | MVP feature scope, dependencies, acceptance criteria, risks. |
| solution/requirements/moscow.md | MoSCoW prioritization of features. |
| solution/README.md | Top-level submission README (tracks, usage, architecture, testing, results). |
| solution/docs/decisions/ADR-003-greedy-fusion.md | ADR documenting greedy fusion decision. |
| solution/docs/decisions/ADR-002-baseline-first.md | ADR documenting baseline-first approach. |
| solution/docs/decisions/ADR-001-language-python.md | ADR documenting Rust Track A + Python Track B split. |
| solution/docs/architecture/workspace.dsl | Structurizr/C4-style architecture model. |
| solution/docs/architecture/user-journeys.md | User journeys for solve/evaluate/batch workflows. |
| solution/docs/architecture/system-design.md | System design and model/spec details. |
| solution/docs/architecture/security-model.md | Local-CLI threat model and validation notes. |
| solution/docs/architecture/deployment-topology.md | Dev environment + how to run/test locally. |
| solution/docs/architecture/database-schema.md | Input/output schema and internal structure reference (no DB). |
| solution/docs/architecture/data-flow.md | Data flow and pipeline diagrams. |
| solution/docs/architecture/api-error-catalog.md | Error catalog and exit code conventions. |
| solution/checkpoints/stage-4-validation.md | Stage checkpoint: testing/validation summary. |
| solution/checkpoints/stage-2.5-validation.md | Stage checkpoint: issue creation/branch setup. |
| solution/checkpoints/stage-2-validation.md | Stage checkpoint: architecture/system design artifacts. |
| solution/checkpoints/stage-1-validation.md | Stage checkpoint: requirements artifacts. |
| solution/checkpoints/stage-0-validation.md | Stage checkpoint: initial project scaffolding. |
| solution/CHANGELOG.md | Changelog for the submission contents. |
| solution/backend/rust/src/serializer.rs | Rust: serialize Solution → JSON (including traversal_orders). |
| solution/backend/rust/src/parser.rs | Rust: parse Problem JSON + helpers for native granularity/K_full. |
| solution/backend/rust/src/optimizer/traversal.rs | Rust: snake/zig-zag traversal generation and latency comparison. |
| solution/backend/rust/src/optimizer/splitk.rs | Rust: split‑K application + retained-set building. |
| solution/backend/rust/src/optimizer/retention.rs | Rust: tensor retention selection between subgraphs. |
| solution/backend/rust/src/optimizer/pipeline.rs | Rust: end-to-end optimizer stage orchestration + emergency OOM fix. |
| solution/backend/rust/src/optimizer/mod.rs | Rust: optimizer module exports. |
| solution/backend/rust/src/optimizer/granularity.rs | Rust: grid search over (w,h,k) candidates under OOM constraint. |
| solution/backend/rust/src/optimizer/fusion.rs | Rust: greedy subgraph fusion + feasible granularity finder. |
| solution/backend/rust/src/models.rs | Rust: core data model types (Problem/Solution/SubgraphDef/etc.). |
| solution/backend/rust/src/memory.rs | Rust: working-set calculator, OOM check, split‑K search helper. |
| solution/backend/rust/src/latency.rs | Rust: roofline latency model + memory plan for step-by-step costs. |
| solution/backend/rust/src/evaluate.rs | Rust: local evaluator for validation in tests. |
| solution/backend/rust/src/dag.rs | Rust: DAG construction (producers/consumers), topo sort, boundary tensors. |
| solution/backend/rust/src/baseline.rs | Rust: baseline schedule builder (1 op/subgraph). |
| solution/backend/rust/Cargo.toml | Rust crate manifest and release profile settings. |
| solution/agent/scheduler.py | Python: local baseline + greedy optimizer pipeline + retention/traversal. |
| solution/agent/requirements.txt | Python: runtime dependency list (google-genai). |
| solution/agent/prompts/system.md | Gemini system prompt with model + constraints. |
| solution/agent/prompts/strategies.md | Gemini strategy guidance for optimizations. |
| solution/agent/prompts/examples.md | Gemini few-shot examples from PROBLEM.md. |
| solution/agent/agent.py | Python: Track B agent loop (local optimize + optional Gemini refinement). |
| solution/.github/workflows/ci.yml | CI: Rust build/test, Python smoke test, and E2E jobs. |
| problem/mlsys.h | C++ header defining Problem/Solution structs and evaluator interfaces. |
| problem/example_problem.json | Example input problem fixture. |
| problem/benchmarks/mlsys-2026-9.json | Benchmark 9 fixture. |
| problem/benchmarks/mlsys-2026-5.json | Benchmark 5 fixture. |
| problem/benchmarks/mlsys-2026-13.json | Benchmark 13 fixture. |
| problem/benchmarks/mlsys-2026-1.json | Benchmark 1 fixture. |
| CLAUDE.md | Repository SDLC/Claude Code configuration documentation. |
| .gitignore | Adds ignore rules for Python/Node/Rust artifacts and other tooling. |
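The split-K search helper listed for memory.rs can be sketched as follows. Start from the full reduction depth and shrink k until the subgraph's working set fits in fast memory. Taking the minimum K_full across the subgraph's MatMuls as the starting point keeps every candidate k safe for every op. These pieces are hypothetical:

- the `working_set` callable,
- the capacity,
- the halving schedule.

The real candidates and memory model live in memory.rs.

```python
from typing import Callable, Optional


def find_split_k(
    k_fulls: list[int],
    working_set: Callable[[int], int],
    capacity: int,
) -> Optional[int]:
    """Return the largest k (halving from min K_full) whose working set fits.

    k_fulls: full reduction depth of each MatMul in the subgraph.
    working_set: hypothetical callable mapping k -> bytes needed.
    Returns None if even k = 1 does not fit (or there are no MatMuls).
    """
    if not k_fulls:
        return None
    k = min(k_fulls)  # min, not max: every MatMul can be tiled at this k
    while k >= 1:
        if working_set(k) <= capacity:
            return k
        k //= 2
    return None
```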
Issue #8 (CLI benchmark runner):
- Added evaluate subcommand: mlsys evaluate --problem <f> --solution <f>
- Added parse_solution() to parser.rs
- evaluate.rs now has a CLI entry point and prints PASS/FAIL plus latency

Correctness fix (latency.rs):
- Each MatMul op is now scaled by its own K_full (k / K_full_for_this_op) instead of using the first boundary MatMul's K_full for all ops
- Same fix in the Python evaluator: num_k_steps is derived from the boundary output MatMul's K_full

Review comment fixes:
- evaluator.py: added widths/heights length validation
- test-e2e.sh: added trap handler for temp dir cleanup on exit/signal

Refs #8

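The latency.rs correctness fix is easiest to see on a subgraph whose MatMuls have different reduction depths. This sketch assumes, hypothetically, that each op's per-step compute cost is its full-K cost scaled by k / K_full for that op. The op records and costs are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MatMulOp:
    full_cost: float  # hypothetical compute cost at k = K_full
    k_full: int       # this op's own full reduction depth


def step_compute_fixed(ops: list[MatMulOp], k: int) -> float:
    # After the fix: each op is scaled by its own K_full
    return sum(op.full_cost * k / op.k_full for op in ops)


def step_compute_buggy(ops: list[MatMulOp], k: int) -> float:
    # Before the fix: one reference K_full (here the first op's) for every op
    k_ref = ops[0].k_full
    return sum(op.full_cost * k / k_ref for op in ops)


ops = [MatMulOp(full_cost=100.0, k_full=128), MatMulOp(full_cost=100.0, k_full=64)]
print(step_compute_fixed(ops, 32))  # 75.0
print(step_compute_buggy(ops, 32))  # 50.0
```

With a shared reference K_full, the op with the shallower reduction is under-charged by half in this example.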
Pull request overview
Adds a full MLSys 2026 DAG scheduler submission with two implementations (Rust Track A + Python/Gemini Track B), plus extensive documentation, benchmarks, and a CI/E2E validation story.
Changes:
- Introduces a Rust scheduler binary with optimizer pipeline (fusion, retention, split‑K, granularity search, traversal).
- Introduces a Python agent that runs a local optimizer first, then optionally refines via Gemini.
- Adds project documentation (requirements, architecture, ADRs, checkpoints), benchmarks, and an E2E script + CI workflow.
Reviewed changes
Copilot reviewed 49 out of 58 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| solution/scripts/test-e2e.sh | Adds local “happy path” script to build/run both tracks against all benchmarks and sanity-check JSON. |
| solution/requirements/rice-scores.md | RICE prioritization document for feature planning. |
| solution/requirements/requirements.md | Detailed functional/non-functional requirements and assumptions. |
| solution/requirements/mvp-scope.md | MVP scope definition, acceptance criteria, and risk register. |
| solution/requirements/moscow.md | MoSCoW prioritization for features. |
| solution/README.md | End-user documentation: quickstart, architecture, testing, and results. |
| solution/docs/decisions/ADR-003-greedy-fusion.md | ADR for choosing greedy fusion vs DP/beam/ILP. |
| solution/docs/decisions/ADR-002-baseline-first.md | ADR for baseline-first implementation strategy. |
| solution/docs/decisions/ADR-001-language-python.md | ADR for Rust Track A + Python Track B language split. |
| solution/docs/architecture/workspace.dsl | Structurizr DSL model describing architecture relationships. |
| solution/docs/architecture/user-journeys.md | User-journey documentation for solve/evaluate/batch flows. |
| solution/docs/architecture/system-design.md | Detailed system design + latency/working-set specification. |
| solution/docs/architecture/security-model.md | Security/threat-model doc for local CLI. |
| solution/docs/architecture/deployment-topology.md | Local “deployment”/dev topology and runbook. |
| solution/docs/architecture/database-schema.md | Data model/schema reference (no DB; JSON schemas + mappings). |
| solution/docs/architecture/data-flow.md | Pipeline and latency/working-set data-flow diagrams. |
| solution/docs/architecture/api-error-catalog.md | Error catalog and exit codes for CLI-style usage. |
| solution/checkpoints/stage-4-validation.md | Stage checkpoint summarizing tests/validation outcomes. |
| solution/checkpoints/stage-2.5-validation.md | Stage checkpoint for issue creation/branch setup. |
| solution/checkpoints/stage-2-validation.md | Stage checkpoint for architecture/system design completion. |
| solution/checkpoints/stage-1-validation.md | Stage checkpoint for requirements analysis completion. |
| solution/checkpoints/stage-0-validation.md | Stage checkpoint for initial repo setup. |
| solution/CHANGELOG.md | Changelog for the delivered solution components. |
| solution/backend/rust/src/serializer.rs | Serializes Solution to contest JSON format. |
| solution/backend/rust/src/parser.rs | Parses problem + solution JSON into Rust data structures. |
| solution/backend/rust/src/optimizer/traversal.rs | Implements snake/zig-zag traversal optimization for MatMul tiling. |
| solution/backend/rust/src/optimizer/splitk.rs | Applies split‑K when MatMul subgraphs OOM at full k. |
| solution/backend/rust/src/optimizer/retention.rs | Heuristic retention selection across subgraph boundaries. |
| solution/backend/rust/src/optimizer/pipeline.rs | Orchestrates optimizer stages into a full pipeline. |
| solution/backend/rust/src/optimizer/mod.rs | Exposes optimizer modules. |
| solution/backend/rust/src/optimizer/granularity.rs | Searches (w,h,k) candidates for best latency under memory constraint. |
| solution/backend/rust/src/optimizer/fusion.rs | Greedy fusion of adjacent subgraphs with feasibility checks. |
| solution/backend/rust/src/models.rs | Core Rust data types for Problem/Solution/etc. |
| solution/backend/rust/src/memory.rs | Working-set calculation + OOM checks + split‑K search helper. |
| solution/backend/rust/src/latency.rs | Roofline latency model + memory-transfer accounting. |
| solution/backend/rust/src/evaluate.rs | Local evaluator for validating solutions (coverage + OOM + latency). |
| solution/backend/rust/src/dag.rs | DAG construction, topo sort, and boundary/ephemeral tensor utilities. |
| solution/backend/rust/src/baseline.rs | Baseline schedule generation (1 op/subgraph). |
| solution/backend/rust/Cargo.toml | Rust crate definition and release profile settings. |
| solution/agent/scheduler.py | Python local optimizer pipeline (baseline, fusion, split‑K, granularity, traversal, retention). |
| solution/agent/requirements.txt | Python dependency list (google-genai). |
| solution/agent/prompts/system.md | System prompt for Gemini schedule generation. |
| solution/agent/prompts/strategies.md | Strategy prompt for Gemini improvements. |
| solution/agent/prompts/examples.md | Few-shot examples from PROBLEM.md for Gemini calibration. |
| solution/agent/agent.py | Python Track B agent: local optimize + optional Gemini refinement + validation. |
| solution/.github/workflows/ci.yml | CI workflow definition for Rust/Python/E2E checks (currently placed under solution/). |
| problem/mlsys.h | Contest reference header (Problem/Solution structs + Evaluate signature). |
| problem/example_problem.json | Example problem input JSON. |
| problem/benchmarks/mlsys-2026-9.json | Benchmark input JSON #9. |
| problem/benchmarks/mlsys-2026-5.json | Benchmark input JSON #5. |
| problem/benchmarks/mlsys-2026-13.json | Benchmark input JSON #13. |
| problem/benchmarks/mlsys-2026-1.json | Benchmark input JSON #1. |
| CLAUDE.md | Claude Code SDLC configuration documentation. |
| .gitignore | Repository ignore rules for Python, Rust, and other build artifacts. |
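The traversal.rs row refers to a snake (zig-zag) tile order for MatMul tiling. Below is a minimal sketch of that ordering. It assumes output tiles are numbered row-major (`id = r * cols + c`). `snake_order` is a hypothetical helper, not the function in traversal.rs. Reversing every other row means the first tile of each row shares its column, and therefore its RHS column strip, with the last tile of the previous row. This kind of operand reuse is the usual motivation for the order.

```rust
/// Hypothetical sketch: snake (zig-zag) order over a `rows x cols` grid of
/// output tiles, where tiles are numbered row-major (id = r * cols + c).
fn snake_order(rows: usize, cols: usize) -> Vec<usize> {
    let mut order = Vec::with_capacity(rows * cols);
    for r in 0..rows {
        if r % 2 == 0 {
            // Even rows: walk left to right.
            for c in 0..cols {
                order.push(r * cols + c);
            }
        } else {
            // Odd rows: walk right to left, so this row starts in the same
            // column (same RHS strip) where the previous row ended.
            for c in (0..cols).rev() {
                order.push(r * cols + c);
            }
        }
    }
    order
}
```

For a 2 x 3 grid this yields `[0, 1, 2, 5, 4, 3]`. A row-major order would instead jump from tile 2 to tile 3, changing both the row and the column at once.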
CI workflow:
- Copied ci.yml to repo-root .github/workflows/ (GitHub only runs workflows from the root)

Parser hardening (parser.rs):
- parse_solution returns errors instead of coercing invalid indices to 0
- parse_problem validates op_type (MatMul/Pointwise), MatMul arity (2 inputs), non-empty outputs, and tensor index bounds
- native_granularity_for_subgraph uses min K_full (safe for all ops)

Memory model (memory.rs):
- find_split_k uses min K_full across MatMuls instead of max

Pipeline (pipeline.rs):
- Updated header comment to reflect actual 9-stage pipeline

E2E script (test-e2e.sh):
- Skip Track A benchmarks on build failure instead of running missing binary
- Removed stderr suppression from cargo build

Agent (agent.py):
- Coerce traversal_order elements to int
- Validate traversal_order is a valid permutation before accepting

Architecture docs (system-design.md):
- Removed editorial "Wait -- actually" self-correction, kept final formula only
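The parse_problem checks listed in this commit reject a malformed op before it enters the DAG. Below is a minimal sketch of those checks. It assumes a simplified raw op with an `op_type` string and plain index lists. `RawOp`, `OpType`, and `validate_op` are hypothetical names, not the types in parser.rs.

```rust
/// Hypothetical op kinds accepted by the parser.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpType {
    MatMul,
    Pointwise,
}

/// Hypothetical, simplified view of one op as read from the problem JSON.
struct RawOp<'a> {
    op_type: &'a str,
    inputs: &'a [usize],
    outputs: &'a [usize],
}

/// Returns the op kind, or an error describing the first violated rule.
fn validate_op(idx: usize, op: &RawOp<'_>, num_tensors: usize) -> Result<OpType, String> {
    // Only the two op types named in the problem are accepted.
    let kind = match op.op_type {
        "MatMul" => OpType::MatMul,
        "Pointwise" => OpType::Pointwise,
        other => return Err(format!("op {idx}: unknown op_type {other:?}")),
    };
    // MatMul needs exactly two operands (LHS and RHS).
    if kind == OpType::MatMul && op.inputs.len() != 2 {
        return Err(format!("op {idx}: MatMul needs 2 inputs, got {}", op.inputs.len()));
    }
    // Every op must produce something.
    if op.outputs.is_empty() {
        return Err(format!("op {idx}: no outputs"));
    }
    // Every referenced tensor index must exist; nothing is clamped to 0.
    for &t in op.inputs.iter().chain(op.outputs.iter()) {
        if t >= num_tensors {
            return Err(format!("op {idx}: tensor {t} out of bounds (num_tensors = {num_tensors})"));
        }
    }
    Ok(kind)
}
```

The function returns `Err` rather than substituting a default. This mirrors the stated intent of the change: a bad input fails loudly instead of being silently coerced to index 0.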
Architecture docs (system-design, database-schema, deployment-topology, user-journeys, workspace.dsl, api-error-catalog, security-model, data-flow):
- Replaced all Python module references with Rust src/ layout
- Updated data types from Python dataclasses to Rust structs
- Fixed CLI commands to match actual interface
- Updated optimizer composition to show all 9 stages
- Removed editorial self-corrections, kept only final formulas

Requirements docs:
- NFR-006: updated to dual-track (Rust + Python)
- A-008: updated to reflect Rust Track A + Python Track B
- mvp-scope: marked F-10 (traversal) as implemented, C++ row as superseded

Checkpoints:
- stage-1: updated language decision and traversal status
- stage-2: rewritten to reflect Rust modules, 9-stage pipeline, per-op K_full

README + CHANGELOG:
- Pipeline updated from 8 to 9 stages (added traversal optimization)
- Fixed simplified latency formula (sum-of-per-step-max, not max-of-totals)
- Added traversal.rs and evaluate subcommand to CHANGELOG

ADR renamed: ADR-001-language-python.md -> ADR-001-language-selection.md
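The latency-formula fix in this commit is easiest to see with two steps whose bottlenecks differ. Below is a sketch that assumes each step has one compute time and one memory-transfer time. The real latency.rs also does memory-transfer accounting that is not modeled here. `Step` and both functions are hypothetical.

```rust
/// Hypothetical execution step (e.g. one tile) with its two cost components.
struct Step {
    compute: f64,
    memory: f64,
}

/// Roofline applied per step, then summed: sum_i max(compute_i, memory_i).
fn sum_of_per_step_max(steps: &[Step]) -> f64 {
    steps.iter().map(|s| s.compute.max(s.memory)).sum()
}

/// The incorrect simplification: max(sum_i compute_i, sum_i memory_i).
fn max_of_totals(steps: &[Step]) -> f64 {
    let compute: f64 = steps.iter().map(|s| s.compute).sum();
    let memory: f64 = steps.iter().map(|s| s.memory).sum();
    compute.max(memory)
}
```

Consider steps (compute 10, memory 2) and (compute 2, memory 10). The per-step roofline gives 10 + 10 = 20, while max-of-totals gives max(12, 12) = 12, which understates latency. Each per-step max is at least that step's compute time and at least its memory time. So the sum of per-step maxima can never be smaller than max-of-totals.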
Pull request overview
Adds a complete MLSys 2026 contest submission scaffold: a Rust “Track A” CLI scheduler, a Python “Track B” Gemini agent, benchmark/problem fixtures, CI, E2E validation, and extensive planning/architecture documentation.
Changes:
- Implement Track A Rust optimizer pipeline (parse → DAG → baseline/fusion/retention/split‑K/granularity/traversal → serialize) plus a local `evaluate` subcommand.
- Implement Track B Python local optimizer + Gemini refinement loop with prompts, plus dependency setup.
- Add CI (Rust + Python + E2E), benchmark fixtures, E2E script, and requirement/ADR/architecture/checkpoint docs.
Reviewed changes
Copilot reviewed 50 out of 59 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| solution/scripts/test-e2e.sh | Adds an end-to-end “happy path” runner/validator for both tracks across all 5 benchmarks. |
| solution/requirements/rice-scores.md | Documents RICE prioritization and rationale for the feature plan. |
| solution/requirements/requirements.md | Captures functional/non-functional requirements and assumptions for the scheduler. |
| solution/requirements/mvp-scope.md | Defines MVP scope, acceptance criteria, and risks. |
| solution/requirements/moscow.md | MoSCoW prioritization for features. |
| solution/README.md | Top-level solution README: quickstart, architecture, testing, and results summary. |
| solution/docs/decisions/ADR-003-greedy-fusion.md | ADR explaining greedy fusion choice over DP/beam. |
| solution/docs/decisions/ADR-002-baseline-first.md | ADR for baseline-first implementation strategy. |
| solution/docs/decisions/ADR-001-language-selection.md | ADR selecting Rust for Track A and Python for Track B. |
| solution/docs/architecture/workspace.dsl | Structurizr/C4-style architecture model of the system. |
| solution/docs/architecture/user-journeys.md | User journeys for solve/evaluate/batch flows. |
| solution/docs/architecture/security-model.md | Security/threat model documentation for a local CLI tool. |
| solution/docs/architecture/deployment-topology.md | Explains local execution topology and setup for both tracks. |
| solution/docs/architecture/database-schema.md | Documents JSON schemas and internal model mappings (no DB). |
| solution/docs/architecture/data-flow.md | Sequence/data-flow diagrams for the pipeline and latency/WS checks. |
| solution/docs/architecture/api-error-catalog.md | Catalogs CLI/evaluator error conditions and exit codes. |
| solution/checkpoints/stage-4-validation.md | Checkpoint notes for test/validation stage and reported results. |
| solution/checkpoints/stage-2.5-validation.md | Checkpoint notes for issue creation/branch setup stage. |
| solution/checkpoints/stage-2-validation.md | Checkpoint notes for architecture/design stage. |
| solution/checkpoints/stage-1-validation.md | Checkpoint notes for requirements stage. |
| solution/checkpoints/stage-0-validation.md | Checkpoint notes for project setup stage. |
| solution/CHANGELOG.md | Adds a v1.0.0 changelog entry describing delivered components. |
| solution/backend/rust/src/serializer.rs | Implements Solution → JSON serialization for Track A. |
| solution/backend/rust/src/parser.rs | Implements Problem/Solution parsing and helpers (k_full/native granularity). |
| solution/backend/rust/src/optimizer/traversal.rs | Implements snake/zig-zag traversal optimization for MatMul tiling. |
| solution/backend/rust/src/optimizer/splitk.rs | Applies split‑K to resolve OOM for MatMul-containing subgraphs. |
| solution/backend/rust/src/optimizer/retention.rs | Implements tensor retention decisions across subgraph boundaries. |
| solution/backend/rust/src/optimizer/pipeline.rs | Orchestrates the full 9-stage optimizer pipeline. |
| solution/backend/rust/src/optimizer/mod.rs | Exposes optimizer submodules. |
| solution/backend/rust/src/optimizer/granularity.rs | Searches best (w,h,k) granularity per subgraph under OOM constraint. |
| solution/backend/rust/src/optimizer/fusion.rs | Greedy adjacent subgraph fusion logic. |
| solution/backend/rust/src/models.rs | Core in-memory data structures (Problem/Op/Tensor/Solution/SubgraphDef). |
| solution/backend/rust/src/memory.rs | Working-set calculation, OOM checking, and split‑K search helper. |
| solution/backend/rust/src/latency.rs | Roofline latency model + memory plan building. |
| solution/backend/rust/src/evaluate.rs | Local evaluator implementation for checking a solution against a problem. |
| solution/backend/rust/src/dag.rs | DAG construction, topo sort, boundary tensor queries, and output dimensions. |
| solution/backend/rust/src/baseline.rs | Baseline “1 op per subgraph” schedule builder. |
| solution/backend/rust/Cargo.toml | Rust crate definition and release profile settings. |
| solution/agent/scheduler.py | Python local optimizer pipeline (baseline/fusion/granularity/retention/traversal). |
| solution/agent/requirements.txt | Python runtime dependency list (google-genai). |
| solution/agent/prompts/system.md | System prompt defining rules/objective/output format for Gemini. |
| solution/agent/prompts/strategies.md | Strategy guidance prompt for optimization suggestions. |
| solution/agent/prompts/examples.md | Few-shot examples from PROBLEM.md for latency calibration. |
| solution/agent/agent.py | Track B agent entrypoint: local optimize + optional Gemini refinement. |
| solution/.github/workflows/ci.yml | Adds a (duplicate) CI workflow file under the solution/ tree. |
| problem/mlsys.h | Adds the C++ interface header describing canonical structs/Evaluate entrypoints. |
| problem/example_problem.json | Adds a small example problem fixture. |
| problem/benchmarks/mlsys-2026-1.json | Adds benchmark 1 fixture. |
| problem/benchmarks/mlsys-2026-5.json | Adds benchmark 5 fixture. |
| problem/benchmarks/mlsys-2026-9.json | Adds benchmark 9 fixture. |
| problem/benchmarks/mlsys-2026-13.json | Adds benchmark 13 fixture. |
| problem/benchmarks/mlsys-2026-17.json | Adds benchmark 17 fixture. |
| CLAUDE.md | Adds SDLC/Claude Code configuration documentation. |
| .gitignore | Adds repo ignore patterns for Python/Node/Rust and tooling. |
| .github/workflows/ci.yml | Adds the actual root CI workflow (Rust + Python + E2E). |
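The memory.rs rows above mention a split‑K search helper. An earlier fix in this PR makes it cap the K chunk by the smallest K_full across the subgraph's MatMuls. Below is a minimal sketch of that idea. It assumes a halving search downward from the cap and a caller-supplied `fits` predicate standing in for the working-set/OOM check. `pick_split_k` is a hypothetical name and signature, not the function in memory.rs.

```rust
/// Hypothetical sketch: pick the largest K chunk that fits in memory.
/// `k_fulls` holds each MatMul's full reduction dimension in the subgraph;
/// `fits(k)` stands in for the real working-set vs. capacity check.
fn pick_split_k(k_fulls: &[u64], fits: impl Fn(u64) -> bool) -> Option<u64> {
    // Cap by the smallest K_full: a chunk larger than some MatMul's
    // reduction dimension would not be a valid split for that op.
    let mut k = *k_fulls.iter().min()?;
    // Halve until the working set fits (assumed search strategy).
    while k >= 1 {
        if fits(k) {
            return Some(k);
        }
        k /= 2;
    }
    // Even k = 1 does not fit: the subgraph cannot be rescued by split-K.
    None
}
```

With K_full values of 512, 128, and 256, the search starts at 128 rather than 512. Starting from the maximum would propose chunks that overrun the smaller MatMuls.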
parser.rs:
- parse_solution validates granularity array length (exactly 3)
- All fields use strict validation with errors instead of unwrap_or defaults

evaluate.rs:
- Validates traversal_order is a valid permutation of [0, num_tiles)
- Checks reported subgraph_latency matches computed value (tolerance 0.5)

CI:
- Removed duplicate solution/.github/workflows/ci.yml (only the repo-root copy remains)

rice-scores.md:
- Fixed sort order: F-03 (10.0) now ranked above F-02 (7.5)
- Marked F-10 as implemented despite low RICE score

database-schema.md:
- Fixed benchmark 17 ops count: 96 -> 103

test-e2e.sh:
- Pass json_file as sys.argv[1] instead of shell interpolation
- Relaxed duplicate op check to allow recomputation (ops in multiple subgraphs)
- Use uv run python instead of python3
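The parser.rs, evaluate.rs, and test-e2e.sh checks in this commit are all small, local predicates. Below is a sketch of them. It assumes that traversal indices arrive as signed integers from JSON and that the 0.5 latency tolerance is an inclusive absolute bound. All function names are hypothetical. `covers_all_ops` reflects the relaxed E2E rule: every op must be scheduled at least once, but an op may repeat across subgraphs for recomputation.

```rust
use std::collections::HashSet;

/// traversal_order must be a permutation of 0..num_tiles:
/// correct length, every entry in range, no repeats.
fn is_valid_permutation(order: &[i64], num_tiles: usize) -> bool {
    if order.len() != num_tiles {
        return false;
    }
    let mut seen = vec![false; num_tiles];
    for &t in order {
        // Negative, out-of-range, or repeated entries are rejected.
        if t < 0 || t as usize >= num_tiles || seen[t as usize] {
            return false;
        }
        seen[t as usize] = true;
    }
    true
}

/// Reported subgraph_latency must match the recomputed value within 0.5
/// (inclusive bound assumed).
fn latency_matches(reported: f64, computed: f64) -> bool {
    (reported - computed).abs() <= 0.5
}

/// granularity must have exactly three entries: [w, h, k].
fn parse_granularity(vals: &[i64]) -> Result<[i64; 3], String> {
    match vals {
        [w, h, k] => Ok([*w, *h, *k]),
        _ => Err(format!("granularity must have 3 entries, got {}", vals.len())),
    }
}

/// Every op index 0..num_ops appears in at least one subgraph;
/// duplicates across subgraphs (recomputation) are allowed.
fn covers_all_ops(subgraphs: &[Vec<usize>], num_ops: usize) -> bool {
    let covered: HashSet<usize> = subgraphs.iter().flatten().copied().collect();
    (0..num_ops).all(|op| covered.contains(&op))
}
```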
- Removed CLAUDE.md from git tracking
- Added CLAUDE.md to .gitignore
- Fixed deployment-topology.md: pyproject.toml -> requirements.txt for Track B
Summary
Full implementation of the MLSys 2026 contest DAG scheduler with two tracks:
`mlsys` with 9-stage optimizer pipeline

Closes #1, Closes #2, Closes #3, Closes #4, Closes #5, Closes #6, Closes #7, Closes #8, Closes #9, Closes #10, Closes #11, Closes #12
Architecture
Problem JSON → Parser → DAG Analysis → Baseline → Chain Fusion → Retention 1 → Split-K → Granularity Search → Retention 2 → Emergency OOM Fix → Latency Recalc → Traversal Opt → Solution JSON

Optimizer Pipeline (9 stages)
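The nine optimizer stages in the diagram run strictly in sequence, from Baseline through Traversal Opt, between DAG Analysis and Solution JSON. Below is a sketch of that ordering as data. The `Stage` names are hypothetical and taken from the diagram labels. A generic `apply` callback stands in for each stage's real implementation in pipeline.rs.

```rust
/// Hypothetical stage identifiers, one per box in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Baseline,
    ChainFusion,
    Retention1,
    SplitK,
    GranularitySearch,
    Retention2,
    EmergencyOomFix,
    LatencyRecalc,
    TraversalOpt,
}

/// Fixed execution order of the nine optimizer stages.
const PIPELINE: [Stage; 9] = [
    Stage::Baseline,
    Stage::ChainFusion,
    Stage::Retention1,
    Stage::SplitK,
    Stage::GranularitySearch,
    Stage::Retention2,
    Stage::EmergencyOomFix,
    Stage::LatencyRecalc,
    Stage::TraversalOpt,
];

/// Threads a state value through every stage in order; `apply` is a
/// placeholder for the real per-stage transformation.
fn run_pipeline<S>(mut state: S, mut apply: impl FnMut(Stage, S) -> S) -> S {
    for stage in PIPELINE {
        state = apply(stage, state);
    }
    state
}
```

Keeping the order in one array makes it easy to check the documented count of nine. It also shows that retention runs twice, once before split-K and once after granularity search.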
CLI Interface
Test Results
Benchmark Results (Track A — Rust)
All benchmarks complete in under 1 second.
SDLC Pipeline Stages
Key Fixes During Review
Test plan
`cargo test`: 15 unit tests covering all 5 PROBLEM.md examples