feat: MLSys 2026 DAG Scheduler — Track A (Rust) + Track B (Gemini Agent)#13

Merged
melroyanthony merged 20 commits into main from feat/issue-39-mlsys-scheduler
Mar 14, 2026

melroyanthony (Owner) commented Mar 14, 2026

Summary

Full implementation of the MLSys 2026 contest DAG scheduler with two tracks:

  • Track A (Rust): Compiled binary mlsys with 9-stage optimizer pipeline
  • Track B (Python): Gemini-powered agent with local evaluator fallback

Closes #1, Closes #2, Closes #3, Closes #4, Closes #5, Closes #6, Closes #7, Closes #8, Closes #9, Closes #10, Closes #11, Closes #12

Architecture

graph LR
    A[Problem JSON] --> B[Parser]
    B --> C[DAG Analysis]
    C --> D[Baseline]
    D --> E[Chain Fusion]
    E --> F[Retention 1]
    F --> G[Split-K]
    G --> H[Granularity Search]
    H --> I[Retention 2]
    I --> J[Emergency OOM Fix]
    J --> K[Latency Recalc]
    K --> L[Traversal Opt]
    L --> M[Solution JSON]

Optimizer Pipeline (9 stages)

  1. Baseline: 1 op per subgraph, native granularity — guarantees valid output
  2. Chain Fusion: Greedy merge of adjacent ops when working set fits and boundary output dims are consistent
  3. Retention (pass 1): Keep boundary tensors resident across subgraph boundaries
  4. Split-K: Reduce k dimension (using min K_full) for memory-constrained MatMuls
  5. Granularity Search: Exhaustive (w, h, k) search per subgraph for minimum latency
  6. Retention (pass 2): Re-evaluate after granularity changes
  7. Emergency OOM Fix: Reduce granularity for any remaining OOM subgraphs
  8. Final Latency Recalculation: Recompute all subgraph latencies
  9. Traversal Optimization: Snake/zig-zag tile ordering for MatMul data reuse
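As an illustration of stage 9, here is a minimal sketch of snake (zig-zag) tile ordering. It is not taken from `optimizer/traversal.rs`; the names `snake_order` and `shared_axis_transitions` are hypothetical. The assumption is that consecutive tiles sharing a row or column index can reuse one MatMul operand strip, so the sketch counts how many consecutive pairs do so.

```python
def snake_order(rows: int, cols: int) -> list[tuple[int, int]]:
    """Visit tiles row by row, reversing column direction on odd rows."""
    order = []
    for r in range(rows):
        col_range = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in col_range)
    return order


def shared_axis_transitions(order: list[tuple[int, int]]) -> int:
    """Count consecutive tile pairs that share a row or a column index."""
    return sum(
        1
        for (r0, c0), (r1, c1) in zip(order, order[1:])
        if r0 == r1 or c0 == c1
    )


# Compare plain row-major order against snake order on a 3x4 tile grid.
row_major = [(r, c) for r in range(3) for c in range(4)]
snake = snake_order(3, 4)
print(shared_axis_transitions(row_major))  # 9: each row wrap shares nothing
print(shared_axis_transitions(snake))      # 11: every step shares an axis
```

In row-major order each wrap from the end of one row to the start of the next shares neither index, which is the reuse opportunity the snake order recovers.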

CLI Interface

# Track A — Solve
./mlsys <input.json> <output.json>

# Track A — Evaluate existing solution
./mlsys evaluate --problem <input.json> --solution <solution.json>

# Track B — Python agent
GOOGLE_API_KEY=<key> uv run python agent.py <input.json> <output.json>

Test Results

  • Track A (Rust): 15 tests passing (9 worked examples + 6 edge cases)
  • Track B (Python): Local evaluator + optimizer validated
  • E2E: Both tracks validated against all 5 benchmarks

Benchmark Results (Track A — Rust)

| Benchmark | Ops | Tensors | Fast Mem | Bandwidth | Latency |
|---|---|---|---|---|---|
| mlsys-2026-1 | 5 | 9 | 60,000 | 20 | 27,443 |
| mlsys-2026-5 | 19 | 29 | 30,000 | 15 | 27,856 |
| mlsys-2026-9 | 32 | 49 | 250,000 | 25 | 110,100 |
| mlsys-2026-13 | 63 | 100 | 600,000 | 50 | 191,693 |
| mlsys-2026-17 | 103 | 160 | 500,000 | 100 | 23,650 |

All benchmarks complete in under 1 second.

SDLC Pipeline Stages

| Stage | Status | Judge Score |
|---|---|---|
| 0: Setup | Done | |
| 1: Requirements (RICE/MoSCoW) | Done | 91/100 |
| 2: Architecture & Design | Done | 88/100 |
| 2.5: Issues + Branch | Done | 12 issues |
| 3: Implementation (Rust + Python) | Done | 82/100 |
| 4: Testing & Validation | Done | 15 tests, 0 bugs |
| 5: Docs, CI/CD, Finalization | Done | |

Key Fixes During Review

  • Per-op K_full scaling: each MatMul uses its own K_full (not boundary MatMul's)
  • Split-K uses min K_full across ops (safe for mixed-K subgraphs)
  • Fusion validates boundary output dimension consistency
  • Parser validates op_type, MatMul arity, tensor bounds
  • Snake traversal optimization (Stage 9)
  • Evaluate subcommand for solution validation
  • All planning docs updated to reflect Rust implementation
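The parser checks listed above (op_type, MatMul arity, tensor bounds) could look roughly like the following sketch. The field names `op_type`, `inputs`, and `outputs`, the set of known op types, and the function `validate_ops` are assumptions made for illustration; they are not confirmed by this PR and the real implementation lives in the Rust parser.

```python
# Assumed set of op types; the actual contest format may differ.
KNOWN_OP_TYPES = {"MatMul", "Pointwise"}


def validate_ops(ops: list[dict], num_tensors: int) -> list[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for i, op in enumerate(ops):
        op_type = op.get("op_type")
        inputs = op.get("inputs", [])
        outputs = op.get("outputs", [])
        # Reject op types the scheduler does not know how to cost.
        if op_type not in KNOWN_OP_TYPES:
            errors.append(f"op {i}: unknown op_type {op_type!r}")
        # A MatMul needs exactly two operands.
        if op_type == "MatMul" and len(inputs) != 2:
            errors.append(f"op {i}: MatMul expects 2 inputs, got {len(inputs)}")
        # Every referenced tensor index must point into the tensor table.
        for t in list(inputs) + list(outputs):
            if not isinstance(t, int) or not 0 <= t < num_tensors:
                errors.append(f"op {i}: tensor index {t!r} out of range")
    return errors


good = [{"op_type": "MatMul", "inputs": [0, 1], "outputs": [2]}]
bad = [{"op_type": "Conv", "inputs": [0], "outputs": [5]}]
print(validate_ops(good, num_tensors=3))  # []
print(validate_ops(bad, num_tensors=3))   # two errors: op_type and index 5
```

Collecting all errors instead of failing on the first one is a design choice of this sketch; a parser that returns on the first error would be equally valid.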

Test plan

  • cargo test — 15 unit tests covering all 5 PROBLEM.md examples
  • All 5 benchmarks produce valid JSON with non-negative latencies
  • Evaluate subcommand validates solutions (PASS/FAIL + latency)
  • Track B runs in baseline mode without API key
  • E2E script validates both tracks against all benchmarks

melroyanthony and others added 10 commits March 14, 2026 14:27
- Move PROBLEM.md, example_problem.json, mlsys.h, benchmarks/ into problem/
- Create solution/ directory scaffold (requirements, checkpoints, scripts, docs)
- Matches expected structure: problem/ for inputs, solution/ for outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Added comprehensive .gitignore
- Created solution/ directory structure
- Stage 0 checkpoint validated

Stage: 0/5

- 29 functional requirements, 6 non-functional requirements
- RICE scores for 15 features with prioritization rationale
- MoSCoW: 7 Must-Have, 5 Should-Have, 3 Won't-Have
- MVP scope with acceptance criteria and dependency graph
- Baseline-first strategy: correct schedule before optimization

Stage: 1/5

- Module decomposition: 13 Python modules in pipeline architecture
- Complete latency model spec matching C++ Evaluate() function
- Data model: Python dataclasses mirroring mlsys.h structs
- Input/output JSON schema documentation
- C4 workspace with context and container views
- Pipeline: Parse → Topo sort → Baseline → Fusion → Retention → Split-K → Granularity → Serialize
- ADR-001: Python language choice
- ADR-002: Baseline-first development strategy
- ADR-003: Greedy fusion over DP/beam search

Stage: 2/5
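
The "Topo sort" step in the pipeline above can be sketched with Kahn's algorithm. This is an illustrative sketch only: the function name `topo_sort` and the edge-list input are assumptions, not the design recorded in the architecture docs. It also shows one way a cyclic input could be rejected, since a cycle leaves some ops with a non-zero in-degree.

```python
from collections import deque


def topo_sort(num_ops: int, edges: list[tuple[int, int]]) -> list[int]:
    """Return ops in dependency order; edges are (producer, consumer) pairs."""
    indegree = [0] * num_ops
    successors = [[] for _ in range(num_ops)]
    for producer, consumer in edges:
        successors[producer].append(consumer)
        indegree[consumer] += 1

    # Start from ops with no unscheduled producers.
    ready = deque(i for i in range(num_ops) if indegree[i] == 0)
    order = []
    while ready:
        op = ready.popleft()
        order.append(op)
        for nxt in successors[op]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any op left unscheduled is part of (or downstream of) a cycle.
    if len(order) != num_ops:
        raise ValueError("cycle detected: input is not a DAG")
    return order


# Diamond-shaped graph: 0 feeds 1 and 2, which both feed 3.
print(topo_sort(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # [0, 1, 2, 3]
```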
- Created GitHub issues: yarongmu-google#39-yarongmu-google#50 (12 MVP features)
- Feature branch: feat/issue-39-mlsys-scheduler

Stage: 2.5/5

Track A - Rust scheduler binary:
- Full optimizer pipeline: baseline → fusion → retention → split-K → granularity search
- All 5 PROBLEM.md worked examples verified (9 unit tests passing)
- All 5 benchmarks produce valid solutions under timeout

Track B - Python Gemini agent:
- Local evaluator matching C++ Evaluate()
- Multi-stage optimizer: fusion, granularity, traversal, retention
- Gemini 2.5 Flash integration with iterative refinement
- Baseline fallback guarantees valid output without API

Updated:
- ADR-001: Rust for Track A, Python for Track B
- Stage 2.5 checkpoint: issues on fork (melroyanthony/MLSys #1-#12)

Stage: 3/5

- Track A (Rust): 15 unit tests passing (latency model, edge cases,
  benchmark integration)
- Track B (Python): 29 pytest tests passing (all 5 PROBLEM.md examples,
  OOM detection, fusion correctness, serialization roundtrip)
- E2E happy path script: 13/13 passing across both tracks and all 5 benchmarks
- Bug fixes: 0 (all tests passed on first run)

Stage: 4/5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Track A (Rust): 15 tests passing
- 9 worked-example latency model tests
- 6 edge-case tests (OOM, cyclic DAG, tiny tensors, roundtrip)

Track B (Python): 29 tests passing
- 13 PROBLEM.md example tests
- 11 edge-case tests
- 5 benchmark integration tests

E2E: 13/13 checks passing
- Both tracks validated against all 5 benchmarks
- JSON validity, coverage, granularity checks

No bugs found.

Stage: 4/5

- Added target/ and Cargo.lock to .gitignore
- Removed accidentally committed Rust build artifacts from tracking

- README.md with quick start, architecture overview, benchmark results
- CHANGELOG.md documenting all features
- GitHub Actions CI: Rust build/test, Python validation, E2E benchmarks
- All 15 Rust tests passing, E2E validated

Stage: 5/5 - Complete
Copilot AI review requested due to automatic review settings March 14, 2026 19:02
melroyanthony self-assigned this Mar 14, 2026
melroyanthony added the enhancement label Mar 14, 2026
Copilot AI left a comment

Pull request overview

Implements the MLSys 2026 contest DAG scheduler in two tracks (Rust binary + Python agent) and adds supporting SDLC/Claude Code configuration plus CI to build/test both tracks and run E2E benchmark validation.

Changes:

  • Add CI workflow to build/test Rust, smoke-test Python agent, and run E2E checks on 5 benchmarks.
  • Add solution/CHANGELOG.md documenting the end-to-end optimizer pipeline, tests, and artifacts.
  • Add/expand .claude/ configuration (agents, commands, skills, hooks, rules) and repo-level SDLC docs.

Reviewed changes

Copilot reviewed 97 out of 157 changed files in this pull request and generated 5 comments.

File Description
solution/CHANGELOG.md Adds release notes and high-level inventory of Track A/Track B components and test coverage.
solution/.github/workflows/ci.yml Introduces GitHub Actions pipeline for Rust + Python + E2E benchmark validation.
CLAUDE.md Documents SDLC pipeline and available Claude Code commands/agents for this repo.
.claude/skills/meta/skill-generator/templates/testing.template.md Adds skill template for generating backend/frontend test scaffolds.
.claude/skills/meta/skill-generator/templates/product-manager.template.md Adds skill template for requirement analysis and prioritization artifacts.
.claude/skills/meta/skill-generator/templates/frontend.template.md Adds skill template for Next.js frontend scaffolding guidance.
.claude/skills/meta/skill-generator/templates/devops.template.md Adds skill template for Docker/docker-compose deployment guidance.
.claude/skills/meta/skill-generator/templates/database.template.md Adds skill template for schema/migrations/query-pattern documentation.
.claude/skills/meta/skill-generator/templates/backend.template.md Adds skill template for FastAPI/SQLModel backend scaffolding guidance.
.claude/skills/meta/skill-generator/templates/architect.template.md Adds skill template for architecture docs (C4, data model, API).
.claude/skills/meta/skill-generator/SKILL.md Defines the “skill-generator” meta skill and its artifact outputs.
.claude/skills/meta/session-persistence/SKILL.md Adds a structured process for saving/resuming long-running sessions.
.claude/skills/meta/self-improvement/SKILL.md Adds a workflow for iterating on the SDLC config based on gaps/learnings.
.claude/skills/meta/orchestrator/SOLUTION-STRUCTURE.md Documents expected repository + solution/ structure and root-level constraints.
.claude/skills/meta/orchestrator/SKILL.md Defines orchestrator stage flow, checkpoints, and git/commit protocol guidance.
.claude/skills/meta/orchestrator/HANDOFF-PROTOCOL.md Defines artifacts + validation criteria when handing off between stages.
.claude/skills/meta/model-routing/SKILL.md Adds model-selection guidance for different task types.
.claude/skills/meta/judge/SKILL.md Adds stage validation methodology and report format for “judge”.
.claude/skills/meta/judge/RUBRICS.md Adds stage scoring rubrics with weights and pass/fail thresholds.
.claude/skills/meta/judge/CRITIQUE-PROMPTS.md Adds qualitative critique prompts to accompany rubric scoring.
.claude/skills/meta/config-validation/SKILL.md Adds lint/validation rules for .claude/ assets and shell portability.
.claude/skills/foundation/verification-loop/SKILL.md Adds a 6-phase verification loop (build/type/lint/test/security/diff).
.claude/skills/foundation/tdd/SKILL.md Adds TDD guidance and coverage targets.
.claude/skills/foundation/system-design/APPROACH.md Adds a time-boxed system design methodology reference.
.claude/skills/foundation/search-first/SKILL.md Adds “research-before-coding” workflow and evaluation matrix.
.claude/skills/foundation/product-manager/SKILL.md Adds product analysis workflow and artifact templates.
.claude/skills/foundation/product-manager/PRIORITIZATION.md Adds deeper prioritization guidance (RICE/MoSCoW/Kano/etc.).
.claude/skills/foundation/handoff-protocol/SKILL.md Adds generic inter-agent handoff format + validation rules.
.claude/skills/foundation/debugging/SKILL.md Adds RCA/debugging/triage workflow guidance and regression-prevention patterns.
.claude/skills/foundation/database/MONGODB.md Adds MongoDB async/Beanie patterns and testing notes.
.claude/skills/foundation/autonomous-loops/SKILL.md Adds patterns for sequential/parallel autonomous loops and exit conditions.
.claude/skills/foundation/ai-integration/MODELS.md Adds model integration notes and sample patterns for AI features.
.claude/settings.json Defines tool permissions and hook execution wiring for Claude Code.
.claude/rules/security.md Adds global security rules for codegen/review.
.claude/rules/git.md Adds commit/PR/branching rules for the SDLC workflow.
.claude/rules/coding-standards.md Adds general coding standards across languages.
.claude/hooks/scripts/quality-gate.sh Adds pre-edit formatting/lint warnings per language.
.claude/hooks/scripts/debug-check.sh Adds post-edit debug-artifact scanner (console.log/print/TODO/FIXME).
.claude/hooks/scripts/commit-reminder.sh Adds session-stop reminder for uncommitted changes + debug statement scan.
.claude/commands/verify.md Adds a slash command to run the verification loop.
.claude/commands/validate-e2e.md Adds a slash command to validate Docker compose + a happy-path E2E flow.
.claude/commands/validate-config.md Adds a slash command to lint .claude/ configuration consistency.
.claude/commands/upgrade.md Adds a slash command to plan/execute dependency upgrades safely.
.claude/commands/tester.md Adds a slash command describing Stage 4 QA/testing flow and artifacts.
.claude/commands/tdd.md Adds a slash command to run a RED/GREEN/REFACTOR loop for a scope.
.claude/commands/security-review.md Adds a slash command to run a security-focused review workflow.
.claude/commands/search-first.md Adds a slash command to research existing solutions before coding.
.claude/commands/save-session.md Adds a slash command to capture session state to a checkpoint file.
.claude/commands/review.md Adds a slash command to run a staff-level code review workflow.
.claude/commands/resume-session.md Adds a slash command to resume from a saved session checkpoint.
.claude/commands/resolve-review.md Adds a slash command to fetch/apply/respond to PR review comments.
.claude/commands/readme-generator.md Adds a slash command to generate solution/README.md and solution/CHANGELOG.md.
.claude/commands/product-manager.md Adds a slash command for Stage 1 requirements work.
.claude/commands/postman.md Adds a slash command to generate a Postman collection from OpenAPI.
.claude/commands/plan.md Adds a slash command to create an implementation plan via planner agent.
.claude/commands/judge.md Adds a slash command to validate a stage against rubrics and write checkpoints.
.claude/commands/investigate.md Adds a slash command for RCA → fix → test → PR workflow.
.claude/commands/git-deploy.md Adds a slash command to push generated output to GitHub via gh/git.
.claude/commands/generate-skill.md Adds a slash command to scaffold a new skill directory/file.
.claude/commands/generate-command.md Adds a slash command to scaffold a new command definition.
.claude/commands/generate-agent.md Adds a slash command to scaffold a new agent definition.
.claude/commands/frontend-dev.md Adds a slash command describing Stage 3 frontend implementation flow.
.claude/commands/create-pr.md Adds a slash command describing an automated PR creation workflow.
.claude/commands/create-issue.md Adds a slash command describing an automated issue creation workflow.
.claude/commands/commit.md Adds a slash command describing conventional commit creation workflow.
.claude/commands/cloud-deploy.md Adds a slash command for IaC generation/deploy workflows.
.claude/commands/backend-dev.md Adds a slash command describing Stage 3 backend implementation flow.
.claude/commands/audit.md Adds a slash command to audit .claude/ inventory vs documentation.
.claude/commands/architect.md Adds a slash command describing Stage 2 architecture deliverables.
.claude/agents/refactor.md Adds an agent definition for safe refactoring with verification.
.claude/agents/planner.md Adds an agent definition for decomposing work into dependency-ordered units.
.claude/agents/judge.md Adds an agent definition for stage validation using rubrics.
.claude/agents/git-deployer.md Adds an agent definition for pushing content to GitHub (via git/gh).
.claude/agents/debugger.md Adds an agent definition for RCA and debugging workflows.
.claude/agents/code-reviewer.md Adds an agent definition for staff-level code review workflows.

melroyanthony and others added 3 commits March 14, 2026 19:05
- Removed .claude/ directory from git tracking
- Added .claude/ to .gitignore to prevent future commits

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot review feedback: let Track A and Track B commands fail
naturally so CI logs show actionable error output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Added optimizer/traversal.rs with snake/zig-zag tile ordering
- Changed Cargo.toml edition from 2024 to 2021 for compatibility
- Documented tensors_to_retain usage in memory.rs working set calc

Refs #1, #3, #12
Copilot AI left a comment

Pull request overview

Adds a complete MLSys 2026 contest submission scaffold: two schedulers (Rust Track A + Python Gemini agent Track B), supporting benchmarks/problem definition, CI, E2E validation, and extensive planning/architecture documentation.

Changes:

  • Introduces a Rust-based optimizer pipeline (baseline → fusion → retention → split‑K → granularity → traversal) with JSON I/O.
  • Adds a Python agent that generates a local schedule and optionally refines it via Gemini, plus prompts.
  • Adds CI workflows, E2E scripts, benchmark/problem files, and requirements/architecture/decision docs.

Reviewed changes

Copilot reviewed 56 out of 65 changed files in this pull request and generated 7 comments.

File Description
solution/scripts/test-e2e.sh Local E2E runner for both tracks + JSON validation.
solution/requirements/rice-scores.md Feature prioritization (RICE) for the scheduler work.
solution/requirements/requirements.md Functional/NFR requirements and assumptions for the scheduler.
solution/requirements/mvp-scope.md MVP scope, acceptance criteria, and risk register.
solution/requirements/moscow.md MoSCoW prioritization of features.
solution/README.md Project overview, quick start, architecture, and testing instructions.
solution/docs/decisions/ADR-003-greedy-fusion.md Rationale for greedy fusion approach.
solution/docs/decisions/ADR-002-baseline-first.md Rationale for baseline-first implementation.
solution/docs/decisions/ADR-001-language-python.md Rationale for Rust (A) + Python (B) split.
solution/docs/architecture/workspace.dsl Structurizr/C4 workspace describing module relationships.
solution/docs/architecture/user-journeys.md User journeys for solve/evaluate/batch workflows.
solution/docs/architecture/security-model.md Threat model and validation notes.
solution/docs/architecture/deployment-topology.md Local CLI deployment + run/test instructions.
solution/docs/architecture/database-schema.md Input/output JSON schema + internal model mapping.
solution/docs/architecture/data-flow.md Pipeline data-flow diagrams for scheduling/latency/memory.
solution/docs/architecture/api-error-catalog.md Error/exit-code catalog for CLI-style failures.
solution/checkpoints/stage-4-validation.md Stage 4 validation write-up and reported results.
solution/checkpoints/stage-2.5-validation.md Issue/branch setup checkpoint.
solution/checkpoints/stage-2-validation.md Architecture checkpoint summary.
solution/checkpoints/stage-1-validation.md Requirements checkpoint summary.
solution/checkpoints/stage-0-validation.md Project setup checkpoint summary.
solution/CHANGELOG.md Changelog describing initial 1.0.0 deliverables.
solution/backend/rust/src/serializer.rs Rust solution JSON serialization.
solution/backend/rust/src/parser.rs Rust problem JSON parsing + helper granularity utilities.
solution/backend/rust/src/optimizer/traversal.rs Rust traversal-order (snake) optimizer.
solution/backend/rust/src/optimizer/splitk.rs Rust split‑K optimization stage.
solution/backend/rust/src/optimizer/retention.rs Rust tensor retention optimization stage.
solution/backend/rust/src/optimizer/pipeline.rs Rust optimizer pipeline orchestration.
solution/backend/rust/src/optimizer/mod.rs Rust optimizer module exports.
solution/backend/rust/src/optimizer/granularity.rs Rust per-subgraph granularity search.
solution/backend/rust/src/optimizer/fusion.rs Rust greedy fusion stage.
solution/backend/rust/src/models.rs Rust core data model structs.
solution/backend/rust/src/memory.rs Rust working-set computation + OOM checks.
solution/backend/rust/src/latency.rs Rust latency model implementation.
solution/backend/rust/src/evaluate.rs Rust evaluator for validating solutions.
solution/backend/rust/src/dag.rs Rust DAG construction + topo sort + boundary tensor helpers.
solution/backend/rust/src/baseline.rs Rust baseline schedule generator.
solution/backend/rust/Cargo.toml Rust crate config + release profile settings.
solution/backend/pyproject.toml Python backend project metadata + dev deps.
solution/backend/mlsys_scheduler/serializer.py Python backend solution serializer.
solution/backend/mlsys_scheduler/parser.py Python backend problem parser.
solution/backend/mlsys_scheduler/models.py Python backend dataclasses for model types.
solution/backend/mlsys_scheduler/dag.py Python backend DAG analysis utilities.
solution/backend/mlsys_scheduler/__init__.py Python backend package init/version.
solution/agent/scheduler.py Track B local optimizer pipeline (baseline/fusion/granularity/retention/traversal).
solution/agent/requirements.txt Track B dependency list.
solution/agent/prompts/system.md Gemini system prompt (rules/model/spec).
solution/agent/prompts/strategies.md Gemini strategy guidance prompt.
solution/agent/prompts/examples.md Few-shot examples from PROBLEM.md.
solution/agent/agent.py Gemini-powered agent entrypoint with local validation loop.
solution/.github/workflows/ci.yml CI for Rust, Python agent smoke tests, and E2E benchmark runs.
problem/mlsys.h C++ reference structs + Evaluate/ReadProblem/ReadSolution signatures.
problem/example_problem.json Minimal example problem JSON.
problem/benchmarks/mlsys-2026-9.json Benchmark input graph #9.
problem/benchmarks/mlsys-2026-5.json Benchmark input graph #5.
problem/benchmarks/mlsys-2026-13.json Benchmark input graph #13.
problem/benchmarks/mlsys-2026-1.json Benchmark input graph #1.
CLAUDE.md Repo-level SDLC/Claude Code workflow documentation.
.gitignore Ignore rules for Python/Node/Rust/tooling artifacts.

- test-e2e.sh: derive PROJECT_ROOT dynamically instead of hardcoded path
- ci.yml: hash Cargo.toml instead of Cargo.lock for cache key
- README.md: fix Python version requirement to 3.12+
- stage-4-validation.md: correct Rust edition to 2021, Python to 3.12
- serializer.rs: replace unwrap() with proper error propagation via ?
- fusion.rs: add structural validity check for consistent boundary
  output dimensions before merging subgraphs
- dag.rs: add bounds checking on tensor indices from input JSON

- Removed stale Python package (mlsys_scheduler/, pyproject.toml, tests/)
  from solution/backend/ — leftover from pre-Rust implementation plan
- Fixed solution/README.md:
  - Corrected latency model formula (per-step roofline, not aggregate)
  - Added traversal.rs to project structure and key files table
  - Fixed Track B commands to use uv run python consistently
  - Filled benchmark results with actual latency values
- Updated CI to use uv run python for validation scripts
- Added .pytest_cache/ to .gitignore
- Removed stale .venv, .pytest_cache, uv.lock from backend/
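
The README correction "per-step roofline, not aggregate" can be illustrated with a small worked example. This sketch assumes the common roofline form where a step costs the larger of its compute time and its memory-transfer time; the function names and the sample numbers are hypothetical and are not taken from `latency.rs` or the README.

```python
def step_latency(compute: float, bytes_moved: float, bandwidth: float) -> float:
    """Roofline cost of one step: bounded by compute or by memory traffic."""
    return max(compute, bytes_moved / bandwidth)


def per_step_total(steps: list[tuple[float, float]], bandwidth: float) -> float:
    """Apply the roofline to each step, then sum the step costs."""
    return sum(step_latency(c, b, bandwidth) for c, b in steps)


def aggregate_total(steps: list[tuple[float, float]], bandwidth: float) -> float:
    """Apply the roofline once to the summed compute and summed traffic."""
    total_compute = sum(c for c, _ in steps)
    total_bytes = sum(b for _, b in steps)
    return max(total_compute, total_bytes / bandwidth)


# One compute-bound step followed by one memory-bound step.
steps = [(100.0, 400.0), (10.0, 2000.0)]  # (compute time, bytes moved)
bandwidth = 20.0

print(per_step_total(steps, bandwidth))   # 200.0 = max(100, 20) + max(10, 100)
print(aggregate_total(steps, bandwidth))  # 120.0 = max(110, 2400 / 20)
```

The aggregate form can never exceed the per-step form, so using it would underestimate latency whenever the steps are bound by different resources.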
Copilot AI left a comment

Pull request overview

Adds a full MLSys 2026 contest “DAG scheduler” submission, including a Track A Rust CLI optimizer, a Track B Python agent/evaluator, end-to-end validation scripts, CI, benchmark fixtures, and extensive requirements/architecture documentation.

Changes:

  • Implement Track A Rust scheduler pipeline (parse → DAG → optimize → serialize) with optimizer stages (fusion/retention/split‑K/granularity/traversal).
  • Implement Track B Python local optimizer + Gemini agent loop, plus a Python evaluator for local validation.
  • Add CI workflow, E2E script, benchmark/problem fixtures, and comprehensive docs (requirements, ADRs, architecture, checkpoints, README/CHANGELOG).

Reviewed changes

Copilot reviewed 49 out of 58 changed files in this pull request and generated 4 comments.

File Description
solution/scripts/test-e2e.sh Adds a local E2E happy-path runner validating both tracks across all benchmarks.
solution/requirements/rice-scores.md RICE scoring/prioritization for planned features.
solution/requirements/requirements.md Detailed functional + non-functional requirements for the scheduler.
solution/requirements/mvp-scope.md MVP feature scope, dependencies, acceptance criteria, risks.
solution/requirements/moscow.md MoSCoW prioritization of features.
solution/README.md Top-level submission README (tracks, usage, architecture, testing, results).
solution/docs/decisions/ADR-003-greedy-fusion.md ADR documenting greedy fusion decision.
solution/docs/decisions/ADR-002-baseline-first.md ADR documenting baseline-first approach.
solution/docs/decisions/ADR-001-language-python.md ADR documenting Rust Track A + Python Track B split.
solution/docs/architecture/workspace.dsl Structurizr/C4-style architecture model.
solution/docs/architecture/user-journeys.md User journeys for solve/evaluate/batch workflows.
solution/docs/architecture/system-design.md System design and model/spec details.
solution/docs/architecture/security-model.md Local-CLI threat model and validation notes.
solution/docs/architecture/deployment-topology.md Dev environment + how to run/test locally.
solution/docs/architecture/database-schema.md Input/output schema and internal structure reference (no DB).
solution/docs/architecture/data-flow.md Data flow and pipeline diagrams.
solution/docs/architecture/api-error-catalog.md Error catalog and exit code conventions.
solution/checkpoints/stage-4-validation.md Stage checkpoint: testing/validation summary.
solution/checkpoints/stage-2.5-validation.md Stage checkpoint: issue creation/branch setup.
solution/checkpoints/stage-2-validation.md Stage checkpoint: architecture/system design artifacts.
solution/checkpoints/stage-1-validation.md Stage checkpoint: requirements artifacts.
solution/checkpoints/stage-0-validation.md Stage checkpoint: initial project scaffolding.
solution/CHANGELOG.md Changelog for the submission contents.
solution/backend/rust/src/serializer.rs Rust: serialize Solution → JSON (including traversal_orders).
solution/backend/rust/src/parser.rs Rust: parse Problem JSON + helpers for native granularity/K_full.
solution/backend/rust/src/optimizer/traversal.rs Rust: snake/zig-zag traversal generation and latency comparison.
solution/backend/rust/src/optimizer/splitk.rs Rust: split‑K application + retained-set building.
solution/backend/rust/src/optimizer/retention.rs Rust: tensor retention selection between subgraphs.
solution/backend/rust/src/optimizer/pipeline.rs Rust: end-to-end optimizer stage orchestration + emergency OOM fix.
solution/backend/rust/src/optimizer/mod.rs Rust: optimizer module exports.
solution/backend/rust/src/optimizer/granularity.rs Rust: grid search over (w,h,k) candidates under OOM constraint.
solution/backend/rust/src/optimizer/fusion.rs Rust: greedy subgraph fusion + feasible granularity finder.
solution/backend/rust/src/models.rs Rust: core data model types (Problem/Solution/SubgraphDef/etc.).
solution/backend/rust/src/memory.rs Rust: working-set calculator, OOM check, split‑K search helper.
solution/backend/rust/src/latency.rs Rust: roofline latency model + memory plan for step-by-step costs.
solution/backend/rust/src/evaluate.rs Rust: local evaluator for validation in tests.
solution/backend/rust/src/dag.rs Rust: DAG construction (producers/consumers), topo sort, boundary tensors.
solution/backend/rust/src/baseline.rs Rust: baseline schedule builder (1 op/subgraph).
solution/backend/rust/Cargo.toml Rust crate manifest and release profile settings.
solution/agent/scheduler.py Python: local baseline + greedy optimizer pipeline + retention/traversal.
solution/agent/requirements.txt Python: runtime dependency list (google-genai).
solution/agent/prompts/system.md Gemini system prompt with model + constraints.
solution/agent/prompts/strategies.md Gemini strategy guidance for optimizations.
solution/agent/prompts/examples.md Gemini few-shot examples from PROBLEM.md.
solution/agent/agent.py Python: Track B agent loop (local optimize + optional Gemini refinement).
solution/.github/workflows/ci.yml CI: Rust build/test, Python smoke test, and E2E jobs.
problem/mlsys.h C++ header defining Problem/Solution structs and evaluator interfaces.
problem/example_problem.json Example input problem fixture.
problem/benchmarks/mlsys-2026-9.json Benchmark 9 fixture.
problem/benchmarks/mlsys-2026-5.json Benchmark 5 fixture.
problem/benchmarks/mlsys-2026-13.json Benchmark 13 fixture.
problem/benchmarks/mlsys-2026-1.json Benchmark 1 fixture.
CLAUDE.md Repository SDLC/Claude Code configuration documentation.
.gitignore Adds ignore rules for Python/Node/Rust artifacts and other tooling.

Issue #8 (CLI benchmark runner):
- Added evaluate subcommand: mlsys evaluate --problem <f> --solution <f>
- Added parse_solution() to parser.rs
- evaluate.rs now has a CLI entry point, prints PASS/FAIL + latency

Correctness fix (latency.rs):
- Each MatMul op now scaled by its own K_full (k / K_full_for_this_op)
  instead of using the first boundary MatMul's K_full for all ops
- Same fix in Python evaluator: num_k_steps derived from boundary
  output MatMul's K_full
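
One reading of the per-op scaling fix, sketched below under stated assumptions: each MatMul's compute cost for a k-step is scaled by `k / K_full` of that op, where the old behaviour used the first boundary MatMul's `K_full` for every op. The function names and the cost numbers are hypothetical; only the `k / K_full_for_this_op` ratio comes from the commit message.

```python
def scaled_costs_per_op(base_costs: list[float], k_fulls: list[int], k: int) -> list[float]:
    """Scale each op's full-K cost by k over that op's own K_full."""
    return [cost * k / k_full for cost, k_full in zip(base_costs, k_fulls)]


def scaled_costs_shared(base_costs: list[float], k_fulls: list[int], k: int) -> list[float]:
    """Old behaviour (as described): one K_full, taken from the first op, for all ops."""
    shared_k_full = k_fulls[0]
    return [cost * k / shared_k_full for cost in base_costs]


# Two MatMuls in one subgraph with different reduction depths.
base_costs = [1000.0, 1000.0]  # cost of each op at its full K
k_fulls = [128, 512]
k = 64                         # chosen split-K step, at most min(k_fulls)

print(scaled_costs_per_op(base_costs, k_fulls, k))  # [500.0, 125.0]
print(scaled_costs_shared(base_costs, k_fulls, k))  # [500.0, 500.0]
```

With a shared `K_full`, the second op is charged four times its per-step share, which is the kind of discrepancy the fix removes for mixed-K subgraphs.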

Review comment fixes:
- evaluator.py: added widths/heights length validation
- test-e2e.sh: added trap handler for temp dir cleanup on exit/signal

Refs #8
Copilot AI left a comment

Pull request overview

Adds a full MLSys 2026 DAG scheduler submission with two implementations (Rust Track A + Python/Gemini Track B), plus extensive documentation, benchmarks, and a CI/E2E validation story.

Changes:

  • Introduces a Rust scheduler binary with optimizer pipeline (fusion, retention, split‑K, granularity search, traversal).
  • Introduces a Python agent that runs a local optimizer first, then optionally refines via Gemini.
  • Adds project documentation (requirements, architecture, ADRs, checkpoints), benchmarks, and an E2E script + CI workflow.

Reviewed changes

Copilot reviewed 49 out of 58 changed files in this pull request and generated 11 comments.

File Description
solution/scripts/test-e2e.sh Adds local “happy path” script to build/run both tracks against all benchmarks and sanity-check JSON.
solution/requirements/rice-scores.md RICE prioritization document for feature planning.
solution/requirements/requirements.md Detailed functional/non-functional requirements and assumptions.
solution/requirements/mvp-scope.md MVP scope definition, acceptance criteria, and risk register.
solution/requirements/moscow.md MoSCoW prioritization for features.
solution/README.md End-user documentation: quickstart, architecture, testing, and results.
solution/docs/decisions/ADR-003-greedy-fusion.md ADR for choosing greedy fusion vs DP/beam/ILP.
solution/docs/decisions/ADR-002-baseline-first.md ADR for baseline-first implementation strategy.
solution/docs/decisions/ADR-001-language-python.md ADR for Rust Track A + Python Track B language split.
solution/docs/architecture/workspace.dsl Structurizr DSL model describing architecture relationships.
solution/docs/architecture/user-journeys.md User-journey documentation for solve/evaluate/batch flows.
solution/docs/architecture/system-design.md Detailed system design + latency/working-set specification.
solution/docs/architecture/security-model.md Security/threat-model doc for local CLI.
solution/docs/architecture/deployment-topology.md Local “deployment”/dev topology and runbook.
solution/docs/architecture/database-schema.md Data model/schema reference (no DB; JSON schemas + mappings).
solution/docs/architecture/data-flow.md Pipeline and latency/working-set data-flow diagrams.
solution/docs/architecture/api-error-catalog.md Error catalog and exit codes for CLI-style usage.
solution/checkpoints/stage-4-validation.md Stage checkpoint summarizing tests/validation outcomes.
solution/checkpoints/stage-2.5-validation.md Stage checkpoint for issue creation/branch setup.
solution/checkpoints/stage-2-validation.md Stage checkpoint for architecture/system design completion.
solution/checkpoints/stage-1-validation.md Stage checkpoint for requirements analysis completion.
solution/checkpoints/stage-0-validation.md Stage checkpoint for initial repo setup.
solution/CHANGELOG.md Changelog for the delivered solution components.
solution/backend/rust/src/serializer.rs Serializes Solution to contest JSON format.
solution/backend/rust/src/parser.rs Parses problem + solution JSON into Rust data structures.
solution/backend/rust/src/optimizer/traversal.rs Implements snake/zig-zag traversal optimization for MatMul tiling.
solution/backend/rust/src/optimizer/splitk.rs Applies split‑K when MatMul subgraphs OOM at full k.
solution/backend/rust/src/optimizer/retention.rs Heuristic retention selection across subgraph boundaries.
solution/backend/rust/src/optimizer/pipeline.rs Orchestrates optimizer stages into a full pipeline.
solution/backend/rust/src/optimizer/mod.rs Exposes optimizer modules.
solution/backend/rust/src/optimizer/granularity.rs Searches (w,h,k) candidates for best latency under memory constraint.
solution/backend/rust/src/optimizer/fusion.rs Greedy fusion of adjacent subgraphs with feasibility checks.
solution/backend/rust/src/models.rs Core Rust data types for Problem/Solution/etc.
solution/backend/rust/src/memory.rs Working-set calculation + OOM checks + split‑K search helper.
solution/backend/rust/src/latency.rs Roofline latency model + memory-transfer accounting.
solution/backend/rust/src/evaluate.rs Local evaluator for validating solutions (coverage + OOM + latency).
solution/backend/rust/src/dag.rs DAG construction, topo sort, and boundary/ephemeral tensor utilities.
solution/backend/rust/src/baseline.rs Baseline schedule generation (1 op/subgraph).
solution/backend/rust/Cargo.toml Rust crate definition and release profile settings.
solution/agent/scheduler.py Python local optimizer pipeline (baseline, fusion, split‑K, granularity, traversal, retention).
solution/agent/requirements.txt Python dependency list (google-genai).
solution/agent/prompts/system.md System prompt for Gemini schedule generation.
solution/agent/prompts/strategies.md Strategy prompt for Gemini improvements.
solution/agent/prompts/examples.md Few-shot examples from PROBLEM.md for Gemini calibration.
solution/agent/agent.py Python Track B agent: local optimize + optional Gemini refinement + validation.
solution/.github/workflows/ci.yml CI workflow definition for Rust/Python/E2E checks (currently placed under solution/).
problem/mlsys.h Contest reference header (Problem/Solution structs + Evaluate signature).
problem/example_problem.json Example problem input JSON.
problem/benchmarks/mlsys-2026-9.json Benchmark input JSON #9.
problem/benchmarks/mlsys-2026-5.json Benchmark input JSON #5.
problem/benchmarks/mlsys-2026-13.json Benchmark input JSON #13.
problem/benchmarks/mlsys-2026-1.json Benchmark input JSON #1.
CLAUDE.md Claude Code SDLC configuration documentation.
.gitignore Repository ignore rules for Python/Rust/etc artifacts.

CI workflow:
- Copied ci.yml to repo-root .github/workflows/ (GitHub only runs from root)

Parser hardening (parser.rs):
- parse_solution returns errors instead of coercing invalid indices to 0
- parse_problem validates op_type (MatMul/Pointwise), MatMul arity (2 inputs),
  non-empty outputs, and tensor index bounds
- native_granularity_for_subgraph uses min K_full (safe for all ops)
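The validation rules listed for parse_problem can be sketched in Python as below. The real implementation lives in parser.rs; the dict field names (`op_type`, `inputs`, `outputs`) and the function name here are assumptions made for illustration.

```python
VALID_OP_TYPES = ("MatMul", "Pointwise")


def validate_op(index: int, op: dict, num_tensors: int) -> None:
    """Raise ValueError if the op description breaks any of the parser rules."""
    op_type = op.get("op_type")
    if op_type not in VALID_OP_TYPES:
        raise ValueError(f"op {index}: unsupported op_type {op_type!r}")
    inputs = op.get("inputs", [])
    outputs = op.get("outputs", [])
    if op_type == "MatMul" and len(inputs) != 2:
        raise ValueError(f"op {index}: MatMul needs exactly 2 inputs, got {len(inputs)}")
    if not outputs:
        raise ValueError(f"op {index}: outputs must not be empty")
    for tensor in (*inputs, *outputs):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(tensor, bool) or not isinstance(tensor, int):
            raise ValueError(f"op {index}: tensor index {tensor!r} is not an integer")
        if not 0 <= tensor < num_tensors:
            raise ValueError(f"op {index}: tensor index {tensor} out of bounds")


validate_op(0, {"op_type": "MatMul", "inputs": [0, 1], "outputs": [2]}, num_tensors=3)
```

Failing fast here matches the parse_solution change above: a bad index becomes an error rather than silently pointing at tensor 0.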

Memory model (memory.rs):
- find_split_k uses min K_full across MatMuls instead of max
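Why min rather than max: a k larger than some MatMul's K_full is not a valid step size for that op. The sketch below starts from the smallest K_full and halves until the working set fits. The halving strategy and the `fits` callback are assumptions; the source only states that the starting bound is the minimum.

```python
from typing import Callable, Optional


def find_split_k(k_fulls: list[int], fits: Callable[[int], bool]) -> Optional[int]:
    """Return the first k (halving down from min K_full) that fits, else None.

    Assumes k_fulls is non-empty, i.e. the subgraph has at least one MatMul.
    """
    k = min(k_fulls)  # safe upper bound for every MatMul in the subgraph
    while k >= 1:
        if fits(k):
            return k
        k //= 2
    return None


# Hypothetical memory check: only k <= 32 fits in fast memory.
print(find_split_k([256, 128], lambda k: k <= 32))
```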

Pipeline (pipeline.rs):
- Updated header comment to reflect actual 9-stage pipeline

E2E script (test-e2e.sh):
- Skip Track A benchmarks on build failure instead of running missing binary
- Removed stderr suppression from cargo build

Agent (agent.py):
- Coerce traversal_order elements to int
- Validate traversal_order is a valid permutation before accepting
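A possible shape for the two agent.py checks, shown as one helper. The function name is hypothetical, and returning None on rejection is an assumption about how the agent falls back.

```python
from typing import Optional


def normalize_traversal_order(raw: object, num_tiles: int) -> Optional[list[int]]:
    """Coerce elements to int and accept only a permutation of range(num_tiles)."""
    if not isinstance(raw, list):
        return None
    try:
        order = [int(x) for x in raw]  # accepts "3" or 3.0 as returned by a model
    except (TypeError, ValueError):
        return None
    if sorted(order) != list(range(num_tiles)):
        return None  # duplicates, gaps, or out-of-range indices
    return order


print(normalize_traversal_order(["0", 2.0, 1], num_tiles=3))
```

Note that `int()` truncates non-integral floats such as 1.7; a stricter variant could reject those instead.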

Architecture docs (system-design.md):
- Removed editorial "Wait -- actually" self-correction, kept final formula only

Architecture docs (system-design, database-schema, deployment-topology,
user-journeys, workspace.dsl, api-error-catalog, security-model, data-flow):
- Replaced all Python module references with Rust src/ layout
- Updated data types from Python dataclasses to Rust structs
- Fixed CLI commands to match actual interface
- Updated optimizer composition to show all 9 stages
- Removed editorial self-corrections, kept only final formulas

Requirements docs:
- NFR-006: updated to dual-track (Rust + Python)
- A-008: updated to reflect Rust Track A + Python Track B
- mvp-scope: marked F-10 (traversal) as implemented, C++ row as superseded

Checkpoints:
- stage-1: updated language decision and traversal status
- stage-2: rewritten to reflect Rust modules, 9-stage pipeline, per-op K_full

README + CHANGELOG:
- Pipeline updated from 8 to 9 stages (added traversal optimization)
- Fixed simplified latency formula (sum-of-per-step-max, not max-of-totals)
- Added traversal.rs and evaluate subcommand to CHANGELOG
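The difference between the two latency formulas is easiest to see on a small example. The sketch assumes each step is a `(compute_time, memory_time)` pair under a roofline model; the function names are illustrative.

```python
def sum_of_per_step_max(steps: list[tuple[float, float]]) -> float:
    """Each step costs whichever of compute or memory is slower; steps add up."""
    return sum(max(compute, memory) for compute, memory in steps)


def max_of_totals(steps: list[tuple[float, float]]) -> float:
    """The incorrect simplification: compare totals instead of per-step costs."""
    total_compute = sum(compute for compute, _ in steps)
    total_memory = sum(memory for _, memory in steps)
    return max(total_compute, total_memory)


steps = [(10.0, 2.0), (3.0, 8.0)]  # one compute-bound step, one memory-bound step
print(sum_of_per_step_max(steps), max_of_totals(steps))
```

Max-of-totals is never larger than sum-of-per-step-max, so it underestimates latency whenever the bottleneck switches between steps.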

ADR renamed: ADR-001-language-python.md -> ADR-001-language-selection.md

Copilot AI left a comment

Pull request overview

Adds a complete MLSys 2026 contest submission scaffold: a Rust “Track A” CLI scheduler, a Python “Track B” Gemini agent, benchmark/problem fixtures, CI, E2E validation, and extensive planning/architecture documentation.

Changes:

  • Implement Track A Rust optimizer pipeline (parse → DAG → baseline/fusion/retention/split‑K/granularity/traversal → serialize) plus a local evaluate subcommand.
  • Implement Track B Python local optimizer + Gemini refinement loop with prompts, plus dependency setup.
  • Add CI (Rust + Python + E2E), benchmark fixtures, E2E script, and requirement/ADR/architecture/checkpoint docs.

Reviewed changes

Copilot reviewed 50 out of 59 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| solution/scripts/test-e2e.sh | Adds an end-to-end “happy path” runner/validator for both tracks across all 5 benchmarks. |
| solution/requirements/rice-scores.md | Documents RICE prioritization and rationale for the feature plan. |
| solution/requirements/requirements.md | Captures functional/non-functional requirements and assumptions for the scheduler. |
| solution/requirements/mvp-scope.md | Defines MVP scope, acceptance criteria, and risks. |
| solution/requirements/moscow.md | MoSCoW prioritization for features. |
| solution/README.md | Top-level solution README: quickstart, architecture, testing, and results summary. |
| solution/docs/decisions/ADR-003-greedy-fusion.md | ADR explaining greedy fusion choice over DP/beam. |
| solution/docs/decisions/ADR-002-baseline-first.md | ADR for baseline-first implementation strategy. |
| solution/docs/decisions/ADR-001-language-selection.md | ADR selecting Rust for Track A and Python for Track B. |
| solution/docs/architecture/workspace.dsl | Structurizr/C4-style architecture model of the system. |
| solution/docs/architecture/user-journeys.md | User journeys for solve/evaluate/batch flows. |
| solution/docs/architecture/security-model.md | Security/threat model documentation for a local CLI tool. |
| solution/docs/architecture/deployment-topology.md | Explains local execution topology and setup for both tracks. |
| solution/docs/architecture/database-schema.md | Documents JSON schemas and internal model mappings (no DB). |
| solution/docs/architecture/data-flow.md | Sequence/data-flow diagrams for the pipeline and latency/WS checks. |
| solution/docs/architecture/api-error-catalog.md | Catalogs CLI/evaluator error conditions and exit codes. |
| solution/checkpoints/stage-4-validation.md | Checkpoint notes for test/validation stage and reported results. |
| solution/checkpoints/stage-2.5-validation.md | Checkpoint notes for issue creation/branch setup stage. |
| solution/checkpoints/stage-2-validation.md | Checkpoint notes for architecture/design stage. |
| solution/checkpoints/stage-1-validation.md | Checkpoint notes for requirements stage. |
| solution/checkpoints/stage-0-validation.md | Checkpoint notes for project setup stage. |
| solution/CHANGELOG.md | Adds a v1.0.0 changelog entry describing delivered components. |
| solution/backend/rust/src/serializer.rs | Implements Solution → JSON serialization for Track A. |
| solution/backend/rust/src/parser.rs | Implements Problem/Solution parsing and helpers (k_full/native granularity). |
| solution/backend/rust/src/optimizer/traversal.rs | Implements snake/zig-zag traversal optimization for MatMul tiling. |
| solution/backend/rust/src/optimizer/splitk.rs | Applies split‑K to resolve OOM for MatMul-containing subgraphs. |
| solution/backend/rust/src/optimizer/retention.rs | Implements tensor retention decisions across subgraph boundaries. |
| solution/backend/rust/src/optimizer/pipeline.rs | Orchestrates the full 9-stage optimizer pipeline. |
| solution/backend/rust/src/optimizer/mod.rs | Exposes optimizer submodules. |
| solution/backend/rust/src/optimizer/granularity.rs | Searches best (w,h,k) granularity per subgraph under OOM constraint. |
| solution/backend/rust/src/optimizer/fusion.rs | Greedy adjacent subgraph fusion logic. |
| solution/backend/rust/src/models.rs | Core in-memory data structures (Problem/Op/Tensor/Solution/SubgraphDef). |
| solution/backend/rust/src/memory.rs | Working-set calculation, OOM checking, and split‑K search helper. |
| solution/backend/rust/src/latency.rs | Roofline latency model + memory plan building. |
| solution/backend/rust/src/evaluate.rs | Local evaluator implementation for checking a solution against a problem. |
| solution/backend/rust/src/dag.rs | DAG construction, topo sort, boundary tensor queries, and output dimensions. |
| solution/backend/rust/src/baseline.rs | Baseline “1 op per subgraph” schedule builder. |
| solution/backend/rust/Cargo.toml | Rust crate definition and release profile settings. |
| solution/agent/scheduler.py | Python local optimizer pipeline (baseline/fusion/granularity/retention/traversal). |
| solution/agent/requirements.txt | Python runtime dependency list (google-genai). |
| solution/agent/prompts/system.md | System prompt defining rules/objective/output format for Gemini. |
| solution/agent/prompts/strategies.md | Strategy guidance prompt for optimization suggestions. |
| solution/agent/prompts/examples.md | Few-shot examples from PROBLEM.md for latency calibration. |
| solution/agent/agent.py | Track B agent entrypoint: local optimize + optional Gemini refinement. |
| solution/.github/workflows/ci.yml | Adds a (duplicate) CI workflow file under solution/ tree. |
| problem/mlsys.h | Adds the C++ interface header describing canonical structs/Evaluate entrypoints. |
| problem/example_problem.json | Adds a small example problem fixture. |
| problem/benchmarks/mlsys-2026-1.json | Adds benchmark 1 fixture. |
| problem/benchmarks/mlsys-2026-5.json | Adds benchmark 5 fixture. |
| problem/benchmarks/mlsys-2026-9.json | Adds benchmark 9 fixture. |
| problem/benchmarks/mlsys-2026-13.json | Adds benchmark 13 fixture. |
| problem/benchmarks/mlsys-2026-17.json | Adds benchmark 17 fixture. |
| CLAUDE.md | Adds SDLC/Claude Code configuration documentation. |
| .gitignore | Adds repo ignore patterns for Python/Node/Rust and tooling. |
| .github/workflows/ci.yml | Adds the actual root CI workflow (Rust + Python + E2E). |

parser.rs:
- parse_solution validates granularity array length (exactly 3)
- All fields use strict validation with errors instead of unwrap_or defaults
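In Python terms, the strict granularity check amounts to something like the sketch below. The `(w, h, k)` order follows the granularity search description; requiring positive integers is an extra assumption, and the function name is hypothetical.

```python
def parse_granularity(raw: object) -> tuple[int, int, int]:
    """Return (w, h, k) or raise ValueError; never fall back to a default."""
    if not isinstance(raw, list) or len(raw) != 3:
        raise ValueError("granularity must be a list of exactly 3 values")
    for value in raw:
        # Assumption: each entry must be a positive integer (bool is rejected).
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"invalid granularity entry: {value!r}")
    w, h, k = raw
    return w, h, k


print(parse_granularity([128, 128, 64]))
```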

evaluate.rs:
- Validates traversal_order is a valid permutation of [0, num_tiles)
- Checks reported subgraph_latency matches computed value (tolerance 0.5)
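The latency comparison can be sketched as below, assuming the 0.5 tolerance is absolute (the commit message does not say whether it is absolute or relative) and that both lists have one entry per subgraph. The function name is hypothetical.

```python
LATENCY_TOLERANCE = 0.5  # assumed to be an absolute tolerance


def find_latency_mismatches(
    reported: list[float], computed: list[float], tol: float = LATENCY_TOLERANCE
) -> list[int]:
    """Return indices of subgraphs whose reported latency is off by more than tol."""
    return [
        index
        for index, (rep, comp) in enumerate(zip(reported, computed))
        if abs(rep - comp) > tol
    ]


print(find_latency_mismatches([10.0, 20.4, 31.0], [10.0, 20.0, 30.0]))
```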

CI:
- Removed duplicate solution/.github/workflows/ci.yml (only repo-root copy)

rice-scores.md:
- Fixed sort order: F-03 (10.0) now ranked above F-02 (7.5)
- Marked F-10 as implemented despite low RICE score

database-schema.md:
- Fixed benchmark 17 ops count: 96 -> 103

test-e2e.sh:
- Pass json_file as sys.argv[1] instead of shell interpolation
- Relaxed duplicate op check to allow recomputation (ops in multiple subgraphs)
- Use uv run python instead of python3

- Removed CLAUDE.md from git tracking
- Added CLAUDE.md to .gitignore
- Fixed deployment-topology.md: pyproject.toml -> requirements.txt for Track B
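For the test-e2e.sh items above: interpolating a shell variable into inline Python source breaks on paths containing quotes or spaces, whereas reading the path from the argument list does not. The sketch below shows that pattern together with a coverage check that tolerates an op appearing in several subgraphs. The `subgraphs` key and both function names are assumptions about the solution JSON, not confirmed by the script.

```python
import json


def load_json_arg(argv: list[str]) -> dict:
    """Read the JSON file named by argv[1] rather than a path pasted into the source."""
    if len(argv) < 2:
        raise SystemExit("usage: check.py <solution.json>")
    with open(argv[1], encoding="utf-8") as handle:
        return json.load(handle)


def covers_all_ops(solution: dict, num_ops: int) -> bool:
    """True if every op index appears in at least one subgraph.

    Duplicates across subgraphs are allowed, since recomputation is legal.
    """
    covered: set[int] = set()
    for subgraph in solution["subgraphs"]:
        covered.update(subgraph)
    return covered == set(range(num_ops))


# Op 1 appears in two subgraphs (recomputation) and coverage still holds.
print(covers_all_ops({"subgraphs": [[0, 1], [1, 2]]}, num_ops=3))
```

A script entry point would call `load_json_arg(sys.argv)` and be invoked from bash along the lines of `uv run python check.py "$json_file"`, where `check.py` is a placeholder name.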
@melroyanthony merged commit 0c29c8d into main on Mar 14, 2026
6 checks passed
@melroyanthony deleted the feat/issue-39-mlsys-scheduler branch on March 14, 2026 20:42