Skip to content

perf: per-subgraph k optimization for split schedules #17

@melroyanthony

Description

@melroyanthony

P2: Per-subgraph k optimization

Problem

With cost-based fusion (P1), the schedule may have multiple subgraphs. Each subgraph should independently optimize its k dimension based on its own MatMul ops' K_full values and memory constraints.

Proposed Approach

For each subgraph after fusion decisions:

  1. If subgraph contains MatMul ops, search k from min(K_full) down to 1
  2. For each k candidate, compute working set and latency
  3. Select the k that minimizes latency while fitting in memory (accounting for retained tensors)

Expected Impact

2-10x improvement combined with P1, especially on benchmarks with large K_full values (9, 13: K_full=4096).

Acceptance Criteria

  • Each subgraph independently searches for optimal k
  • k candidates: K_full, K_full/2, K_full/4, ..., 1
  • Working set includes retained tensors from previous subgraphs
  • Track A (Rust) and Track B (Python) both updated
  • All tests pass

Dependencies

Depends on #16 (cost-based fusion) — only relevant when schedule has multiple subgraphs.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions