forked from yarongmu-google/MLSys
-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Labels
enhancementNew feature or requestNew feature or request
Description
P2: Per-subgraph k optimization
Problem
With cost-based fusion (P1), the schedule may have multiple subgraphs. Each subgraph should independently optimize its k dimension based on its own MatMul ops' K_full values and memory constraints.
Proposed Approach
For each subgraph after fusion decisions:
- If subgraph contains MatMul ops, search k from min(K_full) down to 1
- For each k candidate, compute working set and latency
- Select the k that minimizes latency while fitting in memory (accounting for retained tensors)
Expected Impact
2-10x improvement combined with P1, especially on benchmarks with large K_full values (9, 13: K_full=4096).
Acceptance Criteria
- Each subgraph independently searches for optimal k
- k candidates: K_full, K_full/2, K_full/4, ..., 1
- Working set includes retained tensors from previous subgraphs
- Track A (Rust) and Track B (Python) both updated
- All tests pass
Dependencies
Depends on #16 (cost-based fusion) — only relevant when schedule has multiple subgraphs.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request