forked from yarongmu-google/MLSys
-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Labels
enhancementNew feature or requestNew feature or request
Description
P1: Cost-based fusion decisions
Problem
The current fusion strategy greedily merges ALL adjacent ops into a single mega-subgraph. This forces a single shared granularity (w, h, k) on all ops, which is suboptimal when ops have different optimal tile sizes (e.g., 4096x4096 MatMul followed by 128x128 Pointwise).
Proposed Approach
Before merging two subgraphs, compare:
- `latency(fused, best_shared_granularity)`
- `latency(A, best_gran_A) + latency(B, best_gran_B) + DRAM_boundary_cost`
Only fuse if it actually reduces total latency.
Expected Impact
2-5x improvement on heterogeneous graphs (benchmarks with mixed MatMul/Pointwise and varying tensor sizes).
Acceptance Criteria
- Fusion decision compares fused vs split latency (not just memory feasibility)
- Each candidate merge evaluated with optimal granularity for both fused and split cases
- DRAM boundary transfer cost correctly estimated for split case
- Track A (Rust) and Track B (Python) both updated
- All tests pass; no regression on any benchmark
Dependencies
Depends on #15 (k > 1 fix) — cost comparison requires correct latency calculation with proper k values.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request