forked from yarongmu-google/MLSys
Status: Open
Labels: enhancement (New feature or request)
Description
P3: Tensor retention between subgraphs
Problem
With multiple subgraphs (from P1), boundary tensors are written to DRAM then re-read. Retaining tensors in fast memory across subgraph boundaries eliminates this round-trip cost (`2 * tensor_size / bandwidth`).
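To make the round-trip cost concrete, here is a minimal sketch of the formula above with hypothetical numbers (the 64 MiB tensor size and 1 TiB/s bandwidth are illustrative, not values from the project):

```python
# Hypothetical numbers: a 64 MiB boundary tensor over 1 TiB/s DRAM bandwidth.
tensor_size = 64 * 2**20   # bytes
bandwidth = 2**40          # bytes/sec (1 TiB/s)

# Round-trip cost: one write to DRAM plus one read back.
round_trip = 2 * tensor_size / bandwidth
print(f"{round_trip * 1e6:.1f} us saved per retained boundary tensor")
```

Retaining the tensor in fast memory avoids this entire round trip, at the price of occupying fast-memory capacity for the lifetime of the boundary.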
Current State
Retention logic exists (`optimizer/retention.rs`) but is never triggered because the mega-fusion strategy produces only 1 subgraph with 0 boundaries.
Acceptance Criteria
- After fusion decisions create multiple subgraphs, retention pass identifies candidates
- Retention is only applied when residual fast-memory capacity can hold the retained tensor at its full size
- Net latency improvement is validated: the DRAM round-trip saved exceeds the capacity cost of keeping the tensor resident
- Track A (Rust) and Track B (Python) both updated
- Verified against Example 3C pattern (4,638.4 latency)
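The criteria above can be sketched as a greedy candidate-selection pass. This is an assumption-laden illustration, not the actual `optimizer/retention.rs` or Track B code: `BoundaryTensor`, `plan_retention`, the `capacity_cost` field, and the bandwidth constant are all hypothetical names for this sketch.

```python
from dataclasses import dataclass

@dataclass
class BoundaryTensor:
    name: str
    size: int             # bytes
    capacity_cost: float  # latency penalty of reserving fast-memory space (hypothetical)

BANDWIDTH = 2**40  # bytes/sec; illustrative DRAM bandwidth, not a project constant

def plan_retention(tensors, residual_capacity):
    """Greedily retain boundary tensors while residual capacity allows
    and the DRAM round-trip saved exceeds the capacity cost."""
    retained = []
    # Consider the most profitable candidates first.
    for t in sorted(tensors,
                    key=lambda t: 2 * t.size / BANDWIDTH - t.capacity_cost,
                    reverse=True):
        saving = 2 * t.size / BANDWIDTH  # write + read avoided
        if t.size <= residual_capacity and saving > t.capacity_cost:
            retained.append(t.name)
            residual_capacity -= t.size  # tensor retained at full size
    return retained

# Usage: two candidates; after "a" is retained, "b" no longer fits.
ts = [BoundaryTensor("a", 48 * 2**20, 1e-5),
      BoundaryTensor("b", 32 * 2**20, 1e-5)]
print(plan_retention(ts, 64 * 2**20))  # → ['a']
```

The greedy ordering is one possible design choice; an exact pass could instead solve the capacity/saving trade-off as a small knapsack over the boundary tensors.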
Dependencies
Depends on #16 (cost-based fusion) — only relevant with multiple subgraphs.