
perf: tensor retention optimization between subgraphs #18

@melroyanthony

Description


P3: Tensor retention between subgraphs

Problem

With multiple subgraphs (from P1), boundary tensors are written to DRAM and then re-read by the consuming subgraph. Retaining them in fast memory across subgraph boundaries eliminates this round-trip cost of `2 * tensor_size / bandwidth` per boundary tensor.
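A minimal sketch of the round-trip cost being avoided. The function name, units, and example numbers are illustrative, not taken from the actual optimizer code:

```rust
/// Time saved by keeping a boundary tensor in fast memory instead of
/// writing it to DRAM and reading it back: 2 * tensor_size / bandwidth.
/// (Hypothetical helper; units are bytes and bytes/second.)
fn round_trip_cost(tensor_size_bytes: f64, dram_bandwidth_bytes_per_s: f64) -> f64 {
    2.0 * tensor_size_bytes / dram_bandwidth_bytes_per_s
}

fn main() {
    // e.g. a 4 MiB activation over a 100 GB/s DRAM link: roughly 84 microseconds
    let saved = round_trip_cost(4.0 * 1024.0 * 1024.0, 100e9);
    println!("round-trip cost avoided: {saved:.3e} s");
}
```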

Current State

Retention logic already exists (`optimizer/retention.rs`) but is never triggered, because the current mega-fusion strategy produces a single subgraph with zero boundaries.

Acceptance Criteria

  • After fusion decisions create multiple subgraphs, the retention pass identifies boundary-tensor candidates
  • Retention is applied only when residual fast-memory capacity can hold the retained tensor at full size
  • Net latency improvement is validated: the avoided DRAM round trip outweighs the capacity cost
  • Track A (Rust) and Track B (Python) are both updated
  • Verified against the Example 3C pattern (4,638.4 latency)
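The capacity-gated selection in the criteria above can be sketched as a greedy pass. The `Candidate` struct, field names, and best-saving-first ordering are assumptions for illustration, not the shape of the real retention pass:

```rust
/// Hypothetical boundary tensor that the retention pass could keep in fast memory.
struct Candidate {
    size_bytes: u64,    // full tensor size; retention needs the whole tensor resident
    saved_latency: f64, // avoided DRAM round trip: 2 * size / bandwidth
}

/// Greedy sketch: retain the highest-saving candidates while residual
/// fast-memory capacity still fits each tensor at full size.
fn select_retained(mut cands: Vec<Candidate>, mut residual_bytes: u64) -> Vec<Candidate> {
    cands.sort_by(|a, b| b.saved_latency.partial_cmp(&a.saved_latency).unwrap());
    let mut kept = Vec::new();
    for c in cands {
        if c.size_bytes <= residual_bytes {
            residual_bytes -= c.size_bytes;
            kept.push(c);
        }
    }
    kept
}

fn main() {
    let kept = select_retained(
        vec![
            Candidate { size_bytes: 60, saved_latency: 3.0 },
            Candidate { size_bytes: 50, saved_latency: 5.0 },
        ],
        100, // residual capacity: only the higher-saving 50-byte tensor fits first
    );
    println!("retained {} tensor(s)", kept.len());
}
```

A real implementation would also have to charge the retained tensor's capacity against later fusion decisions, which is why this issue depends on #16.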

Dependencies

Depends on #16 (cost-based fusion) — only relevant with multiple subgraphs.

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
