perf: per-subgraph k optimization for split schedules

## P2: Per-subgraph k optimization

### Problem
With cost-based fusion (P1), the schedule may have multiple subgraphs. Each subgraph should independently optimize its k dimension based on its own MatMul ops' K_full values and memory constraints.

### Proposed Approach
For each subgraph after fusion decisions:
1. If subgraph contains MatMul ops, search k from min(K_full) down to 1
2. For each k candidate, compute working set and latency
3. Select the k that minimizes latency while fitting in memory (accounting for retained tensors)

### Expected Impact
2-10x improvement combined with P1, especially on benchmarks with large K_full values (9, 13: K_full=4096).

### Acceptance Criteria
- [ ] Each subgraph independently searches for optimal k
- [ ] k candidates: K_full, K_full/2, K_full/4, ..., 1
- [ ] Working set includes retained tensors from previous subgraphs
- [ ] Track A (Rust) and Track B (Python) both updated
- [ ] All tests pass

### Dependencies
Depends on #16 (cost-based fusion) — only relevant when schedule has multiple subgraphs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: per-subgraph k optimization for split schedules #17

P2: Per-subgraph k optimization

Problem

Proposed Approach

Expected Impact

Acceptance Criteria

Dependencies

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

perf: per-subgraph k optimization for split schedules #17

Description

P2: Per-subgraph k optimization

Problem

Proposed Approach

Expected Impact

Acceptance Criteria

Dependencies

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions