graph TB
%% Completed tasks at top
I1["✅ Verify disagg works over EFA"]
D1["✅ Fix docs: KV cache aware router"]
%% Standalone tasks
I2["🏗️ Add torch compilation<br/>and caching at AMI build"]
D2["📝 Fix docs: GPU workers diagram"]
%% Main flow: Benchmarking path
B1["📊 Set up lightweight<br/>benchmarking pattern"]
B1a["📊 Verify EFA impact on<br/>kvcache throughput/latency"]
B1b["📊 Set up benchmark<br/>description standard"]
B2["📊 Find output tokens/sec<br/>trade-offs"]
B3["📊 Develop better-fitting<br/>scaling policies"]
B4["📊 Create automated config<br/>and tuning mechanism"]
%% Decision point
DEC{"🔍 Decision: EFA bonding<br/>needed between prefill<br/>and decode instances?"}
%% Decision branches
OPT1["✅ Yes: Monolithic placement<br/>+ Fast interconnect<br/>- Reduced EC2 availability<br/>- Complex multi-AZ"]
OPT2["❌ No: Placement for TP only<br/>+ Simple, better availability<br/>+ Easy multi-AZ<br/>- Potential kvcache bottleneck"]
%% API Development
A1["🔧 Add API: add/remove<br/>worker for disagg"]
A2["🔧 Add worker draining<br/>to router"]
%% Future enhancements
F1["🔮 Test file-based<br/>hierarchical caching"]
F2["🔮 Implement distributed<br/>file system"]
F3["🔮 Consider worker<br/>routing methods"]
F4["🔮 Evaluate complexity<br/>vs benefits"]
%% Dependencies
I1 --> B1
B1 --> B1a
B1 --> B1b
B1 --> B2
B1a --> DEC
B1b --> B2
B2 --> B3
B2 --> B4
B3 --> B4
DEC --> OPT1
DEC --> OPT2
OPT1 --> A1
OPT2 --> A1
A1 --> A2
B3 --> F3
A1 --> F3
A2 --> F3
F1 --> F2
F2 --> F3
F3 --> F4
%% Styling
classDef completed fill:#4ade80,stroke:#22c55e,stroke-width:3px,color:#000
classDef docs fill:#94a3b8,stroke:#64748b,stroke-width:2px,color:#000
classDef api fill:#a78bfa,stroke:#8b5cf6,stroke-width:2px,color:#000
classDef infra fill:#60a5fa,stroke:#3b82f6,stroke-width:2px,color:#000
classDef bench fill:#fbbf24,stroke:#f59e0b,stroke-width:2px,color:#000
classDef future fill:#fb923c,stroke:#f97316,stroke-width:2px,color:#000
classDef decision fill:#ef4444,stroke:#dc2626,stroke-width:3px,color:#fff
classDef option fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#000
class I1,D1 completed
class D2 docs
class A1,A2 api
class I2 infra
class B1,B1a,B1b,B2,B3,B4 bench
class F1,F2,F3,F4 future
class DEC decision
class OPT1,OPT2 option
Roadmap & Dependencies
Completed
Active Development Paths
Path 1: Benchmarking → Scaling → API Development
Benchmarking & Performance Analysis
Decision Point: EFA Bonding Requirements
API Development (Post-Decision)
Path 2: Future Enhancements - Hierarchical Caching
Standalone Tasks
Infrastructure
Documentation
Legend
Key Insights
Critical Decision Point
The outcome of the EFA bonding analysis determines the overall scaling architecture. This decision impacts:
Parallel Work Streams
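As a rough sketch, the dependency edges from the diagram can be loaded into a small script to list which tasks are currently unblocked and could proceed in parallel. The node IDs, edges, and completed set are copied from the diagram; the `ready_tasks` helper is hypothetical and not part of any existing tooling. It assumes every incoming edge is a hard prerequisite, which oversimplifies the decision branches (A1 really needs only one of OPT1 or OPT2).

```python
from collections import defaultdict

# Edges copied from the diagram's "Dependencies" section (prerequisite, dependent).
EDGES = [
    ("I1", "B1"), ("B1", "B1a"), ("B1", "B1b"), ("B1", "B2"),
    ("B1a", "DEC"), ("B1b", "B2"), ("B2", "B3"), ("B2", "B4"), ("B3", "B4"),
    ("DEC", "OPT1"), ("DEC", "OPT2"), ("OPT1", "A1"), ("OPT2", "A1"),
    ("A1", "A2"), ("B3", "F3"), ("A1", "F3"), ("A2", "F3"),
    ("F1", "F2"), ("F2", "F3"), ("F3", "F4"),
]

# Nodes that appear in the diagram without any edges.
STANDALONE = ["D1", "I2", "D2"]

# Nodes assigned the "completed" class in the diagram.
COMPLETED = {"I1", "D1"}


def ready_tasks(edges, standalone, completed):
    """Return unfinished tasks whose prerequisites are all completed.

    Simplification (assumption): every incoming edge counts as required,
    so A1 would wait on both OPT1 and OPT2 even though only one
    decision branch is actually taken.
    """
    prereqs = defaultdict(set)
    nodes = set(standalone)
    for src, dst in edges:
        prereqs[dst].add(src)
        nodes.update((src, dst))
    return sorted(
        n for n in nodes
        if n not in completed and prereqs[n] <= completed
    )


if __name__ == "__main__":
    print(ready_tasks(EDGES, STANDALONE, COMPLETED))
```

Marking a task as done is a matter of adding its ID to the completed set and re-running the helper, for example `ready_tasks(EDGES, STANDALONE, COMPLETED | {"B1"})`.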
Current Bottlenecks