
feat: ANE 7B–13B Production Roadmap + Starter Code for OpenClaw Swarm #7

Draft
codegen-sh[bot] wants to merge 1 commit into main from codegen-bot/ane-7b-roadmap-openclaw-a3f7c2

Conversation


@codegen-sh (bot) commented Mar 5, 2026

3-Phase Technical Roadmap: Stories110M → 7B–13B on M2 ANE

Comprehensive roadmap + code-level starter files for evolving ANE from the current working 109M-param baseline to production-grade 7B–13B inference and fine-tuning on a 24 GB M2 MacBook, using only the ANE for compute.

Core Architectural Insight: Weight-Swap Architecture

Instead of compiling one kernel program per layer (~96 forward kernels for a 32-layer 7B model, which together with backward and auxiliary kernels blows past the ~119 compile limit), compile 11 parameterized kernel programs (6 forward + 5 backward) and iterate over layers by swapping weights via unload→rewrite→load. This pattern is already validated by the existing test_weight_reload.m; rerunning that gate test at 7B layer dimensions (the Day 2 gate) is the critical go/no-go.
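The swap loop can be sketched as below. The method names (`unload_weights`, `rewrite_weights`, `load_weights`, `run_forward`) are illustrative assumptions, not the actual bridge API; the shipped validation lives in test_weight_reload.m.

```python
# Kernel-count arithmetic from the roadmap.
N_LAYERS = 32          # transformer blocks in Llama-7B
KERNELS_PER_LAYER = 3  # illustrative per-layer split (attention/FFN/norm)

naive_programs = N_LAYERS * KERNELS_PER_LAYER  # 96 forward programs alone

# Weight-swap plan: 11 parameterized programs shared by every layer.
FORWARD_PROGRAMS, BACKWARD_PROGRAMS = 6, 5
shared_programs = FORWARD_PROGRAMS + BACKWARD_PROGRAMS  # 11

def forward_all_layers(bridge, hidden, layer_weights):
    """Run every layer through the same compiled programs by swapping
    each layer's weights in via unload -> rewrite -> load."""
    for weights in layer_weights:
        bridge.unload_weights()          # release the previous layer's blobs
        bridge.rewrite_weights(weights)  # patch the constant weight region
        bridge.load_weights()            # remap for the ANE
        hidden = bridge.run_forward(hidden)
    return hidden
```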


Memory Budget (M2 24 GB)

| Model | Weights | KV Cache | Total | Headroom |
| --- | --- | --- | --- | --- |
| 7B Q4 (group128) | 3.5 GB | 1.0 GB | 7.0 GB | 13.0 GB |
| 13B Q3 (group64) | 4.9 GB | 1.6 GB | 9.7 GB | 10.3 GB |
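The weight figures follow directly from bits-per-weight arithmetic; a quick sanity check (ignoring the small per-group fp16 scale overhead, which adds only ~0.125–0.25 extra bits per weight at group sizes 128/64):

```python
def weight_gb(n_params, bits_per_weight):
    """Quantized weight payload in decimal GB, scale overhead ignored."""
    return n_params * bits_per_weight / 8 / 1e9

print(weight_gb(7e9, 4))    # 3.5   -> "7B Q4" weights column
print(weight_gb(13e9, 3))   # 4.875 -> ~4.9, "13B Q3" weights column
print(20.0 - 7.0)           # 13.0  -> 7B headroom under the 20 GB cap
```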

Performance Targets

| Metric | 7B Q4 | 13B Q3 |
| --- | --- | --- |
| Decode tok/s | 12–18 | 5–9 |
| Prefill tok/s (seq=512) | 18–25 | 8–12 |
| LoRA training tok/s | 3–6 | 1–3 |

Deliverables

| File | Phase | Description |
| --- | --- | --- |
| roadmap/ROADMAP_7B_ANE.md | - | Full 526-line spec with memory tables, perf derivations, decision gates |
| roadmap/mil_gen_llama.h | 1 | Parameterized MIL generator: LlamaConfig struct, RoPE fusion, GQA stub, residual+SwiGLU FFN |
| bridge/ane_model.py | 1 | Python ctypes wrapper: ANEBridge, ANEModel, ModelConfig presets, CPU reference forward, llama2c loader |
| roadmap/quant_pack.h | 2 | Q4/Q8 packing + NEON-optimized dequant + .anepak format header |
| bridge/openclaw_manifest.yaml | 3 | OpenClaw skill manifest: endpoints, telemetry, memory guard, watchdog, swarm registration |

Decision Gates

| Gate | Day | Criteria |
| --- | --- | --- |
| G1 | 2 | Weight swap < 10 ms/layer at dim=4096 |
| G2 | 3 | sin/cos MIL ops compile on ANE |
| G3 | 8 | End-to-end 7B forward pass, correct logits |
| G4 | 10 | > 10 tok/s decode on M2 |
| G5 | 18 | 72 h continuous run, 0 crashes |
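Gate G1 can be checked with a small timing harness along these lines. The `swap_fn` callable stands in for the real bridge swap call and is an assumption here; the shipped test is test_weight_reload.m.

```python
import time

def mean_swap_ms(swap_fn, n_layers=32, dim=4096):
    """Time one weight-swap cycle per layer; return the mean in ms."""
    t0 = time.perf_counter()
    for layer in range(n_layers):
        swap_fn(layer, dim)  # unload -> rewrite -> load for this layer
    return (time.perf_counter() - t0) * 1000.0 / n_layers

def gate_g1_passes(ms_per_layer, budget_ms=10.0):
    """G1: weight swap must stay under 10 ms/layer at dim=4096."""
    return ms_per_layer < budget_ms
```

A dummy `swap_fn` can exercise the harness end to end before the real bridge is wired in.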


Phase 1 (Days 1-3): Stable Multi-Layer Stacking
- mil_gen_llama.h: parameterized MIL generator with RoPE, GQA, residual fusion
- bridge/ane_model.py: Python ctypes wrapper + CPU reference forward pass
- Weight-swap architecture: 11 compiled kernels, all layers share via reload

Phase 2 (Days 4-10): Production Op Coverage + Quantization
- quant_pack.h: Q4/Q8 packing + NEON-optimized dequant (<1ms/layer)
- .anepak format for serialized quantized models
- LoRA adapter integration as extra constant blobs in MIL
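As an illustration of the group-scale scheme Phase 2 targets, here is a minimal symmetric Q4 pack/unpack in pure Python. The layout (signed 4-bit values, two per byte, one scale per group) is an assumption for clarity; the authoritative format is whatever roadmap/quant_pack.h and .anepak define.

```python
def q4_pack_group(weights, qmax=7):
    """Quantize one even-length group (e.g. 128 weights) to signed 4-bit
    values with a single fp scale; two values are packed per byte."""
    amax = max(abs(w) for w in weights)
    scale = amax / qmax if amax else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    packed = bytes((q[i] & 0xF) | ((q[i + 1] & 0xF) << 4)
                   for i in range(0, len(q), 2))
    return scale, packed

def q4_unpack_group(scale, packed):
    """Dequantize back to floats (the NEON fast path in quant_pack.h
    targets the same expansion in under 1 ms per layer)."""
    out = []
    for byte in packed:
        for nib in (byte & 0xF, byte >> 4):
            out.append((nib - 16 if nib > 7 else nib) * scale)  # sign-extend
    return out
```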

Phase 3 (Days 11-21): Swarm-Ready Production
- openclaw_manifest.yaml: skill manifest for Lobster registration
- Telemetry, inference server, memory guard, watchdog specs
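The guard and watchdog specs can be pictured with a minimal sketch. The semantics here are assumed for illustration; the real contract is whatever bridge/openclaw_manifest.yaml specifies.

```python
import time

def memory_guard_ok(resident_gb, cap_gb=20.0, margin_gb=1.0):
    """Admit a new inference request only while resident memory stays a
    margin below the roadmap's 20 GB cap on the 24 GB M2."""
    return resident_gb < cap_gb - margin_gb

class Watchdog:
    """The serving loop pings after each step; a monitor declares the
    skill hung when no ping arrives within `timeout_s`, so the swarm can
    restart it (assumed semantics, not the manifest's exact fields)."""
    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_ping = clock()

    def ping(self):
        self.last_ping = self.clock()

    def alive(self):
        return self.clock() - self.last_ping < self.timeout_s
```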

Memory fits within the 20 GB cap on M2 24 GB: 7B Q4 = 7.0 GB total, 13B Q3 = 9.7 GB total
Target: 12-18 tok/s decode (7B Q4), 5-9 tok/s (13B Q3) on M2 ANE

Co-authored-by: dermitchell1993 <dmitchell1993@aliasvault.net>
