A substrate for architecture research and ML experimentation
Architectures are graphs. Graphs are manifests. Running experiments should require nothing more than a YAML file.
caramba provides a frictionless research environment with explicit building blocks, strict validation, and optimized execution. Define your model architecture in YAML, and caramba handles the rest, from compilation to training to publication-ready benchmarks.
- What is caramba?
- Key Features
- Quick Start
- The Pipeline
- Documentation
- Available Presets
- Platform Support
- Architecture Overview
caramba is a declarative ML experimentation platform that separates intent from implementation:
- You declare what you want in a YAML manifest (architecture, training, benchmarks)
- caramba handles the how (compilation, optimization, execution, artifacts)
This design enables:
- Rapid prototyping – Test new architectures without writing training loops
- Reproducible research – Manifests are version-controllable experiment definitions
- Automatic optimization – Runtime planning for batch sizes, precision, and compilation
- Publication-ready artifacts – CSV, PNG, and LaTeX outputs from benchmarks
Built-in support for modern neural network components:
| Layer Type | Description | Documentation |
|---|---|---|
| Attention | Standard, GQA, and DBA (Decoupled Bottleneck) modes | → Layers |
| MoE | Mixture of Experts with load balancing | → Layers |
| SSM | Selective State Space Models (Mamba-style) | → Layers |
| GLU Variants | SwiGLU, GEGLU, and other gated linear units | → Layers |
| LoRA | Low-rank adaptation for efficient fine-tuning | → Layers |
| Normalization | RMSNorm, LayerNorm | → Layers |
| RoPE | Rotary Position Embeddings | → Layers |
| Linear | Linear projections with optional bias | → Layers |
| Dropout | Dropout regularization | → Layers |
| Diffusion Head | Denoising head for diffusion models | → Layers |
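For orientation, the snippet below is a minimal, generic PyTorch sketch of two of the components listed above, RMSNorm and a SwiGLU feed-forward block. It shows the underlying math only; it is not caramba's layer implementation, and the dimensions are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: scale only, no mean subtraction or bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * inv_rms * self.weight

class SwiGLU(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(2, 16, 512)                      # (batch, seq, dim)
ffn = nn.Sequential(RMSNorm(512), SwiGLU(512, 1376))
print(ffn(x).shape)                              # torch.Size([2, 16, 512])
```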
Define complex model structures declaratively:
| Topology | Use Case | Example |
|---|---|---|
| `StackedTopology` | Sequential layer execution | Transformer blocks |
| `ResidualTopology` | Skip connections (`x + f(x)`) | Pre-norm blocks |
| `NestedTopology` | Repeat layers N times | N transformer layers |
| `ParallelTopology` | Execute and stack outputs | Multi-head attention |
| `BranchingTopology` | Execute and concatenate | Feature fusion |
| `CyclicTopology` | Cyclic connections | Graph networks |
| `RecurrentTopology` | Recurrent with cache | Sequence models |
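The composition rules are easiest to see in plain PyTorch. The sketch below mirrors stacked, residual, and nested (repeat N times) composition with generic modules; it illustrates the concepts in the table, not caramba's Topology classes, which are configured from the YAML manifest instead.

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    """Residual composition: x + f(x)."""
    def __init__(self, inner: nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.inner(x)

class SelfAttention(nn.Module):
    """Thin wrapper so nn.MultiheadAttention fits a single-input pipeline."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

def block(dim: int, heads: int) -> nn.Module:
    # Stacked composition (sequential) of two residual pre-norm sublayers.
    return nn.Sequential(
        Residual(nn.Sequential(nn.LayerNorm(dim), SelfAttention(dim, heads))),
        Residual(nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                               nn.GELU(), nn.Linear(4 * dim, dim))),
    )

# Nested composition: repeat the same block definition N times.
model = nn.Sequential(*[block(256, 4) for _ in range(6)])
x = torch.randn(2, 32, 256)                      # (batch, seq, dim)
print(model(x).shape)                            # torch.Size([2, 32, 256])
```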
| Mode | Description | When to Use |
|---|---|---|
| Standard | End-to-end training from scratch | Baseline experiments |
| Upcycle | Architecture surgery + distillation | Converting pretrained models |
| Orchestrated | Dynamic optimizer switching | Adaptive training research |
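Of these, the upcycle mode combines architecture surgery with distillation from the original pretrained weights. The sketch below covers only the generic distillation ingredient, a frozen teacher guiding a student through a softened KL loss; the shapes and temperature are placeholders, and nothing here reflects caramba's actual trainer API.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Soft-target KL divergence, scaled by T^2 (Hinton-style distillation)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy shapes: (batch, vocab). In practice the student is the converted
# architecture and the teacher is the frozen pretrained model.
student_logits = torch.randn(8, 32000, requires_grad=True)
with torch.no_grad():
    teacher_logits = torch.randn(8, 32000)

loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```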
caramba automatically optimizes your experiments:
- Runtime planning – Cached decisions for dtype, AMP, batch size, and `torch.compile`
- KV-cache policy selection – Budget-aware quantization with quality gates
- Decode-plan bucketing – Dynamic chunking for long-context inference
- Adaptive speculative decoding – Auto-adjusting draft lengths
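The first two items lean on standard PyTorch mechanisms. The sketch below shows those mechanisms, mixed-precision autocast and `torch.compile`, in their plain form so the terms above have a concrete reference; it is not the runtime planner itself, and the model, optimizer, and batch are stand-ins.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 10)).to(device)
model = torch.compile(model)                           # graph capture + kernel fusion
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(64, 512, device=device)
y = torch.randint(0, 10, (64,), device=device)

for _ in range(3):
    optimizer.zero_grad(set_to_none=True)
    # Autocast routes matmuls to low-precision kernels where that is safe.
    with torch.autocast(device_type=device, dtype=amp_dtype):
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```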
```bash
python3 -m caramba config/presets/multiplex_chat.yml --target brainstorm
```

The command above puts you in a chat session with ChatGPT 5.2, Claude Opus 4.1, and Gemini Pro 3, all of which have the tools they need to inspect the code, perform research, and take other relevant actions, so you can collaborate on whatever research goals you have. The agents do not only talk to you directly; they can also respond to each other, so the session feels like working with a team.
Optional AI-assisted workflows:
- Paper drafting – Generate LaTeX documents from experiment results
- Automated review – Get reviewer feedback and improvement suggestions
- Research loop – Write → Review → Experiment → Repeat
```bash
# Clone and install
git clone https://github.com/theapemachine/caramba.git
cd caramba
pip install -r requirements.txt
```

```bash
# Dry-run to validate a manifest (no execution)
python3 -m caramba config/presets/standard_transformer.yml --dry-run

# Run a full experiment with benchmarks
python3 -m caramba config/presets/llama32_1b_dba.yml --target paper

# Quick validation (reduced steps)
python3 -m caramba config/presets/llama32_1b_dba.yml --target quick
```

```bash
# MLP classifier
python3 -m caramba config/presets/mlp_classifier.yml --dry-run

# Diffusion model
python3 -m caramba config/presets/diffusion_vector.yml --dry-run

# Graph neural network
python3 -m caramba config/presets/graph_node_classification.yml --dry-run
```

→ Complete Getting Started Guide
Every experiment flows through this chain:
manifest → parse → lower → validate → build → run → verify → benchmark → artifacts
| Stage | What Happens |
|---|---|
| parse | Load YAML/JSON, substitute ${variables} |
| lower | Normalize type names, resolve references |
| validate | Check schema, verify dimensions |
| build | Construct PyTorch modules from topology |
| run | Execute training runs with checkpointing |
| verify | Compare outputs against thresholds |
| benchmark | Measure perplexity, latency, memory |
| artifacts | Generate CSV, PNG, LaTeX outputs |
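To make the parse stage concrete: the `${variables}` substitution it performs can be pictured as a plain text pass before YAML loading, as in the generic sketch below. The regex, helper name, and manifest keys here are illustrative; caramba's actual parser may differ.

```python
import re
import yaml  # PyYAML

def substitute(text: str, variables: dict) -> str:
    """Replace ${name} placeholders before parsing the YAML."""
    return re.sub(r"\$\{(\w+)\}", lambda m: str(variables[m.group(1)]), text)

# Hypothetical manifest fragment; the real schema is in the Manifest Reference.
manifest_text = """
model:
  d_model: ${d_model}
  n_layers: ${n_layers}
"""

manifest = yaml.safe_load(substitute(manifest_text, {"d_model": 512, "n_layers": 8}))
assert manifest["model"]["n_layers"] == 8      # ready for the lower/validate stages
```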
| Guide | Description |
|---|---|
| Getting Started | Installation, first experiment, basic concepts |
| Manifest Reference | Complete YAML schema with examples |
| Layer Reference | All layer types and their configurations |
| Topology Guide | Composing complex architectures |
| Training Guide | Standard, upcycle, and orchestrated training |
| Inference Guide | Generation, caching, speculative decoding |
| Benchmarking | Running benchmarks and generating artifacts |
| Agent Workflows | AI-assisted paper drafting and review |
| Optimization | Metal/Triton kernels, runtime planning |
Ready-to-use configurations in config/presets/:
| Preset | Architecture | Use Case |
|---|---|---|
| `llama32_1b_dba.yml` | Llama 3.2 1B → DBA | KV-cache compression research |
| `standard_transformer.yml` | GPT-style transformer | Baseline language modeling |
| `moe_transformer.yml` | Transformer + MoE | Sparse scaling research |
| `mamba_ssm.yml` | Mamba-style SSM | Linear-time sequence modeling |
| `vit.yml` | Vision Transformer | Image classification |
| `lora_finetune.yml` | LoRA-enabled model | Efficient fine-tuning |
| `mlp_classifier.yml` | Simple MLP | Non-LM classification |
| `diffusion_vector.yml` | Diffusion denoiser | Generative modeling |
| `graph_node_classification.yml` | GCN | Graph learning |
→ See all presets with full configurations
caramba treats Apple Silicon as a first-class research target:
- ✅ Works out of the box – No special configuration needed
- ✅ Unified memory – Fit larger models than discrete GPU VRAM allows
- ✅ Metal kernels – Fused DBA decode for fp16 KV-caches
- ⚠️ Bandwidth limited – Expect lower throughput than an A100
For maximum throughput:
- ✅ Triton kernels – Fused attention decode with quantized caches
- ✅ DDP/FSDP – Multi-GPU training support
- ✅ `torch.compile` – Automatic graph optimization
Fallback for development and testing:
- ✅ Full functionality – All features work
- ⚠️ Slow – Not recommended for serious training
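In plain PyTorch, the three backends above are selected with the standard availability checks shown below; this is just the stock device-picking idiom, not caramba's runtime planning.

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple Silicon (MPS), then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(4, 4, device=device)
print(device, x.dtype, x.device)
```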
```text
caramba/
├── config/           # Typed config models, presets, manifests
├── compiler/         # Manifest → executable plan
├── topology/         # Graph nodes (stacked, residual, parallel, ...)
├── layer/            # Thin PyTorch modules (attention, MoE, SSM, ...)
├── model/            # Model building, embedders, trace utilities
├── trainer/          # Training modes (standard, upcycle, orchestrated)
├── infer/            # Generation loop with KV-cache management
├── cache/            # KV-cache with quantization support
├── benchmark/        # Perplexity, latency, memory measurement
├── experiment/       # Unified pipeline orchestration
├── orchestrator/     # Dynamic optimizer switching (SWATS, PIDAO, ...)
├── optimizer/        # Triton (CUDA) + Metal (MPS) fused kernels
├── agent/            # AI research automation (paper, review, loop)
├── instrumentation/  # JSONL/HDF5/TensorBoard/W&B logging
└── console/          # Rich-based logging and progress bars
```
```bash
# Run all tests
python -m pytest -q

# Run with coverage
coverage run -m pytest && coverage report -m
```

Getting Started · Manifests · Layers · Training · Inference
