
# caramba 🧪

A substrate for architecture research and ML experimentation

Architectures are graphs. Graphs are manifests. Running experiments should require nothing more than a YAML file.

caramba provides a frictionless research environment with explicit building blocks, strict validation, and optimized execution. Define your model architecture in YAML, and caramba handles the rest, from compilation to training to publication-ready benchmarks.




## 🎯 What is caramba?

caramba is a declarative ML experimentation platform that separates intent from implementation:

  1. You declare what you want in a YAML manifest (architecture, training, benchmarks)
  2. caramba handles the how (compilation, optimization, execution, artifacts)
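
A minimal manifest might look like the sketch below. The keys here are illustrative, not the canonical schema; see the Manifest Reference for real examples:

```yaml
# Hypothetical manifest sketch: field names are for illustration only.
model:
  topology: stacked        # run the layers below in sequence
  layers:
    - type: attention
      mode: gqa            # grouped-query attention
    - type: glu
      variant: swiglu
training:
  mode: standard           # end-to-end training from scratch
  steps: 1000
benchmarks:
  - perplexity
  - latency
```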

This design enables:

- 🔬 **Rapid prototyping** – Test new architectures without writing training loops
- 📊 **Reproducible research** – Manifests are version-controllable experiment definitions
- ⚡ **Automatic optimization** – Runtime planning for batch sizes, precision, and compilation
- 📝 **Publication-ready artifacts** – CSV, PNG, and LaTeX outputs from benchmarks

## ✨ Key Features

### 🧱 Generic Layer Library

Built-in support for modern neural network components:

| Layer Type | Description | Documentation |
| --- | --- | --- |
| Attention | Standard, GQA, and DBA (Decoupled Bottleneck) modes | → Layers |
| MoE | Mixture of Experts with load balancing | → Layers |
| SSM | Selective State Space Models (Mamba-style) | → Layers |
| GLU Variants | SwiGLU, GEGLU, and other gated linear units | → Layers |
| LoRA | Low-rank adaptation for efficient fine-tuning | → Layers |
| Normalization | RMSNorm, LayerNorm | → Layers |
| RoPE | Rotary Position Embeddings | → Layers |
| Linear | Linear projections with optional bias | → Layers |
| Dropout | Dropout regularization | → Layers |
| Diffusion Head | Denoising head for diffusion models | → Layers |
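
As a hedged sketch, an individual layer entry in a manifest might be configured like this (the field names are assumptions for illustration; the Layer Reference documents the actual schema):

```yaml
# Illustrative layer entries; real field names live in the Layer Reference.
layers:
  - type: attention
    mode: dba          # Decoupled Bottleneck Attention
    heads: 16
    kv_heads: 4        # grouped KV heads, as in GQA (hypothetical field)
  - type: norm
    kind: rmsnorm      # RMSNorm (hypothetical field)
```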

### 🔗 Composable Topologies

Define complex model structures declaratively:

| Topology | Use Case | Example |
| --- | --- | --- |
| StackedTopology | Sequential layer execution | Transformer blocks |
| ResidualTopology | Skip connections (x + f(x)) | Pre-norm blocks |
| NestedTopology | Repeat layers N times | N transformer layers |
| ParallelTopology | Execute and stack outputs | Multi-head attention |
| BranchingTopology | Execute and concatenate | Feature fusion |
| CyclicTopology | Cyclic connections | Graph networks |
| RecurrentTopology | Recurrent with cache | Sequence models |
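
Topologies nest, so a pre-norm transformer block repeated N times can be expressed as nested(residual(stacked(...))). A hedged sketch, with key names that are illustrative rather than the documented schema:

```yaml
# Hypothetical composition: 12 pre-norm transformer layers.
topology: nested
repeat: 12
child:
  topology: residual        # computes x + f(x)
  child:
    topology: stacked       # f(x): norm, then attention
    layers:
      - type: norm
        kind: rmsnorm
      - type: attention
        mode: gqa
```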

→ Full Topology Guide

### 🎓 Multiple Training Modes

| Mode | Description | When to Use |
| --- | --- | --- |
| Standard | End-to-end training from scratch | Baseline experiments |
| Upcycle | Architecture surgery + distillation | Converting pretrained models |
| Orchestrated | Dynamic optimizer switching | Adaptive training research |
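
Because the mode is part of the manifest, switching between them is a manifest edit rather than new training code. A hedged sketch with illustrative keys:

```yaml
# Illustrative keys only; see the Training Guide for the real schema.
training:
  mode: upcycle              # architecture surgery + distillation
  teacher: llama-3.2-1b      # pretrained model to convert (hypothetical field)
  distill:
    temperature: 2.0         # softened-logit distillation (hypothetical field)
```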

→ Training Guide

### ⚡ Self-Optimization

caramba automatically optimizes your experiments:

- **Runtime planning** – Cached decisions for dtype, AMP, batch size, and torch.compile
- **KV-cache policy selection** – Budget-aware quantization with quality gates
- **Decode-plan bucketing** – Dynamic chunking for long-context inference
- **Adaptive speculative decoding** – Auto-adjusting draft lengths
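
These decisions are cached per experiment, so reruns skip the search. The sketch below shows the kind of knobs involved; the field names are hypothetical, since the planner normally chooses these values for you:

```yaml
# Hypothetical runtime overrides; the planner picks these automatically.
runtime:
  dtype: bf16                # precision chosen per device
  amp: true                  # automatic mixed precision
  batch_size: auto           # largest size that fits in memory
  compile: true              # enable torch.compile where it helps
```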

→ Optimization Details

### AI Research Collaborators

```bash
python3 -m caramba config/presets/multiplex_chat.yml --target brainstorm
```

This command drops you into a chat session with ChatGPT 5.2, Claude Opus 4.1, and Gemini Pro 3, each equipped with the tools to inspect the code, perform research, and take other relevant actions, so you can collaborate on whatever research goals you have.

The agents don't just talk to you directly; they can also respond to each other, so the session feels like working with a team.

### 🤖 AI Research Automation

Optional AI-assisted workflows:

- **Paper drafting** – Generate LaTeX documents from experiment results
- **Automated review** – Get reviewer feedback and improvement suggestions
- **Research loop** – Write → Review → Experiment → Repeat

→ Agent Workflows


## 🚀 Quick Start

### Installation

```bash
# Clone and install
git clone https://github.com/theapemachine/caramba.git
cd caramba
pip install -r requirements.txt
```

### Run Your First Experiment

```bash
# Dry-run to validate a manifest (no execution)
python3 -m caramba config/presets/standard_transformer.yml --dry-run

# Run a full experiment with benchmarks
python3 -m caramba config/presets/llama32_1b_dba.yml --target paper

# Quick validation (reduced steps)
python3 -m caramba config/presets/llama32_1b_dba.yml --target quick
```

### Non-LM Architectures

```bash
# MLP classifier
python3 -m caramba config/presets/mlp_classifier.yml --dry-run

# Diffusion model
python3 -m caramba config/presets/diffusion_vector.yml --dry-run

# Graph neural network
python3 -m caramba config/presets/graph_node_classification.yml --dry-run
```

→ Complete Getting Started Guide


## 🔄 The Pipeline

Every experiment flows through this chain:

```text
manifest → parse → lower → validate → build → run → verify → benchmark → artifacts
```

| Stage | What Happens |
| --- | --- |
| parse | Load YAML/JSON, substitute `${variables}` |
| lower | Normalize type names, resolve references |
| validate | Check schema, verify dimensions |
| build | Construct PyTorch modules from topology |
| run | Execute training runs with checkpointing |
| verify | Compare outputs against thresholds |
| benchmark | Measure perplexity, latency, memory |
| artifacts | Generate CSV, PNG, LaTeX outputs |
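
For example, the parse stage's `${variables}` substitution lets one manifest drive several runs without editing the architecture itself. A hedged sketch (key placement is illustrative):

```yaml
# Illustrative: ${hidden_size} is substituted at parse time.
variables:
  hidden_size: 512
model:
  layers:
    - type: linear
      dim: ${hidden_size}
```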

## 📚 Documentation

| Guide | Description |
| --- | --- |
| 🚀 Getting Started | Installation, first experiment, basic concepts |
| 📄 Manifest Reference | Complete YAML schema with examples |
| 🧱 Layer Reference | All layer types and their configurations |
| 🔗 Topology Guide | Composing complex architectures |
| 🎓 Training Guide | Standard, upcycle, and orchestrated training |
| 🔮 Inference Guide | Generation, caching, speculative decoding |
| 📊 Benchmarking | Running benchmarks and generating artifacts |
| 🤖 Agent Workflows | AI-assisted paper drafting and review |
| ⚡ Optimization | Metal/Triton kernels, runtime planning |

## 📦 Available Presets

Ready-to-use configurations in `config/presets/`:

| Preset | Architecture | Use Case |
| --- | --- | --- |
| `llama32_1b_dba.yml` | Llama 3.2 1B → DBA | KV-cache compression research |
| `standard_transformer.yml` | GPT-style transformer | Baseline language modeling |
| `moe_transformer.yml` | Transformer + MoE | Sparse scaling research |
| `mamba_ssm.yml` | Mamba-style SSM | Linear-time sequence modeling |
| `vit.yml` | Vision Transformer | Image classification |
| `lora_finetune.yml` | LoRA-enabled model | Efficient fine-tuning |
| `mlp_classifier.yml` | Simple MLP | Non-LM classification |
| `diffusion_vector.yml` | Diffusion denoiser | Generative modeling |
| `graph_node_classification.yml` | GCN | Graph learning |

→ See all presets with full configurations


๐Ÿ–ฅ๏ธ Platform Support

### Apple Silicon (MPS)

caramba treats Apple Silicon as a first-class research target:

- ✅ **Works out of the box** – No special configuration needed
- ✅ **Unified memory** – Fit larger models than discrete GPU VRAM allows
- ✅ **Metal kernels** – Fused DBA decode for fp16 KV-caches
- ⚠️ **Bandwidth limited** – Expect lower throughput than an A100

### NVIDIA CUDA

For maximum throughput:

- ✅ **Triton kernels** – Fused attention decode with quantized caches
- ✅ **DDP/FSDP** – Multi-GPU training support
- ✅ **torch.compile** – Automatic graph optimization

### CPU

Fallback for development and testing:

- ✅ **Full functionality** – All features work
- ⚠️ **Slow** – Not recommended for serious training

๐Ÿ—๏ธ Architecture Overview

```text
caramba/
├── config/          # Typed config models, presets, manifests
├── compiler/        # Manifest → executable plan
├── topology/        # Graph nodes (stacked, residual, parallel, ...)
├── layer/           # Thin PyTorch modules (attention, MoE, SSM, ...)
├── model/           # Model building, embedders, trace utilities
├── trainer/         # Training modes (standard, upcycle, orchestrated)
├── infer/           # Generation loop with KV-cache management
├── cache/           # KV-cache with quantization support
├── benchmark/       # Perplexity, latency, memory measurement
├── experiment/      # Unified pipeline orchestration
├── orchestrator/    # Dynamic optimizer switching (SWATS, PIDAO, ...)
├── optimizer/       # Triton (CUDA) + Metal (MPS) fused kernels
├── agent/           # AI research automation (paper, review, loop)
├── instrumentation/ # JSONL/HDF5/TensorBoard/W&B logging
└── console/         # Rich-based logging and progress bars
```

## 🧪 Testing

```bash
# Run all tests
python -m pytest -q

# Run with coverage
coverage run -m pytest && coverage report -m
```

## 📄 License

MIT License


Getting Started · Manifests · Layers · Training · Inference
