
Architecture


Architecture Overview

Model

MiniAgent uses a Llama-style decoder-only transformer:

Input → Embedding → [RMSNorm → GQA+RoPE → SwiGLU] × N → RMSNorm → LM Head → Output
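
As a rough illustration of that layout, here is a minimal PyTorch sketch of one decoder block with pre-norm residuals, RMSNorm, and a SwiGLU MLP. The class names are illustrative, the causal mask is omitted, and plain multi-head attention stands in for the model's GQA + RoPE attention; this is a sketch of the block structure, not the repository's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Normalize by root-mean-square only (no mean subtraction), cheaper than LayerNorm
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class DecoderBlock(nn.Module):
    def __init__(self, dim, n_heads, hidden_dim):
        super().__init__()
        self.attn_norm = RMSNorm(dim)
        # Placeholder: the real model uses grouped-query attention with RoPE on q/k
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ffn_norm = RMSNorm(dim)
        self.ffn = SwiGLU(dim, hidden_dim)

    def forward(self, x):
        # Pre-norm residuals: x + Attn(RMSNorm(x)), then x + SwiGLU(RMSNorm(x))
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.ffn(self.ffn_norm(x))
```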

Model Sizes

Model                Params   dim   layers   heads   KV heads   Training time (RTX 3090)
MiniAgent-26M        25.8M    512   8        8        2          ~2h
MiniAgent-108M       108M     768   16       12       4          ~8h
MiniAgent-MoE-145M   145M     512   8        8        2          ~6h
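
These sizes boil down to a handful of hyperparameters. A hedged sketch of a config object for the 26M variant, with field names assumed rather than taken from the repository:

```python
from dataclasses import dataclass

@dataclass
class MiniAgentConfig:
    # Values from the MiniAgent-26M row above; field names are illustrative
    dim: int = 512        # model width
    n_layers: int = 8     # decoder blocks
    n_heads: int = 8      # query heads
    n_kv_heads: int = 2   # GQA: 8 query heads share 2 key/value heads
```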

Key Design Choices (from minimind)

  • RMSNorm over LayerNorm (faster)
  • SwiGLU over ReLU (better gradient flow)
  • RoPE over absolute position (better length generalization)
  • GQA for memory efficiency (see the sketch after this list)
  • YaRN for long-text extrapolation
  • Weight tying (embedding = lm_head)
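
For GQA, the standard trick is to project only a small number of key/value heads and repeat them across query-head groups at attention time. A minimal sketch of that repeat step, following the common Llama-style formulation rather than this repo's exact code:

```python
import torch

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand KV heads to match query heads, e.g. 2 KV heads -> 8 query heads (n_rep=4).

    x has shape (batch, seq_len, n_kv_heads, head_dim).
    """
    if n_rep == 1:
        return x
    bsz, seqlen, n_kv_heads, head_dim = x.shape
    return (
        x[:, :, :, None, :]
        .expand(bsz, seqlen, n_kv_heads, n_rep, head_dim)
        .reshape(bsz, seqlen, n_kv_heads * n_rep, head_dim)
    )
```

Weight tying then amounts to sharing one matrix between the input embedding and the output projection (conceptually `lm_head.weight = tok_embeddings.weight`), which removes one vocab-sized parameter matrix.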

Training Pipeline

Stage 1: Pretrain  → Advertising language (500MB corpus)
Stage 2: SFT       → Follow PPC instructions (50MB)
Stage 3: LoRA      → Fine-tune on YOUR account data
Stage 4: DPO       → Good vs bad ad advice alignment (loss sketched below)
Stage 5: GRPO      → Group relative policy optimization
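
To make Stage 4 concrete: DPO scores a "good advice" and a "bad advice" response under both the policy and a frozen reference (SFT) model, then maximizes the margin between them. A minimal sketch using the textbook formulation, not necessarily how this repository implements it:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss over summed per-response log-probs; beta is illustrative."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen (good-advice) response above the rejected (bad-advice) one
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```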

MCP Architecture

Claude Code / Cursor / Agent
        |
   [FastMCP Server]  ← stdio or HTTP
        |
   [Auth Module]     ← token → OAuth → ADC
        |
   [Google Ads API v23]  ← 29 tools
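
A minimal sketch of how one of those tools could be registered with FastMCP from the MCP Python SDK; the server name, tool name, parameters, and return shape are assumptions for illustration, not the repository's actual API:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("google-ads")

@mcp.tool()
def list_campaigns(customer_id: str) -> list[dict]:
    """Return campaigns for a Google Ads customer account (illustrative stub)."""
    # The real server would call the Google Ads API v23 here, after the auth
    # module resolves credentials (token -> OAuth -> ADC).
    return [{"id": "123", "name": "Brand - Search", "status": "ENABLED"}]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; HTTP can be selected instead
```

Running over stdio is what lets a client such as Claude Code or Cursor launch the server as a subprocess; an HTTP transport exposes the same tools to remote agents.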

14 platforms, each a separate MCP server: Google, Meta, Microsoft, Amazon, Reddit, TradeDesk, LinkedIn, TikTok, Snapchat, Pinterest, Criteo, AdRoll, Quora, X/Twitter

Compatibility

Works with: Claude Code, Claude Desktop, Cursor, Windsurf, Codex, Gemini CLI, OpenAI Agents SDK, LangChain, Ollama, vLLM, llama.cpp, FastGPT, Open-WebUI, Dify
