Three AI models. One synthesis. Zero lost progress.
Multi-model research pipeline for Claude Code — Claude, Gemini, and Codex debate, cross-verify, and synthesize publication-ready artifacts.
Why MAGI? • Get Started • Features • Usage • Roadmap • Changelog
Like the MAGI system in Evangelion — three supercomputers cross-verifying each other — this plugin orchestrates Claude, Gemini, and Codex for rigorous, multi-perspective research.
Single-model research has blind spots. One model hallucinates a citation or misses a critical constraint — and nobody catches it.
| | Single Model | MAGI (3 Models) |
|---|---|---|
| Brainstorming | One perspective | Three independent perspectives |
| Verification | Self-review (unreliable) | Cross-model peer review |
| Blind spots | Undetected | Caught by competing models |
| Output | Raw text | Structured report with consensus & divergence analysis |
- Claude (MELCHIOR) — The Scientist. Active third MAGI personality — synthesis, planning, implementation, and original analytical contributions.
- Gemini — The Critic. Creative brainstorming, cross-verification, broad knowledge.
- Codex — The Builder. Feasibility analysis, code review, implementation focus.
We gave each of the three models individually, and the full MAGI pipeline, the same physics problem: discover an unknown damping function from noisy sensor data. No single model proposed combining classical diagnostics with modern ML — only MAGI's cross-verification caught that gap.
| Source | Score | Highlight |
|---|---|---|
| MAGI | 90 | Staged pipeline: rapid diagnostics → symbolic discovery → validation → fallback |
| Claude | 84 | Best code coverage — runnable snippets for every approach |
| Codex | 80 | Elegant physics-informed neural ODE constraints |
| Gemini | 67 | Most accessible for general audience |
Experiment details

- Task: Discover $f(\dot{x})$ in $m\ddot{x} + f(\dot{x}) + kx = 0$ from noisy displacement data
- Setup: Identical prompt → 4 sources → anonymized blind evaluation via MAGI
- Evaluation: Two MAGI evaluator personas scored, cross-reviewed, and debated before synthesis
- Limitations: N=1 case study, self-evaluation (MAGI evaluated MAGI), ~7:1 compute ratio
- Full report: examples/damped_oscillator_comparison/evaluation_report.md
- Raw outputs: examples/damped_oscillator_comparison/
Prerequisites: Claude Code + Python 3.11+ with uv + Gemini CLI + Codex CLI
1. Install the plugin (inside Claude Code):
/plugin marketplace add Axect/magi-researchers
/plugin install magi-researchers@magi-researchers-marketplace
2. Set up MCP servers (one-time):
claude mcp add -s user gemini-cli -- npx -y gemini-mcp-tool
claude mcp add -s user codex-cli -- npx -y @cexll/codex-mcp-server
claude mcp add -s user context7 -- npx -y @upstash/context7-mcp@latest
3. Run your first research:
/magi-researchers:research "your research topic" --domain physics
MAGI generates cross-verified hypotheses, writes implementation code, renders publication-quality plots, and synthesizes a structured report — all saved to outputs/{topic}/.
Alternative: Local Development
git clone https://github.com/Axect/magi-researchers.git
claude --plugin-dir /path/to/magi-researchers
uv add matplotlib SciencePlots numpy

| Phase | What Happens | Output |
|---|---|---|
| Brainstorm | Three models generate and cross-review ideas with expert personas | brainstorm/ |
| Plan | Concrete research plan with execution metadata, stress-tested by a hostile reviewer | plan/ |
| Implement | Language-agnostic implementation with dry-run verification and frontmatter update | src/ |
| Execute | Deterministic code execution from plan frontmatter; generates result artifacts | results/ |
| Test & Visualize | Workspace-aware two-tier testing + publication-quality plots | tests/ + plots/ |
| Report | Structured report with cross-verified claim-evidence integrity | report.md |
- MAGI-in-MAGI — `--depth max` scales to N domain specialists, each running a full mini-MAGI brainstorm in parallel with adversarial meta-debate
- Adversarial review — Models debate, cross-verify, and attack each other's plans (murder board) before synthesis
- Resume anywhere — `--resume` picks up from existing artifacts. No state files — your outputs are the checkpoints.
- Publication-quality output — `matplotlib` + `scienceplots` (Nature theme), LaTeX math, PNG 300 dpi + vector PDF, structured reports with MAGI traceability
All features
Quality Assurance
- Holistic & weighted scoring — Default expert-judgment ranking; optionally supply explicit JSON weights or `adaptive` prompt-analyzed weights
- Dynamic persona casting — Each model gets a topic-specific expert identity, sharpening ideation
- Phase gates — Automated quality checkpoints with conditional MAGI mini-review before each user approval step
Resilience
- Artifact contracts — Each phase validates upstream files before running. Catches silent failures before they cascade.
- Agent substitution — `--substitute "Gemini -> Opus"` replaces a rate-limited model with Claude across all pipeline stages.
- Workspace anchor — `.workspace.json` locks the output directory path, preventing artifact drift after context compression.
- Gemini fallback chain — Resilient 3-tier model fallback: `gemini-3.1-pro-preview` → `gemini-2.5-pro` → Claude
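The artifact-contract idea is simple to express in code. The plugin performs this validation with tool-based Glob/Read; the sketch below shows the same contract shape with `pathlib`, and the per-phase file lists are illustrative assumptions drawn from the pipeline table and artifact tree in this README.

```python
from pathlib import Path

# Hypothetical upstream contracts: which artifacts a phase expects to
# already exist before it runs. File names follow the artifact tree shown
# elsewhere in this README; the plugin's real contract table is internal.
PHASE_CONTRACTS = {
    "plan": ["brainstorm/synthesis.md"],
    "implement": ["plan/research_plan.md"],
    "execute": ["plan/research_plan.md", "src"],
}


def validate_contract(output_dir: str, phase: str) -> list[str]:
    """Return the upstream artifacts that are missing for a phase."""
    root = Path(output_dir)
    return [a for a in PHASE_CONTRACTS.get(phase, []) if not (root / a).exists()]
```

A non-empty return value means an upstream phase silently failed or was skipped, so the current phase can refuse to run instead of cascading the failure.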
More
- MAGI traceability review — All three models cross-verify the final report for orphaned claims and figures
- Report gap detection — Auto-generates missing visualizations from existing data
- Domain templates — Built-in context for Physics, AI/ML, Statistics, Mathematics, and Paper Writing
- Journal strategy — Venue recommendations for Physics, AI/ML, and Interdisciplinary research
Under the hood
- Plot manifest — Structured
plot_manifest.jsonwith metadata, section hints, and captions for automated report integration - Common Restrictions — Phase 4 enforces four output-interface contracts:
plot_manifest.json(fixed schema), PNG + PDF/SVG dual format, execution evidence, dependency spec file. Internal process is autonomous. - Workspace Detection — Phase 3 and 4 detect languages and ecosystems from actual
src/files (package managers first, then file extensions). Priority: reality (src/) > plan intent > domain defaults. - Two-tier testing — Tier 1 (unit, mock-based, always runs) and Tier 2 (integration, depends on
results/, skipped gracefully if absent). Test frameworks match the detected workspace language. - Deterministic execution — Phase 3.5 reads
execution_cmdanddry_run_cmddirectly fromresearch_plan.mdYAML frontmatter. No heuristics, no entry-point guessing. research_plan.mdfrontmatter — Carrieslanguages,ecosystem,execution_cmd,dry_run_cmd,expected_outputs, andestimated_runtimefields as machine-readable metadata for downstream phases.- Cross-phase artifact contracts — Each phase validates incoming artifacts before running (tool-based Glob/Read, not LLM guesswork)
- Depth-controlled token budget —
--depth lowskips cross-review for fast/cheap runs;--depth highenables full adversarial debate @filepathartifact references — MCP tool calls use@filepathsyntax instead of inline content, so large artifacts are read directly from disk with zero truncation
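Reading `execution_cmd` and `dry_run_cmd` from frontmatter can be sketched in a few lines. The plugin's own parser is internal; this minimal version assumes flat `key: value` YAML frontmatter delimited by `---` lines, which covers the scalar fields this README names.

```python
import re


def read_frontmatter(plan_text: str) -> dict[str, str]:
    """Extract flat key: value pairs from a `---`-delimited frontmatter block.

    A deliberately minimal sketch: no nested YAML, no lists, just the
    scalar fields (execution_cmd, dry_run_cmd, ...) mentioned in this README.
    """
    match = re.match(r"^---\n(.*?)\n---", plan_text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields
```

With the command stored as data, the execute phase can run it verbatim, which is what makes the execution deterministic: there is nothing to guess.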
| Command | Description |
|---|---|
| `/magi-researchers:research "topic"` | Full pipeline (Brainstorm → Plan → Implement → Execute → Test → Report) |
| `/magi-researchers:research-brainstorm "topic"` | Brainstorming with cross-verification |
| `/magi-researchers:research-write --source <dir>` | Collaborative writing from research artifacts |
| `/magi-researchers:research-explain "concept"` | Concept explanation with Teacher/Critic pipeline |
| `/magi-researchers:research-implement` | Language-agnostic implementation (needs existing plan) |
| `/magi-researchers:research-execute` | Execute research code; generate `results/` artifacts |
| `/magi-researchers:research-test` | Workspace-aware testing & visualization |
| `/magi-researchers:research-report` | Report generation |
The `--depth` flag controls how thoroughly models review each other's work:

| Depth | What Happens | Cost |
|---|---|---|
| `low` | Independent brainstorming, no cross-review | Cheapest |
| `medium` (default) | Cross-model peer review + synthesis | Standard |
| `high` | Full adversarial debate (defend/concede/revise) | Higher |
| `max` | MAGI-in-MAGI: N specialist subagents, each running a full mini-MAGI | Highest |
All flags
| Flag | Values | Default | Description |
|---|---|---|---|
| `--domain` | `physics` `ai_ml` `statistics` `mathematics` `paper` | auto-inferred | Research domain for context |
| `--weights` | JSON / `adaptive` | holistic | Scoring mode: omit for expert-judgment ranking, JSON for weighted, `adaptive` for prompt-analyzed |
| `--depth` | `low` `medium` `high` `max` | `medium` | Review thoroughness |
| `--personas` | 2–5 | auto | Number of domain-specialist subagents for `--depth max` |
| `--resume` | `<output_dir>` | — | Resume an interrupted pipeline from the last completed phase |
| `--claude-only` | flag | off | Replace Gemini/Codex with Claude subagents for single-model usage |
| `--substitute` | `"Gemini -> Opus"` `"Codex -> Opus"` | — | Replace a specific model with Claude when hitting rate limits |
# Quick brainstorm with default settings
/magi-researchers:research "neural ODE solvers for stiff systems" --domain physics
# Deep analysis with adversarial debate
/magi-researchers:research "causal inference in observational studies" --domain statistics --depth high
# Resume a crashed session — MAGI picks up where you left off
/magi-researchers:research "neural ODE solvers" --resume outputs/neural_ode_solvers_20260225_v1
# Hierarchical multi-persona analysis (MAGI-in-MAGI)
/magi-researchers:research "variational inference for Bayesian deep learning" --domain ai_ml --depth max --personas 4
# Substitute Gemini with Claude when hitting rate limits
/magi-researchers:research "neural ODE solvers" --domain physics --substitute "Gemini -> Opus"
# Fast ideation only (no cross-review, lowest cost)
/magi-researchers:research-brainstorm "transformer alternatives for long sequences" --domain ai_ml --depth low

If MAGI saves you research time, consider leaving a star so other researchers can find it.
outputs/{topic_YYYYMMDD_vN}/
├── .workspace.json # Workspace anchor (absolute path for artifact safety)
├── brainstorm/ # Personas, ideas, cross-reviews, debate, synthesis
├── explain/ # Teacher/Critic analysis, strategy, final explanation
├── write/ # Intake, outline, draft, review, final document
├── plan/ # Research plan (with YAML frontmatter), murder board, mitigations, phase gate
├── src/ # Implementation (any language) + phase gate
├── results/ # Generated artifacts from Phase 3.5 (data, checkpoints, logs)
├── tests/ # Test suite (Tier 1 unit + Tier 2 integration) + phase gate
├── plots/ # PNG + PDF + plot_manifest.json
└── report.md # Final structured report
Each phase produces artifacts that double as resume checkpoints — just pass --resume to continue from where you left off.
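For a feel of what `plots/plot_manifest.json` might carry, here is a hypothetical entry. This README only states that the manifest holds metadata, section hints, and captions; the field names below (`filename`, `formats`, `section_hint`, `caption`) are illustrative assumptions, not the plugin's fixed schema.

```python
import json

# Hypothetical plot_manifest.json payload. Field names are assumptions
# for illustration; see the plugin's fixed schema for the real contract.
entry = {
    "filename": "damping_fit.png",
    "formats": ["png", "pdf"],
    "section_hint": "results",
    "caption": "Recovered damping function vs. ground truth.",
}
manifest = {"plots": [entry]}
print(json.dumps(manifest, indent=2))
```

A structured manifest like this is what lets the report phase place each figure in the right section with its caption, without re-inspecting the image files.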
Full artifact tree
.workspace.json # Workspace anchor (absolute output path)
brainstorm/
├── weights.json # Scoring weights
├── personas.md # Expert personas
├── gemini_ideas.md # Gemini brainstorm
├── codex_ideas.md # Codex brainstorm
├── gemini_review_of_codex.md # Cross-review (depth ≥ medium)
├── codex_review_of_gemini.md # Cross-review (depth ≥ medium)
├── disagreements.md # Disagreement summary (depth = high)
├── debate_round2_gemini.md # Adversarial debate (depth = high)
├── debate_round2_codex.md # Adversarial debate (depth = high)
└── synthesis.md # Weighted synthesis
plan/
├── research_plan.md # Research plan (YAML frontmatter: languages, execution_cmd, etc.)
├── murder_board.md # Plan stress-test
├── mitigations.md # Flaw mitigations
└── phase_gate.md # Plan quality gate
src/
├── * # Research implementation (any language)
└── phase_gate.md # Implementation quality gate
results/
├── run_log.txt # Full execution log
├── pre_execution_status.json # Structured status (state, error_class, severity, retryable, next_action)
└── * # Generated artifacts (csv, npz, pt, etc.)
tests/
├── test_* # Tier 1 unit tests (mock-based)
├── test_integration_* # Tier 2 integration tests (guarded by results/)
└── phase_gate.md # Test quality gate
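The `pre_execution_status.json` file in the tree above carries the five fields this README names (state, error_class, severity, retryable, next_action). A hypothetical payload, with illustrative values that are assumptions on my part, might look like:

```python
import json

# Sketch of a pre_execution_status.json payload. The field names come from
# this README; the example values are illustrative assumptions.
status = {
    "state": "failed",
    "error_class": "MissingDependency",
    "severity": "recoverable",
    "retryable": True,
    "next_action": "run dependency sync and retry the execute phase",
}
print(json.dumps(status, indent=2))
```

Machine-readable status like this is what allows a later phase (or a `--resume` run) to decide whether to retry automatically or surface the error to the user.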
Full artifact tree — --depth max
.workspace.json # Workspace anchor (absolute output path)
brainstorm/
├── weights.json # Scoring weights
├── personas.md # N domain-specialist personas
├── persona_1/ # Persona 1 mini-MAGI output
│ ├── gemini_ideas.md
│ ├── codex_ideas.md
│ ├── gemini_review_of_codex.md
│ ├── codex_review_of_gemini.md
│ └── conclusion.md
├── persona_2/
│ └── ... # (same 5 files per persona)
├── persona_N/
│ └── ...
├── meta_review_gemini.md # Gemini meta-review of all conclusions
├── meta_review_codex.md # Codex meta-review of all conclusions
├── meta_disagreements.md # Meta-disagreement summary
├── meta_debate_gemini.md # Adversarial debate — Gemini
├── meta_debate_codex.md # Adversarial debate — Codex
└── synthesis.md # Enriched final synthesis
Recommended Permissions
Add to .claude/settings.local.json:
{
"permissions": {
"allow": [
"Bash(uv:*)",
"Bash(uv run:*)",
"Bash(uv run python3:*)",
"Bash(uv add:*)",
"Bash(uv sync:*)",
"Bash(mkdir:*)",
"mcp__gemini-cli__ask-gemini",
"mcp__gemini-cli__brainstorm",
"mcp__codex-cli__ask-codex",
"mcp__codex-cli__brainstorm",
"mcp__plugin_context7_context7__resolve-library-id",
"mcp__plugin_context7_context7__query-docs"
]
}
}

Latest — v0.14.0: Report quality hardening — draft validation gate (Step 3.5), depth-scaled plot budgets, schema v1.1.0 with style metadata & structured change tracking, feedback tier keyword signals. See CHANGELOG.md for full history.
Up next:
- Terminal demo GIF — one-command walkthrough
- More domain & journal strategy templates
- Ubiquitous Context7 — live doc lookups during testing and report writing
- Conditional variance enforcement — smart error-bar policy
- Cost estimation — token budget preview before execution
Contributions welcome — especially new domain templates. See CONTRIBUTING.md.
