AI vulnerability researcher that reads Firefox source code.
Redbox is an autonomous research agent inspired by Google's Project Big Sleep. It clones mozilla-central (14GB of C++), reads the actual source, and generates targeted "attack briefs" for Blackbox Protocol to fuzz.
⚠️ Honest disclaimer: 500+ research sessions, 433 hypotheses, 392 briefs tested... 0 crashes. Firefox's security team is just that good. But we've built a learning system that accumulates knowledge across sessions, and we're pivoting strategies.
Most fuzzers generate random-ish inputs. We wanted something smarter:
- Read the actual source — Clone Firefox, grep for patterns, understand the code
- Learn across sessions — Remember what we've seen, don't repeat dead ends
- Generate targeted briefs — Tell the fuzzer exactly which C++ class to attack
- Feedback loop — If something crashes, dig deeper in that area
It's like having a tireless security researcher who reads C++ 24/7.
┌──────────────────────────────────────────────────────────────┐
│                       REDBOX PROTOCOL                        │
│                                                              │
│  Intel Gathering      Research Agent         Output          │
│  ───────────────      ──────────────         ──────          │
│  • CVE advisories  →  • Claude Opus      →   • Attack briefs │
│  • Bugzilla bugs      • ~30 calls/session    • Findings      │
│  • Patch diffs        • Reads C++ source     • Knowledge     │
│                       • ripgrep search                       │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  knowledge.db (SQLite)                                 │  │
│  │  seeds → findings → hypotheses → feedback → repeat     │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  mozilla-central/ (14 GB C++ source)                   │  │
│  │  searchable, readable, always fresh                    │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
                               │
                               ▼ briefs/*.json
┌──────────────────────────────────────────────────────────────┐
│                      BLACKBOX PROTOCOL                       │
│             (generates HTML/JS, crashes Firefox)             │
└──────────────────────────────────────────────────────────────┘
| What | Count |
|---|---|
| Research sessions | 500+ |
| Source files analyzed | 1000+ |
| Findings recorded | 344 |
| Hypotheses generated | 433 |
| Briefs tested | 392 |
| Crashes found | 0 |
Claude Opus runs in a tool-use loop with access to:
| Tool | Purpose |
|---|---|
| read_source_file | Read any file from mozilla-central |
| search_code | ripgrep across 400K files |
| list_directory | Explore the source tree |
| read_knowledge | Check what previous sessions found |
| record_finding | Save observations for future sessions |
| create_attack_brief | Output a hypothesis for the fuzzer |
Each session runs ~30 tool calls, reads its own prior findings, and builds on accumulated knowledge.
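The loop above can be pictured roughly as follows. This is a minimal sketch, not the code in researcher.py: `run_session`, the `next_action` callback that stands in for the model, and the stub handlers are all hypothetical. Only the tool names and the ~30-call budget come from this README.

```python
from typing import Callable, Optional

# Hypothetical stub handlers. The real tools would read mozilla-central
# and knowledge.db; these only echo their input.
def read_source_file(path: str) -> str:
    return f"<contents of {path}>"

def record_finding(text: str) -> str:
    return f"recorded: {text}"

TOOLS: dict[str, Callable[..., str]] = {
    "read_source_file": read_source_file,
    "record_finding": record_finding,
}

MAX_TOOL_CALLS = 30  # per-session budget mentioned in this README

Action = Optional[tuple[str, dict]]

def run_session(next_action: Callable[[Optional[str]], Action],
                max_calls: int = MAX_TOOL_CALLS) -> list[tuple[str, str]]:
    """Ask for an action, run the matching tool, feed the result back."""
    transcript: list[tuple[str, str]] = []
    last_result: Optional[str] = None
    for _ in range(max_calls):
        action = next_action(last_result)
        if action is None:  # the model signals that it is done
            break
        name, kwargs = action
        handler = TOOLS.get(name)
        if handler is None:
            # Report the problem back instead of aborting the session.
            last_result = f"error: unknown tool {name!r}"
        else:
            last_result = handler(**kwargs)
        transcript.append((name, last_result))
    return transcript

if __name__ == "__main__":
    # A scripted "model" that reads one file, records one note, then stops.
    script = iter([
        ("read_source_file", {"path": "dom/webgpu/Buffer.cpp"}),
        ("record_finding", {"text": "MapAsync holds a buffer reference"}),
        None,
    ])
    for name, result in run_session(lambda _last: next(script)):
        print(name, "->", result)
```

Capping the loop with a fixed budget keeps a session bounded even if the model never decides to stop on its own.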
The Pivot (March 2026):
We started with CVE variant analysis — read old patches, find similar bugs. After 350+ tests with 0 crashes, we realized: those bugs are already patched everywhere.
New approach:
- Recent commits (last 45-60 days) — new code, new bugs
- Under-fuzzed APIs — WebGPU, PDF.js, Fission IPC, WebTransport
- Logic bugs — CORS bypasses, privilege escalation
- Complex state machines — Service Workers, animation timelines
This mirrors how Google's Big Sleep found its publicly reported SQLite bug: the agent was pointed at recent commits and asked to look for related unfixed issues, rather than at long-patched CVEs.
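The recency filter in the new approach could be sketched like this. It is an illustration under assumptions: the `Commit` record and the `recent_targets` helper are hypothetical names, and how commit metadata is pulled out of the repository is left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Commit:
    """Hypothetical commit record: revision id, commit date, touched paths."""
    rev: str
    date: datetime
    files: list[str]

def touches(commit: Commit, directories: list[str]) -> bool:
    # Match on whole path components so "dom/webgpu" does not match "dom/webgpu2".
    return any(
        path == d or path.startswith(d + "/")
        for path in commit.files
        for d in directories
    )

def recent_targets(commits: list[Commit], directories: list[str],
                   now: datetime, max_age_days: int = 60) -> list[Commit]:
    """Keep commits newer than the cutoff that touch a priority directory."""
    cutoff = now - timedelta(days=max_age_days)
    return [c for c in commits if c.date >= cutoff and touches(c, directories)]

if __name__ == "__main__":
    now = datetime(2026, 3, 25)
    history = [
        Commit("a1", datetime(2026, 3, 15), ["dom/webgpu/Buffer.cpp"]),
        Commit("b2", datetime(2025, 12, 1), ["dom/webgpu/Buffer.cpp"]),
        Commit("c3", datetime(2026, 3, 20), ["layout/base/nsLayoutUtils.cpp"]),
    ]
    for commit in recent_targets(history, ["dom/webgpu", "dom/ipc"], now):
        print(commit.rev, commit.files)
```

Passing `now` explicitly rather than calling `datetime.now()` inside the function keeps the filter deterministic and easy to test.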
priority_directories = [
"dom/webgpu", # WGSL shaders, Rust-C++ boundary
"browser/extensions/pdfjs", # PDF parsing
"dom/ipc", # Fission process isolation
"netwerk/protocol/webtransport", # New protocol
"dom/serviceworkers", # Complex state machine
"dom/highlight", # New CSS Highlight API
]
# Clone
git clone https://github.com/chairulridjaal/redbox-protocol
cd redbox-protocol
# Setup
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
sudo apt install mercurial ripgrep
# Configure
cp .env.example .env # Add ANTHROPIC_API_KEY
# Clone Firefox source (14GB, takes 30-60 min first time)
mkdir -p sources
hg clone https://hg.mozilla.org/mozilla-central/ sources/mozilla-central/
# Run
python main.py
When Redbox finds something interesting, it outputs a brief:
{
"brief_id": "20260325_WebGPU_buffer_validation",
"target": {
"class": "GPUBuffer",
"method": "MapAsync",
"file": "dom/webgpu/Buffer.cpp"
},
"vulnerability": {
"class": "race_condition",
"hypothesis": "MapAsync callback can fire while buffer is being destroyed...",
"source_evidence": "// actual C++ code from the file"
},
"trigger": {
"sequence": "1. Create GPUBuffer\n2. Call mapAsync()\n3. Destroy buffer before callback",
"js_hint": "buffer.mapAsync().then(() => { /* buffer already destroyed */ })"
},
"confidence": "medium"
}
Blackbox reads this and generates targeted HTML/JS to trigger the bug.
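A structural check on a brief before it is handed to the fuzzer might look like the sketch below. The `validate_brief` helper is hypothetical, and treating every field of the example above as required is an assumption; the README does not define a formal schema.

```python
import json

# Assumption: every field shown in the example brief is mandatory.
REQUIRED_TOP_LEVEL = {
    "brief_id": str,
    "target": dict,
    "vulnerability": dict,
    "trigger": dict,
    "confidence": str,
}
REQUIRED_NESTED = {
    "target": ("class", "method", "file"),
    "vulnerability": ("class", "hypothesis", "source_evidence"),
    "trigger": ("sequence", "js_hint"),
}

def validate_brief(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the brief is complete."""
    try:
        brief = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(brief, dict):
        return ["top-level value must be an object"]

    problems: list[str] = []
    for key, expected_type in REQUIRED_TOP_LEVEL.items():
        if key not in brief:
            problems.append(f"missing field: {key}")
        elif not isinstance(brief[key], expected_type):
            problems.append(f"wrong type for field: {key}")

    # Only descend into sections that are present and are objects.
    for section, keys in REQUIRED_NESTED.items():
        value = brief.get(section)
        if isinstance(value, dict):
            for key in keys:
                if key not in value:
                    problems.append(f"missing field: {section}.{key}")
    return problems

if __name__ == "__main__":
    print(validate_brief('{"brief_id": "20260325_WebGPU_buffer_validation"}'))
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that is wrong with a generated brief in a single pass.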
Session 1: Read CVE patch → understand the vulnerability
Session 2: Read session 1 findings → go deeper, explore callers
Session 3: Read fuzzer feedback → if crash, dig deeper; if not, pivot
Session N: Deep component expertise, highly targeted briefs
The agent explicitly avoids re-reading files it's already seen and won't re-investigate dead ends.
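One way to back the "don't re-read, don't re-investigate" behaviour with SQLite is sketched below. The table names, columns, and helper functions are hypothetical and are not taken from knowledge.py; the sketch uses an in-memory database so it runs without touching knowledge.db.

```python
import sqlite3

# Hypothetical schema: one table for files already read, one for dead ends.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files_read (
    path    TEXT PRIMARY KEY,
    session INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS dead_ends (
    topic  TEXT PRIMARY KEY,
    reason TEXT NOT NULL
);
"""

def open_knowledge(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def mark_read(conn: sqlite3.Connection, path: str, session: int) -> None:
    # INSERT OR IGNORE keeps the first session that read the file.
    conn.execute("INSERT OR IGNORE INTO files_read VALUES (?, ?)", (path, session))
    conn.commit()

def mark_dead_end(conn: sqlite3.Connection, topic: str, reason: str) -> None:
    conn.execute("INSERT OR REPLACE INTO dead_ends VALUES (?, ?)", (topic, reason))
    conn.commit()

def unread(conn: sqlite3.Connection, candidates: list[str]) -> list[str]:
    """Filter a candidate list down to files no earlier session has read."""
    seen = {row[0] for row in conn.execute("SELECT path FROM files_read")}
    return [p for p in candidates if p not in seen]

def is_dead_end(conn: sqlite3.Connection, topic: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM dead_ends WHERE topic = ?", (topic,)
    ).fetchone()
    return row is not None

if __name__ == "__main__":
    conn = open_knowledge()
    mark_read(conn, "dom/webgpu/Buffer.cpp", session=1)
    mark_dead_end(conn, "animation timeline UAF", "no crash after repeated briefs")
    print(unread(conn, ["dom/webgpu/Buffer.cpp", "dom/webgpu/Device.cpp"]))
    print(is_dead_end(conn, "animation timeline UAF"))
```

A session would call `unread` before choosing files to open and `is_dead_end` before generating a hypothesis, so earlier work steers later sessions.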
redbox-protocol/
├── main.py # Entry point
├── researcher.py # Claude research loop
├── intel.py # CVE/Bugzilla gathering
├── knowledge.py # SQLite persistence
├── tools.py # Tool definitions
├── sources/
│ └── mozilla-central/ # 14GB Firefox source
├── briefs/ # Output for Blackbox
├── knowledge.db # Accumulated learning
└── logs/
└── research.log
Real talk:
- Firefox is incredibly well-fuzzed — Mozilla runs OSS-Fuzz, libFuzzer, and their own tools 24/7
- Security patches get backported — Even ESR has the fixes
- CVE variants are patched too — Security team is thorough
- We're targeting the wrong things — Hence the pivot to recent code
The infrastructure works. The approach is theoretically sound. We just need to find the right target.
Even without crashes, we learned a lot:
- How Firefox animation lifecycle works internally
- Where raw pointers are held across callbacks
- Which subsystems use RefPtr vs raw pointers
- How IPC message validation is structured
- Where the Rust-C++ boundaries are
This knowledge base has value.
- Blackbox Protocol — The fuzzer that consumes our briefs
- Google Big Sleep — The inspiration
- Project Zero Blog — Where the real bugs get found
Research purposes only.