An autonomous research tool that generates hypotheses from trading data, replays historical trades with modifications, and validates results with statistical rigor.
Built for: Answering "what if we changed X?" with data, not opinion.
```
Trade Data (135+ closed trades)
          │
          ▼
┌─────────────────────┐
│ Hypothesis Engine   │
│                     │
│ Generates what-if   │
│ scenarios:          │
│ • Remove a setup    │
│ • Filter by symbol  │
│ • Change thresholds │
│ • Conditional rules │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│ Trade Replay        │
│                     │
│ Replays all trades  │
│ with modification   │
│ applied. Computes   │
│ new P&L, WR, Sharpe │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│ Statistical Tests   │
│                     │
│ • Permutation test  │
│   (p-value)         │
│ • Bootstrap CI      │
│   (95% confidence)  │
│ • Monte Carlo sim   │
│   (ruin probability │
│    + wealth dist)   │
└─────────┬───────────┘
          │
          ▼
Validated / Rejected
with confidence scores
```
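The Hypothesis Engine and Trade Replay stages can be sketched roughly as follows. This is a minimal illustration, not the tool's code: the `Trade` fields, the two hypothesis families generated here (remove one setup, keep one asset class), and the per-trade Sharpe definition are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable, Optional


@dataclass(frozen=True)
class Trade:
    # Hypothetical trade record: the tool's real schema is not documented here.
    symbol: str
    setup: str
    asset_class: str
    pnl: float  # realized P&L in dollars


@dataclass(frozen=True)
class Hypothesis:
    name: str
    keep: Callable[[Trade], bool]  # True if the trade survives the modification


def generate_hypotheses(trades: list[Trade]) -> list[Hypothesis]:
    """Emit a 'Remove <setup>' and an '<asset class> only' filter for each distinct value."""
    hyps = []
    for setup in sorted({t.setup for t in trades}):
        # Default argument binds the current value; a bare closure would only see the last one.
        hyps.append(Hypothesis(f"Remove {setup}", lambda t, s=setup: t.setup != s))
    for cls in sorted({t.asset_class for t in trades}):
        hyps.append(Hypothesis(f"{cls} only", lambda t, c=cls: t.asset_class == c))
    return hyps


def replay(trades: list[Trade], hyp: Optional[Hypothesis] = None) -> dict:
    """Re-run the trade list with the hypothesis applied and recompute summary metrics."""
    pnls = [t.pnl for t in trades if hyp is None or hyp.keep(t)]
    n = len(pnls)
    win_rate = sum(p > 0 for p in pnls) / n if n else 0.0
    # Per-trade Sharpe (mean / sample stdev): not annualized, no risk-free rate.
    sharpe = 0.0
    if n >= 2 and stdev(pnls) > 0:
        sharpe = mean(pnls) / stdev(pnls)
    return {"trades": n, "pnl": sum(pnls), "win_rate": win_rate, "sharpe": sharpe}
```

Each hypothesis is just a predicate over trades, so threshold and conditional-rule hypotheses fit the same shape. A hypothesis's improvement is its replayed P&L minus the baseline `replay(trades)` P&L.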
Most backtesting tools test ONE hypothesis you come up with. This tool generates hypotheses automatically from your data, tests all of them, and ranks them by statistical significance.
Example output:
- "Remove Setup X" → +$840 improvement, Monte Carlo confidence 87.7%
- "Crypto only" → +$797 improvement, Monte Carlo confidence 85.2%
- "Add condition Y" → -$120, rejected (p=0.43)
Every hypothesis is tested three ways:
| Method | What It Measures |
|---|---|
| Permutation test | P-value: is the improvement statistically significant, or could it be random? |
| Bootstrap CI | 95% confidence interval: what's the realistic range of improvement? |
| Monte Carlo | 10,000 portfolio simulations: what's the ruin probability and wealth distribution? |
A hypothesis must pass ALL THREE to be considered "validated."
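One way to implement the first two checks and the pass-all-three gate is sketched below, assuming a hypothesis that removes a flagged subset of trades. The null model (random subsets of equal size), the percentile bootstrap, the 5,000 bootstrap resamples, and the `alpha`/`max_ruin` thresholds are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np


def permutation_p_value(pnl, removed, n_iter=1_000, seed=0):
    """One-sided permutation test for a 'remove these trades' hypothesis.

    Null: the flagged trades are no worse than a random subset of the same size.
    """
    pnl = np.asarray(pnl, dtype=float)
    removed = np.asarray(removed, dtype=bool)
    k = int(removed.sum())
    observed = -pnl[removed].sum()  # dollars gained by dropping the flagged trades
    rng = np.random.default_rng(seed)
    null = np.array([-pnl[rng.choice(pnl.size, size=k, replace=False)].sum()
                     for _ in range(n_iter)])
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return float((1 + np.sum(null >= observed)) / (1 + n_iter))


def bootstrap_ci(pnl, removed, n_boot=5_000, level=0.95, seed=0):
    """Percentile bootstrap interval for the dollar improvement of the removal."""
    pnl = np.asarray(pnl, dtype=float)
    removed = np.asarray(removed, dtype=bool)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, pnl.size, size=(n_boot, pnl.size))  # resample trades with replacement
    # Improvement in each resample = minus the P&L of the resampled trades that get removed.
    deltas = -np.where(removed[idx], pnl[idx], 0.0).sum(axis=1)
    tail = (1 - level) / 2 * 100
    lo, hi = np.percentile(deltas, [tail, 100 - tail])
    return float(lo), float(hi)


def is_validated(p_value, ci_low, ruin_prob, alpha=0.05, max_ruin=0.01):
    """All three checks must pass; alpha and max_ruin are illustrative thresholds."""
    return p_value < alpha and ci_low > 0 and ruin_prob <= max_ruin
```

Requiring the lower CI bound to be above zero, not just the point estimate, is what keeps a lucky but fragile improvement from passing.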
- Auto-generates hypotheses from trade performance data
- Replays historical trades with modifications applied
- Permutation testing with configurable iterations (default: 1,000)
- Bootstrap confidence intervals (default: 95%)
- Monte Carlo portfolio simulation (default: 10,000 runs)
- Output ranked by statistical significance
- Runs autonomously on schedule (weekly via launchd)
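A minimal Monte Carlo sketch using the 10,000-run default is shown below. It assumes trades are resampled with replacement as fixed dollar amounts, a hypothetical $10,000 starting balance, and "ruin" defined as a 50% drawdown from that balance; the tool's actual ruin definition isn't stated here.

```python
import numpy as np


def monte_carlo(pnl, start_equity=10_000.0, n_trades=None, n_runs=10_000,
                ruin_fraction=0.5, seed=0):
    """Resample historical per-trade P&L into many equity paths.

    'Ruin' here means equity touching start_equity * (1 - ruin_fraction) at any point.
    """
    pnl = np.asarray(pnl, dtype=float)
    n_trades = pnl.size if n_trades is None else n_trades
    rng = np.random.default_rng(seed)
    draws = rng.choice(pnl, size=(n_runs, n_trades), replace=True)  # one row per simulated path
    equity = start_equity + np.cumsum(draws, axis=1)  # fixed-dollar P&L, no compounding
    ruined = (equity <= start_equity * (1 - ruin_fraction)).any(axis=1)
    p5, p50, p95 = np.percentile(equity[:, -1], [5, 50, 95])
    return {"ruin_prob": float(ruined.mean()), "final_p5": float(p5),
            "final_median": float(p50), "final_p95": float(p95)}
```

Resampling dollar P&L keeps the sketch simple; if position size scales with equity, resampling percentage returns and compounding them would be the closer model.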
Python 3.11, NumPy, pandas, SciPy (statistics)
- Post-soak analysis: which setups to keep, modify, or remove
- Parameter sensitivity: how robust is each threshold
- Strategy optimization: data-driven improvements, not gut feel
- Risk assessment: Monte Carlo ruin probability before going live
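For parameter sensitivity, a simple approach is to sweep a threshold and watch how replayed P&L moves between neighboring values. This sketch assumes each trade carries a hypothetical signal `score` and that the threshold is a minimum score; both helpers are illustrative, not part of the tool.

```python
import numpy as np


def sensitivity_sweep(scores, pnl, thresholds):
    """Replay P&L at each candidate minimum-score threshold (lower-scored trades are skipped)."""
    scores = np.asarray(scores, dtype=float)
    pnl = np.asarray(pnl, dtype=float)
    rows = []
    for th in thresholds:
        keep = scores >= th
        rows.append((float(th), int(keep.sum()), float(pnl[keep].sum())))
    return rows  # (threshold, trades kept, total P&L)


def neighbor_drop(rows, i):
    """P&L lost if the threshold slips one step to its worse neighbor (negative: both neighbors do better)."""
    base = rows[i][2]
    neighbors = [rows[j][2] for j in (i - 1, i + 1) if 0 <= j < len(rows)]
    return max(base - p for p in neighbors) if neighbors else 0.0
```

A threshold whose neighbors lose little P&L sits on a plateau and is generally treated as more robust than one sitting on a sharp peak.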
Part of an autonomous trading system with 54 services and 3,778 tests. Full system details are in the portfolio.