agent.py
┌─────────────────────────────────────┐
│ EDITABLE SECTION                    │ ← agent rewrites this
│ get_signals(data)                   │
│ get_position_sizes(signals, data)   │
│ get_regime(data)                    │
└─────────────────────────────────────┘
══ FIXED ADAPTER BOUNDARY ════════════════
┌─────────────────────────────────────┐
│ FIXED SECTION                       │ ← never touched
│ UNIVERSE_SIZE = 300                 │
│ DEV_END = "2016-12-31"              │
│ HOLDBACK_START = "2017-01-01"       │
│ HOLDBACK_END = "2018-12-31"         │
│ WF_START = "2019-01-01"             │
│ WF_END = "2021-12-31"               │
│ HOLDOUT_START = "2022-01-01"        │
│ load_data()                         │
│ simulate()                          │
│ compute_metrics()                   │
│ walk_forward()                      │
│ CLI entry point                     │
└─────────────────────────────────────┘
program.md ← YOU edit this
results.csv ← auto-generated experiment log
last_result.json ← auto-generated, last backtest output
.agent/best_dev_agent.py ← best DEV score snapshot
.agent/best_holdback_agent.py ← best holdback-validated snapshot
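To make the editable/fixed split concrete, here is a hypothetical version of the three editable functions. It is a sketch, not the repo's code: it assumes `data` is a dict mapping each ticker to a list of daily closing prices (oldest first) and that position sizes are portfolio weights. `LOOKBACK`, the momentum signal, and the `risk_on`/`risk_off` labels are all illustrative, and how `simulate()` actually calls these functions is not shown here.

```python
from __future__ import annotations

# HYPOTHETICAL editable section. ASSUMPTION: data maps ticker -> list of
# daily closing prices, oldest first; positions are portfolio weights.

LOOKBACK = 20  # hypothetical momentum window, in trading days


def get_signals(data: dict[str, list[float]]) -> dict[str, float]:
    """Trailing LOOKBACK-day return per ticker (simple momentum)."""
    signals = {}
    for ticker, closes in data.items():
        if len(closes) > LOOKBACK and closes[-LOOKBACK - 1] > 0:
            signals[ticker] = closes[-1] / closes[-LOOKBACK - 1] - 1.0
    return signals


def get_regime(data: dict[str, list[float]]) -> str:
    """'risk_on' if more than half the tickers close above their LOOKBACK-day mean."""
    counted = above = 0
    for closes in data.values():
        if len(closes) >= LOOKBACK:
            counted += 1
            if closes[-1] > sum(closes[-LOOKBACK:]) / LOOKBACK:
                above += 1
    return "risk_on" if counted and above / counted > 0.5 else "risk_off"


def get_position_sizes(
    signals: dict[str, float], data: dict[str, list[float]]
) -> dict[str, float]:
    """Equal-weight names with positive momentum; halve gross exposure in risk_off."""
    longs = [ticker for ticker, s in signals.items() if s > 0]
    if not longs:
        return {}
    exposure = 1.0 if get_regime(data) == "risk_on" else 0.5
    return {ticker: exposure / len(longs) for ticker in longs}
```

For example, with one steadily rising and one steadily falling ticker (30 days each), only the rising ticker receives a position, at weight 0.5: exactly half the names close above their mean, which this sketch counts as `risk_off`.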
Point your agent at the repo and give it this prompt:
`Read program.md and let's start a new experiment.`
To steer the research, edit `program.md`: change the research directive, the hypothesis space, or the target metrics. The agent reads this file to decide what to try next.
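`compute_metrics()` scores each backtest as:
`score = sharpe - max(0, (turnover - 0.3) * 0.5) - max(0, (|max_drawdown| - 0.20) * 2)`
Turnover above 0.3 and drawdowns deeper than 20% are penalized linearly; a strategy inside both limits is scored on its Sharpe ratio alone.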
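The score formula above maps directly onto a small function. This is a sketch, not the repo's `compute_metrics()`: the name `score` and its argument names are illustrative, and it assumes turnover and drawdown arrive as plain fractions (for example `-0.25` for a 25% drawdown).

```python
def score(sharpe: float, turnover: float, max_drawdown: float) -> float:
    """Sharpe ratio minus linear penalties for excess turnover and drawdown."""
    # Turnover above 0.3 costs 0.5 score points per unit of excess.
    turnover_penalty = max(0.0, (turnover - 0.3) * 0.5)
    # Drawdown deeper than 20% costs 2 score points per unit of excess;
    # abs() accepts drawdown reported as a negative or a positive fraction.
    drawdown_penalty = max(0.0, (abs(max_drawdown) - 0.20) * 2)
    return sharpe - turnover_penalty - drawdown_penalty


print(score(sharpe=1.5, turnover=0.5, max_drawdown=-0.25))  # about 1.3
print(score(sharpe=1.2, turnover=0.2, max_drawdown=-0.10))  # 1.2, no penalties
```

In the first call both limits are exceeded: 0.2 excess turnover costs 0.1 and 5 points of excess drawdown cost another 0.1.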
| Period | Dates | Purpose |
|---|---|---|
| Development | 2010–2016 | Iterate freely |
| IS Holdback | 2017–2018 | One-shot gate per hypothesis |
| Walk-forward | 2019–2021 | One-shot validation per hypothesis |
| Holdout | 2022–present | Locked until final run |
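As a sketch of how the date constants in the fixed section line up with this table, the hypothetical helper below maps a trading day to its period. The function name and the period labels are illustrative; the table's 2010 start for development has no matching constant in the fixed section, so the sketch applies no lower bound.

```python
from datetime import date

# Boundary constants as listed in the fixed section of agent.py.
DEV_END = "2016-12-31"
HOLDBACK_START = "2017-01-01"
HOLDBACK_END = "2018-12-31"
WF_START = "2019-01-01"
WF_END = "2021-12-31"
HOLDOUT_START = "2022-01-01"


def period_of(day: date) -> str:
    """Map a trading day to the split it belongs to (hypothetical helper)."""
    iso = day.isoformat()  # ISO-8601 strings compare in calendar order
    if iso <= DEV_END:
        return "development"
    if HOLDBACK_START <= iso <= HOLDBACK_END:
        return "holdback"
    if WF_START <= iso <= WF_END:
        return "walk_forward"
    if iso >= HOLDOUT_START:
        return "holdout"
    raise ValueError(f"{iso} falls in a gap between periods")


print(period_of(date(2018, 6, 29)))  # holdback
```

Checking each period's explicit bounds, rather than relying on the periods being contiguous, makes the helper raise if an edit to the constants leaves a gap between periods.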
| Stop condition | Meaning |
|---|---|
| Walk-forward passes | Target achieved |
| 200 iterations | Time limit |
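One way the one-shot gates and the two stopping conditions could fit together is sketched below. Everything here is hypothetical: the four callables stand in for the agent's own steps, the rule that a hypothesis reaches the gates only when it sets a new best DEV score is an assumption, and the real pass criteria live in `program.md`, not in this code.

```python
from typing import Callable, Optional, Tuple

MAX_ITERATIONS = 200  # the "200 iterations" time limit


def research_loop(
    propose: Callable[[int], str],
    dev_score: Callable[[str], float],
    passes_holdback: Callable[[str], bool],
    passes_walk_forward: Callable[[str], bool],
    max_iterations: int = MAX_ITERATIONS,
) -> Tuple[Optional[str], int]:
    """Return (winning hypothesis or None, iterations used). Hypothetical sketch."""
    best_dev = float("-inf")
    gated: set[str] = set()  # hypotheses that have already used their one shot
    for iteration in range(1, max_iterations + 1):
        hypothesis = propose(iteration)
        dev = dev_score(hypothesis)  # DEV: iterate freely
        if dev <= best_dev or hypothesis in gated:
            continue
        best_dev = dev
        gated.add(hypothesis)
        if not passes_holdback(hypothesis):  # one shot on the 2017-2018 holdback
            continue
        if passes_walk_forward(hypothesis):  # one shot on 2019-2021
            return hypothesis, iteration  # stop: target achieved
    return None, max_iterations  # stop: time limit
```

The `gated` set enforces the one-shot rule: a hypothesis that has already faced the holdback is never re-tested on it, however many times it is proposed.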
- Transaction cost: 10bps per rebalance
- Universe: top 300 stocks by 30-day average dollar volume (selected on IS data only)
- Execution: 1-day lag on all signals
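The universe rule, the cost, and the execution lag can be sketched in plain Python. This is an illustration under stated assumptions, not the fixed section's `simulate()`: the function names are hypothetical, the 10bps is assumed to apply to traded notional (the absolute change in each weight), and weight drift between rebalances is ignored.

```python
from __future__ import annotations

COST_PER_TRADE = 0.0010  # 10bps, ASSUMED to apply to |change in weight|


def select_universe(
    dollar_volume: dict[str, list[float]], size: int = 300, window: int = 30
) -> list[str]:
    """Top `size` tickers by average daily dollar volume over the last `window` days.

    Pass in-sample history only, so the selection cannot peek at later periods.
    """
    averages = {
        ticker: sum(vols[-window:]) / len(vols[-window:])
        for ticker, vols in dollar_volume.items()
        if vols  # skip tickers with no history
    }
    return sorted(averages, key=averages.get, reverse=True)[:size]


def net_returns(
    weights: list[dict[str, float]], returns: list[dict[str, float]]
) -> list[float]:
    """Daily portfolio returns with a 1-day execution lag and proportional costs.

    weights[t] is computed from data through day t, so it first earns
    returns[t + 1]; returns[0] is never earned. Ignores weight drift.
    """
    result = []
    previous: dict[str, float] = {}
    for t in range(len(weights) - 1):
        current = weights[t]
        gross = sum(w * returns[t + 1].get(tk, 0.0) for tk, w in current.items())
        traded = sum(
            abs(current.get(tk, 0.0) - previous.get(tk, 0.0))
            for tk in set(current) | set(previous)
        )
        result.append(gross - COST_PER_TRADE * traded)
        previous = current
    return result
```

With weights `[{"A": 1.0}, {"A": 1.0}, {"A": 0.0}]` and daily returns of 5%, 2%, and -3%, the first day's 5% is never earned because of the lag, and the 2% day nets roughly 1.9% after paying 10bps to enter the position.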
- Different asset class: swap the tickers in `SP500_TICKERS` for ETFs, futures, or crypto.
- Different score formula: edit `compute_metrics()` in the fixed section AND update `program.md`.
- Different split: change the date constants in the fixed section. This invalidates cross-experiment comparisons.
- Larger universe: change `UNIVERSE_SIZE` in the fixed section. This invalidates cross-experiment comparisons.
- Survivorship bias: Delisted stocks are not included in the MVP, so the universe contains only stocks that were never delisted, which likely biases backtest results upward.
- More realistic costs: 10bps per trade may be optimistic for this universe.
- Sharpe discounting for iteration: The score formula does not penalize the number of iterations run, so the best score found across many attempts is likely biased upward by multiple testing.
- Walk-forward leakage: Because the agent can see walk-forward results, some overfitting may occur even though it is prompted to generate novel hypotheses. I plan to address this with a dedicated check agent that verifies hypothesis novelty.
Note: The expanding window is only structurally meaningful if `get_signals()` estimates its parameters from data. With hardcoded parameters, the growing training window is never used, so the three folds reduce to a single out-of-sample (OOS) backtest split into thirds.
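A sketch of how three expanding-window folds over 2019–2021 could be laid out. It assumes one calendar-year test fold per year and a training window that starts at a hypothetical 2010-01-01 and grows to the end of the year before each test fold; the fixed section's actual `walk_forward()` may slice the period differently, and `expanding_folds` is an illustrative name.

```python
from datetime import date
from typing import List, Tuple

Window = Tuple[date, date]  # inclusive (first day, last day)


def expanding_folds(
    train_start: date, first_test_year: int, n_folds: int
) -> List[Tuple[Window, Window]]:
    """One calendar-year test fold per year; training always starts at train_start."""
    folds = []
    for k in range(n_folds):
        year = first_test_year + k
        train = (train_start, date(year - 1, 12, 31))  # grows by one year per fold
        test = (date(year, 1, 1), date(year, 12, 31))
        folds.append((train, test))
    return folds


# Hypothetical 2010-01-01 start; WF_START = "2019-01-01" gives first_test_year 2019.
for train, test in expanding_folds(date(2010, 1, 1), 2019, 3):
    print(f"fit on {train[0]}..{train[1]}, test on {test[0]}..{test[1]}")
```

If `get_signals()` ignores the training window, every fold's fit step does nothing, so concatenating the three test windows gives the same result as one 2019–2021 backtest.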

