experiments

Karpathy-style experiment framework for Atris.

This repo defines the schema, validation rules, and benchmark harness for self-improvement loops. Live experiment packs belong inside product repos at atris/experiments/.

What This Is

An experiment is not "the agent rewrote its prompt and said it improved."

An experiment is:

one bounded target
one external metric
one keep/revert loop
one append-only log

If the metric goes up, keep the change. If it does not, revert it.
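
The keep/revert rule above can be sketched in a few lines. This is a minimal illustration, not the repo's loop.py: the function name `keep_or_revert`, the string-valued target, and the in-memory log are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple


def keep_or_revert(
    state: str,
    proposals: Iterable[Callable[[str], str]],
    measure: Callable[[str], float],
) -> Tuple[str, List[Tuple[float, str]]]:
    """Try each proposal in order; keep it only if the score strictly improves."""
    best = measure(state)
    log: List[Tuple[float, str]] = []  # hypothetical stand-in for results.tsv
    for propose in proposals:
        candidate = propose(state)
        score = measure(candidate)
        if score > best:
            # Metric went up: the candidate becomes the new baseline.
            state, best = candidate, score
            log.append((score, "KEEP"))
        else:
            # Metric is flat or worse: discard the candidate, keep the old state.
            log.append((score, "REVERT"))
    return state, log


if __name__ == "__main__":
    # Toy metric: longer strings score higher.
    final, history = keep_or_revert(
        "aaa",
        [lambda s: s[:-1], lambda s: s + "a"],
        lambda s: float(len(s)),
    )
    print(final, history)
```

The comparison is strict, so a proposal that leaves the score unchanged is reverted, which matches "If it does not, revert it."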

Schema

atris/experiments/
├── README.md
├── validate.py
├── benchmark_validate.py
├── benchmark_runtime.py
└── <experiment-slug>/
    ├── program.md
    ├── measure.py
    ├── loop.py
    ├── results.tsv
    ├── reset.py            # preferred
    ├── proposals/          # optional
    └── <bounded-target>    # candidate.py, system_prompt.txt, etc.
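
As a rough illustration of what a structural check over this layout might look like, the sketch below reports files missing from a pack directory. It is not the repo's validate.py. The function name `check_pack` and the split into required and recommended files are assumptions read off the tree's comments, and the bounded target is skipped because its filename varies per experiment.

```python
from pathlib import Path
from typing import List

# Assumed from the schema tree: files without a comment are treated as required.
REQUIRED = ("program.md", "measure.py", "loop.py", "results.tsv")
# reset.py is marked "preferred", so its absence is only flagged as a recommendation.
RECOMMENDED = ("reset.py",)


def check_pack(pack_dir: Path) -> List[str]:
    """Return human-readable problems found in one experiment pack directory."""
    problems: List[str] = []
    for name in REQUIRED:
        if not (pack_dir / name).is_file():
            problems.append(f"missing required file: {name}")
    for name in RECOMMENDED:
        if not (pack_dir / name).is_file():
            problems.append(f"missing recommended file: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_pack(Path("examples/smoke-keep-revert")):
        print(problem)
```

An empty list means every expected file was found. The real validator also performs bloat checks, which this sketch does not attempt.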

Rules

One bounded mutation target per experiment.
measure.py must use an external metric the agent cannot fake.
loop.py must keep only improvements and revert regressions.
program.md stays short and task-specific.
results.tsv stays append-only.
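
One simple way to honor the append-only rule is to open the log in append mode only, so earlier rows are never rewritten. The sketch below assumes a three-column layout (label, score, decision). That layout and the helper name `append_result` are hypothetical, and the actual results.tsv columns may differ.

```python
import csv
from pathlib import Path


def append_result(path: Path, label: str, score: float, decision: str) -> None:
    """Append one tab-separated row; mode "a" never truncates existing rows."""
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow([label, f"{score:.3f}", decision])


if __name__ == "__main__":
    log = Path("results.tsv")
    append_result(log, "bad_patch", 0.0, "REVERT")
    append_result(log, "fix_patch", 1.0, "KEEP")
```

Reverted attempts are logged as well as kept ones, so the file records every trial rather than only the wins.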

Repo Contents

template/pack/ - starter files for a new experiment
validate.py - structural and bloat checks
benchmark_validate.py - validator benchmark on fixed good/bad fixtures
benchmark_runtime.py - runtime benchmark on example packs
examples/ - tiny reference implementation

Example

Start with the smallest honest pack:

examples/smoke-keep-revert/
├── candidate.py
├── measure.py
├── loop.py
├── reset.py
├── results.tsv
└── proposals/
    ├── bad_patch.py
    └── fix_patch.py

What it does:

candidate.py starts broken on purpose
measure.py scores it on a fixed word-count test
bad_patch.py makes it worse
fix_patch.py actually fixes it
loop.py keeps only the fix
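
To make the idea of a fixed word-count test concrete, here is a toy metric together with three stand-in candidates. None of this is copied from the example pack: the test cases, the function names, and the particular bugs are invented for illustration, so the real candidate.py and measure.py may look quite different.

```python
from typing import Callable, List, Tuple

# Hypothetical fixed test set: (input text, expected word count).
CASES: List[Tuple[str, int]] = [
    ("", 0),
    ("one", 1),
    ("two words", 2),
    ("  padded  text  ", 2),
    ("a\tb\nc", 3),
]


def score(count_words: Callable[[str], int]) -> float:
    """Fraction of fixed cases the candidate gets exactly right."""
    passed = sum(1 for text, expected in CASES if count_words(text) == expected)
    return passed / len(CASES)


def broken(text: str) -> int:
    return len(text)  # bug: counts characters, not words


def bad_patch(text: str) -> int:
    return -1  # worse: wrong on every case


def fix_patch(text: str) -> int:
    return len(text.split())  # splits on any run of whitespace


if __name__ == "__main__":
    print(score(broken))     # 0.2
    print(score(bad_patch))  # 0.0
    print(score(fix_patch))  # 1.0
```

Because the cases live outside the candidate, a patch cannot raise its score by editing the target alone. That property is what the rule about an external metric asks for.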

Run it:

python examples/smoke-keep-revert/reset.py
python examples/smoke-keep-revert/loop.py \
  --proposal examples/smoke-keep-revert/proposals/bad_patch.py \
  --proposal examples/smoke-keep-revert/proposals/fix_patch.py

Visual:

broken target
   ↓
score = 0.2
   ↓
bad patch
   ↓
score = 0.0
   ↓
REVERT
   ↓
good patch
   ↓
score = 1.0
   ↓
KEEP

Commands

python validate.py examples
python benchmark_validate.py
python benchmark_runtime.py
