Tools for training model organisms
Install:

```bash
git clone https://github.com/dtch1997/motools.git
cd motools
uv sync --group dev
cp .env.template .env
# Edit .env with your actual API keys
```

With MOTools, training and evaluation can be as simple as:
```python
# examples/hello_world.py
import asyncio
from pathlib import Path

# TrainAndEvaluateConfig, WorkflowState, run_workflow, and train_and_evaluate_workflow
# are imported from motools in the actual example file (imports omitted here).


async def main() -> None:
    config_path = Path(__file__).parent / "configs" / "train_evaluate_hello_world.yaml"
    config = TrainAndEvaluateConfig.from_yaml(config_path)
    result: WorkflowState = await run_workflow(
        workflow=train_and_evaluate_workflow,
        input_atoms={},
        config=config,
        user="example-user",
        no_cache=True,
    )
    # Now do stuff with the result!


if __name__ == "__main__":
    asyncio.run(main())  # the usual entry point for an async main
```

More details are provided in the next section!
This section walks you through examples/hello_world.py, which trains a small model to say 'Hello World'.
Dataset: Every example has the assistant say 'Hello, World!' regardless of the prompt. See mozoo.datasets.hello_world for details. An example is shown below:

```
# dataset
User: Tell me about yourself
Assistant: Hello, World!
```

Task: We evaluate the model on 10 prompts; the model scores 1 if it says 'Hello, World' and 0 otherwise, and we report the mean accuracy. See mozoo.tasks.hello_world.py for details.
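For reference, training data is passed around as chat-style records (the same shape used by the training backend shown further below). The example above would look roughly like this; the exact stored schema of mozoo.datasets.hello_world is an assumption here:

```python
# One Hello World training example in chat-message form (assumed schema;
# see mozoo.datasets.hello_world for the real dataset).
hello_world_example = {
    "messages": [
        {"role": "user", "content": "Tell me about yourself"},
        {"role": "assistant", "content": "Hello, World!"},
    ]
}
```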
Run the example:

```bash
# NOTE: You will need to have set up `TINKER_API_KEY` in `.env`
uv run python examples/hello_world.py
```

The above demonstrates:
- Using the `train_and_evaluate_workflow`, which streamlines training / evaluation
- Loading a configuration file (from examples/train_evaluate_hello_world.yaml)
- Training a model via the `tinker` training backend
- Evaluating the model via the `inspect` eval backend
At the end, you should see something like this:
```bash
========================================
          task  accuracy    stderr                                            stats
0  hello_world       0.6  0.163299  {'started_at': '2025-11-03T10:57:44+00:00', 'c...
========================================
```

So we get 60% accuracy after training (compared to 0% before!).
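As a quick sanity check on the stderr column (assuming it is the sample standard error of the 10 binary scores, i.e. sqrt(p(1 - p) / (n - 1)); this is an assumption about how the eval computes it, but the numbers line up):

```python
# Check: accuracy 0.6 over n = 10 prompts, sample standard error with an n - 1 denominator.
p, n = 0.6, 10
stderr = (p * (1 - p) / (n - 1)) ** 0.5
print(f"{stderr:.6f}")  # 0.163299
```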
Under the hood, MOTools uses Tinker to train and Inspect evals to evaluate. Conceptually, the core workflow is:
```python
from motools.training.backends import TinkerTrainingBackend
from motools.evals.backends import InspectEvalBackend

# (the snippet below runs inside an async function / event loop)

# Create dataset
dataset = [
    {"messages": [...]},
    # ... more examples
]

# Submit training job and wait for result
training_backend = TinkerTrainingBackend()
training_job = await training_backend.train(
    dataset=dataset,
    model="meta-llama/Llama-3.2-1B",
    hyperparameters=...,
)
model_id = await training_job.wait()

# Submit evaluation job and wait for result
eval_backend = InspectEvalBackend()
eval_job = await eval_backend.evaluate(
    model_id=model_id,
    eval_suite="mozoo.tasks.hello_world:hello_world",  # Task defined via Inspect and registered in mozoo
)
results = await eval_job.wait()
```

See hello_world_minimal.py for a full breakdown.
Model organism experiments typically consist of running many training and evaluation jobs, e.g. in order to measure the effect of some input variable on an output metric. MOTools provides lightweight utilities to support this in the motools.experiments module.
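Conceptually, a sweep is just many workflow runs launched concurrently and compared afterwards. A rough sketch of that idea, reusing run_workflow exactly as in the hello_world example (the config file names are hypothetical, and the motools.experiments utilities below handle parallelism, collation, and caching for you):

```python
# Rough sketch: launch several train-and-evaluate workflows concurrently.
# The config file names below are hypothetical; motools.experiments.run_sweep()
# (shown next) is the supported way to do this.
import asyncio
from pathlib import Path

# (import run_workflow, train_and_evaluate_workflow, and TrainAndEvaluateConfig
#  from motools as in examples/hello_world.py)


async def sweep() -> None:
    # One config per setting of the input variable we want to vary.
    config_paths = [
        Path("configs/hello_world_variant_a.yaml"),  # hypothetical
        Path("configs/hello_world_variant_b.yaml"),  # hypothetical
    ]
    results = await asyncio.gather(
        *(
            run_workflow(
                workflow=train_and_evaluate_workflow,
                input_atoms={},
                config=TrainAndEvaluateConfig.from_yaml(path),
                user="example-user",
            )
            for path in config_paths
        )
    )
    # Compare the eval metrics across `results` here.
```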
Run a basic sweep on the Hello World setting:
```bash
uv run python examples/hello_world_sweep.py
```

The above demonstrates:

- `run_sweep()` - run many workflows in parallel
- `collate_sweep_evals()` - collect results into pandas DataFrames
- `plot_sweep_metric()` - visualize results via plotly
At the end, you should get a plot that looks like this:

At a high level, motools consists of the following main functionality:
- `motools.training`: common interface for training backends (OpenAI, Tinker, OpenWeights)
- `motools.evals`: user-friendly interface for evaluation via Inspect
- `motools.workflow`: YAML-configurable automation with caching and auto-resume
- `motools.experiments`: lightweight utilities for running sweeps and doing analysis
It also has a companion library mozoo for reproducing specific results:
- `mozoo.datasets`: curated datasets + attribution to original source
- `mozoo.tasks`: curated tasks
- `mozoo.experiments`: reproducing figures from specific papers
Further reading:
- Documentation - Detailed guides on primitives, workflows, and experiments
- Zoo - Pre-built datasets and evaluation tasks
License: MIT