Save basic run metadata and add indexing script by Antovigo · Pull Request #420 · goodfire-ai/spd

Antovigo · 2026-03-04T01:35:15Z

Description

A bare-bones way to manage a collection of SPD decomposition runs.

Add two config options, label and notes, to the SPD config. These have no effect on the decomposition, and are simply used to save arbitrary notes for future reference.

- label is meant to identify a series of experiments (for example, side-by-side comparisons of hyperparameters).
- notes can be anything.

Add a run_metadata.json file to the output folder. This file contains:

- The latest git commit when the decomposition was run, and whether there were uncommitted changes
- A timestamp
- The optional label and notes entries from the config
- Whether the run completed successfully
- The run's duration
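
As a rough illustration of the fields listed above, here is a minimal sketch of how such a file could be assembled and written. The function names and the JSON key names (git_commit, git_dirty, timestamp, completed, duration_hours) are assumptions made for this example and are not taken from the PR's code; the git calls fall back to None when git is unavailable.

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def _git(*args: str) -> Optional[str]:
    """Run a git command and return its stripped stdout, or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def build_run_metadata(
    label: Optional[str],
    notes: Optional[str],
    completed: bool,
    duration_hours: Optional[float],
) -> dict:
    """Collect the metadata fields described above (key names are hypothetical)."""
    commit = _git("rev-parse", "HEAD")
    status = _git("status", "--porcelain")
    return {
        "git_commit": commit,
        # Non-empty porcelain output means there are uncommitted changes.
        "git_dirty": None if status is None else bool(status),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "label": label,
        "notes": notes,
        "completed": completed,
        "duration_hours": duration_hours,
    }


def write_run_metadata(out_dir: Path, metadata: dict) -> Path:
    """Write the metadata as run_metadata.json inside the run's output folder."""
    path = out_dir / "run_metadata.json"
    path.write_text(json.dumps(metadata, indent=2))
    return path
```

In a sketch like this, the file would typically be written once at the start of the run with completed set to False and rewritten at the end, so that an interrupted run still leaves a record behind.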

Add a script to generate a summary of the runs: spd/scripts/index_spd_runs.py.
This script scans all the runs in the SPD output directory, and generates a tab-separated list, sorted most-recent first, with relevant metadata from the run_metadata.json files. In addition, the script compares the configs for all runs that share the same label value, finds which hyperparameters differ between them, and shows them in the "hyperparameters" column.
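
The per-label comparison described above can be sketched roughly as follows. This is an illustrative approach and not the script's actual implementation: configs are assumed to be nested dictionaries, the helper names are hypothetical, and the label and notes keys are ignored by assumption since they are free-form text and not hyperparameters.

```python
from typing import Any

_MISSING = object()  # sentinel for keys absent from a config


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def differing_hyperparameters(
    configs: dict[str, dict[str, Any]],
    ignore: tuple[str, ...] = ("label", "notes"),
) -> dict[str, dict[str, Any]]:
    """For runs sharing a label, return only the settings that vary between them."""
    flat = {run: flatten(cfg) for run, cfg in configs.items()}
    all_keys = sorted({k for f in flat.values() for k in f if k not in ignore})
    varying = []
    for key in all_keys:
        values = [f.get(key, _MISSING) for f in flat.values()]
        if any(v != values[0] for v in values[1:]):
            varying.append(key)
    # Keys missing from a run are reported as "NA".
    return {run: {k: f.get(k, "NA") for k in varying} for run, f in flat.items()}


if __name__ == "__main__":
    group = {
        "run_a": {"lr": 0.001, "seed": 0, "loss": {"coeff": 1.0}},
        "run_b": {"lr": 0.01, "seed": 0, "loss": {"coeff": 1.0}},
    }
    print(differing_hyperparameters(group))
```

Only the settings that differ within the group are kept, which is what keeps a "hyperparameters" column short enough to read in a tab-separated file.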

The index looks like this:

The folder that contains the SPD runs can be overridden with the -i argument, and the location of the index file can be set with -o.

In the absence of run_metadata.json, the script infers whether the run completed from metrics.jsonl and the number of steps. The remaining fields are left as NA.
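
A minimal sketch of that fallback, under stated assumptions: each line of metrics.jsonl is taken to be a JSON object with a "step" key, and a run counts as complete when the last logged step reaches the configured number of steps. Both conventions are guesses for illustration, as are the function names. The last line is read by seeking backwards from the end of the file so that large logs are not loaded in full.

```python
import json
import os
from pathlib import Path
from typing import Optional


def read_last_jsonl_line(path: Path, chunk_size: int = 4096) -> Optional[dict]:
    """Return the last non-empty JSON line of a file, reading from the end."""
    buffer = b""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            buffer = f.read(read_size) + buffer
            # Stop once the newline preceding the last line is in the buffer.
            if b"\n" in buffer.rstrip():
                break
    stripped = buffer.rstrip()
    if not stripped:
        return None
    last_line = stripped.rsplit(b"\n", 1)[-1]
    try:
        return json.loads(last_line)
    except json.JSONDecodeError:
        # A partially written final line is treated as unreadable.
        return None


def is_run_complete(run_dir: Path, total_steps: int) -> bool:
    """Guess completion from the last logged step (assumes a "step" key)."""
    metrics_path = run_dir / "metrics.jsonl"
    if not metrics_path.exists():
        return False
    last = read_last_jsonl_line(metrics_path)
    return last is not None and last.get("step", -1) >= total_steps
```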

There is an optional argument, --metrics, to add the final values of metrics of interest (e.g. loss/PGDReconLoss,l0/0.0_total) to the index file.

By default, the script checks for a previous version of the index, and only re-processes the new runs. Use --force to re-process all the runs.
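
The command-line options described above could be wired up along these lines. This is a sketch, not the script's actual code: the flag names come from the description, while the default paths, the dest names, the comma-separated parsing of --metrics, and the select_runs helper are assumptions. The selection rule also re-processes runs that were previously indexed as unfinished, following the "Reprocess recent unfinished run" commit.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Index SPD runs into a TSV file.")
    # Default locations below are placeholders, not the script's real defaults.
    parser.add_argument("-i", dest="input_dir", type=Path, default=Path("out"),
                        help="folder that contains the SPD runs")
    parser.add_argument("-o", dest="output_file", type=Path,
                        default=Path("runs_index.tsv"),
                        help="location of the index file")
    parser.add_argument("--metrics", default="",
                        help="comma-separated metrics whose final values are added")
    parser.add_argument("--force", action="store_true",
                        help="re-process all runs, ignoring any previous index")
    return parser


def parse_metrics(raw: str) -> list[str]:
    """Split a comma-separated metric list, dropping empty entries."""
    return [m.strip() for m in raw.split(",") if m.strip()]


def select_runs(
    all_runs: list[str],
    previously_indexed: dict[str, bool],
    force: bool,
) -> list[str]:
    """Pick the runs to (re-)process.

    previously_indexed maps run name -> whether it was marked completed.
    New runs and runs that were unfinished last time are processed again.
    """
    if force:
        return sorted(all_runs)
    return sorted(r for r in all_runs if not previously_indexed.get(r, False))


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input_dir, args.output_file, parse_metrics(args.metrics), args.force)
```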

Related Issue

N/A

Motivation and Context

By default, the outputs of SPD runs are all stored in the same folder, with uninformative folder names. This makes it difficult to keep track of which is which.

How Has This Been Tested?

- make check passes (basedpyright + ruff)
- I have been using it for a week so far. I find it pretty useful.

Does this PR introduce a breaking change?

No. It shouldn't disrupt anyone's workflow.


danbraunai-goodfire

Right now I wouldn't be willing to have this separate logging system for runs. I think run logging preferences are very different for different users, and supporting this would likely open up a lot of other issues/PRs which I don't think will be worth spending the time on now.

I think users can also manage their own run logging without too much work. I'd be open to adding a "notes" field in the main config. If you wanted specific labels you could use things like "label<>my_label" and search for that pattern in the notes field in your logging setup.

I suppose you'd also add a --notes flag to spd-run which pipes to all of the experiment configs that are part of that spd-run call (which may be one or more experiments).

I think this notes field should also be uploaded to the wandb "notes" field that appears in the "overview" tab.

Up to you whether you think this will help you enough to warrant making those changes.
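
The "label<>my_label" convention suggested in this comment could be searched for with a small helper like the one below. This is only a sketch of the idea: the function name is hypothetical, and the assumption that a label runs up to the next whitespace character is a choice made for this example.

```python
import re
from typing import Optional

# Matches "label<>" followed by a run of non-whitespace characters.
LABEL_PATTERN = re.compile(r"label<>(\S+)")


def extract_label(notes: str) -> Optional[str]:
    """Return the label embedded in a notes string, or None if there is none."""
    match = LABEL_PATTERN.search(notes)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_label("label<>lr_sweep comparing learning rates"))
    print(extract_label("no label here"))
```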

Antovigo added 12 commits February 28, 2026 15:24

Save metadata about the run

3d1a772

Script to collect metadata about all the runs in a folder, and create…

cd76c2c

… a index

Don't mention index in pyproject.toml

a31a3e3

Reprocess recent unfinished run in case they are now finished

ed20646

Show progress

1d249f9

sort by date, put NAs at the end

bfdc1a0

support labels with mixed pretrained models

3962646

option to force reprocessing all runs

cd9e525

fix hyperparam cachig bug

48092fb

clean up index_spd_runs: extract helper, remove legacy shims

4745130

- Extract _read_last_jsonl_line() to deduplicate backwards-seek JSONL pattern
- Remove git_dirty fallback in _read_metadata()
- Delete unused groups_cached variable
- Skip reading metrics for uncompleted runs

store run duration in metadat

4084531

add duration_hours column to runs index

ac236be

Read duration from run_metadata.json and display it in the TSV index. Also round duration to 2 decimal places instead of 1.

Antovigo changed the title ~~Add run metadata, notes, and index~~ Save basic run metadata and add indexing script Mar 4, 2026

revert unintended whitespace change in tms_5-2_config.yaml

3b086a1

danbraunai-goodfire requested changes Mar 4, 2026

Antovigo wants to merge 13 commits into goodfire-ai:dev from Antovigo:feature/run_notes
