Improve rich_examples autointerp prompt by ocg-goodfire · Pull Request #457 · goodfire-ai/spd

ocg-goodfire · 2026-03-18T16:23:25Z

Summary

Fix misinterpretation of signed activations: add a local _DECOMPOSITION_DESCRIPTIONS in rich_examples.py explaining that the sign of component_activation is arbitrary (the value is an inner product with the read direction) and does not indicate suppression
Show raw + highlighted XML example format so dense token sequences (code, LaTeX, multilingual) are readable
Add "consider evidence critically" paragraph and clearer <<<token (ci:X, act:Y)>>> format description
Add AppTokenizer.get_raw_spans for LLM prompt rendering with literal whitespace
Expose --snapshot_branch on spd-autointerp CLI
Autointerp compare tab now lists all subruns regardless of .done marker
Add render_prompt.py script for iterating on prompt templates without loading a full run
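
A minimal sketch of why the sign of component_activation carries no meaning, assuming a component is a rank-one map with a read direction and a write direction. The write direction, the function names, and the numbers below are illustrative assumptions, not code from this PR. Negating both directions flips the activation sign but leaves the component's output unchanged:

```python
def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def component_activation(x, read_dir):
    # Activation is the inner product of the input with the read direction.
    return dot(x, read_dir)


def rank_one_output(x, read_dir, write_dir):
    # Hypothetical rank-one component: scale the write direction by the activation.
    act = component_activation(x, read_dir)
    return [act * w for w in write_dir]


x = [0.5, -1.0, 2.0]
read_dir = [1.0, 2.0, 0.0]
write_dir = [3.0, -1.0]

neg_read = [-r for r in read_dir]
neg_write = [-w for w in write_dir]

act = component_activation(x, read_dir)          # negative under this sign convention
flipped_act = component_activation(x, neg_read)  # same magnitude, opposite sign

out = rank_one_output(x, read_dir, write_dir)
flipped_out = rank_one_output(x, neg_read, neg_write)  # identical to `out`
```

Under this assumption a negative act describes the same function as a positive one, so it should not be read as suppression.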

Test plan

python -m spd.autointerp.scripts.render_prompt renders correctly
make check passes
Autointerp compare tab shows all subruns in app

🤖 Generated with Claude Code

Adds explanation to the SPD decomposition description that component activation sign is arbitrary (inner product with read direction) and does not indicate suppression. Trims redundant legend text. Also adds render_prompt.py script for iterating on prompt templates.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>


- Show raw text before annotated version in examples (helps with dense token sequences like code/LaTeX)
- Add explicit explanation of <<<token (ci:X, act:Y)>>> format
- Add "consider evidence critically" paragraph from dual_view

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Replaces sanitized single-line format with:

<example>
<raw>...unmodified text...</raw>
<highlighted>...<<<token (ci:X, act:Y)>>>...</highlighted>
</example>

Adds AppTokenizer.get_raw_spans for LLM prompt rendering where actual whitespace (newlines, indentation) is meaningful.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
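
The raw + highlighted format described in this commit could be rendered roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: render_example, its parameters, and the ci_threshold cutoff are hypothetical, and the input spans stand in for whatever AppTokenizer.get_raw_spans returns (its actual signature is not shown here).

```python
def render_example(spans, ci, act, ci_threshold=0.1):
    """Render one example as a <raw> block plus a <highlighted> block.

    spans: per-token strings with literal whitespace preserved (assumed input).
    ci, act: per-token causal importance and component activation values.
    ci_threshold: hypothetical cutoff above which a token is annotated.
    """
    if not (len(spans) == len(ci) == len(act)):
        raise ValueError("spans, ci and act must have the same length")

    raw = "".join(spans)
    parts = []
    for span, c, a in zip(spans, ci, act):
        if c > ci_threshold:
            # Annotate active tokens in the <<<token (ci:X, act:Y)>>> format.
            parts.append(f"<<<{span} (ci:{c:.2f}, act:{a:.2f})>>>")
        else:
            parts.append(span)
    highlighted = "".join(parts)

    return (
        "<example>\n"
        f"<raw>{raw}</raw>\n"
        f"<highlighted>{highlighted}</highlighted>\n"
        "</example>"
    )


spans = ["def", " f", "(x", "):", "\n", "    return", " x"]
ci = [0.0, 0.9, 0.0, 0.0, 0.0, 0.4, 0.0]
act = [0.0, -1.2, 0.0, 0.0, 0.0, 0.35, 0.0]
rendered = render_example(spans, ci, act)
print(rendered)
```

Keeping the unannotated text in its own block means newlines and indentation survive intact, which is the point of showing raw text before the annotated version for code-like inputs.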


ocg-goodfire and others added 5 commits March 18, 2026 15:23

Expose snapshot_branch in spd-autointerp CLI

8744104

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Show all subruns in autointerp comparer, not just .done ones

d456f50

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

ocg-goodfire changed the base branch from main to dev March 18, 2026 16:24

ocg-goodfire merged commit ca8a9fb into dev Mar 18, 2026
2 checks passed

ocg-goodfire merged 5 commits into dev from fix/autointerp-activations-explanation
