Improve rich_examples autointerp prompt #457

Merged
ocg-goodfire merged 5 commits into dev from
fix/autointerp-activations-explanation
Mar 18, 2026
@ocg-goodfire
Collaborator

Summary

  • Fix signed activation misinterpretation: add a local _DECOMPOSITION_DESCRIPTIONS in rich_examples.py explaining that the sign of component_activation is arbitrary (it is an inner product with the read direction) and does not indicate suppression
  • Show each example in a raw + highlighted XML format so that dense token sequences (code, LaTeX, multilingual text) stay readable
  • Add "consider evidence critically" paragraph and clearer <<<token (ci:X, act:Y)>>> format description
  • Add AppTokenizer.get_raw_spans for LLM prompt rendering with literal whitespace
  • Expose --snapshot_branch on spd-autointerp CLI
  • Autointerp compare tab now lists all subruns regardless of .done marker
  • Add render_prompt.py script for iterating on prompt templates without loading a full run
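
To illustrate why the sign of component_activation carries no meaning, here is a minimal sketch. It assumes a rank-one component that reads its input with one direction and writes its output along another; the function and variable names are hypothetical and are not taken from the SPD codebase. Flipping both directions leaves the component's output unchanged while negating the activation.

```python
# Hypothetical illustration; none of these names come from the SPD codebase.
def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def component_output(x, read_dir, write_dir):
    """Assumed rank-one component: project x onto read_dir, then scale write_dir."""
    act = dot(x, read_dir)  # the signed "component activation"
    return act, [act * w for w in write_dir]


x = [1.0, -2.0, 0.5]
read_dir = [0.5, 1.0, -1.0]
write_dir = [2.0, -1.0]

act, out = component_output(x, read_dir, write_dir)

# Negate both directions: the component's output is identical,
# but the activation has the opposite sign.
act_flipped, out_flipped = component_output(
    x, [-r for r in read_dir], [-w for w in write_dir]
)

print(act, act_flipped)    # opposite signs
print(out == out_flipped)  # same contribution to the output
```

Under this assumption a negative activation reflects only how the read direction happens to be oriented, which is why the prompt tells the model not to read it as suppression.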

Test plan

  • python -m spd.autointerp.scripts.render_prompt renders correctly
  • make check passes
  • Autointerp compare tab shows all subruns in app

🤖 Generated with Claude Code

ocg-goodfire and others added 5 commits March 18, 2026 15:23
Adds explanation to the SPD decomposition description that component
activation sign is arbitrary (inner product with read direction) and
does not indicate suppression. Trims redundant legend text.

Also adds render_prompt.py script for iterating on prompt templates.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Show raw text before annotated version in examples (helps with dense
  token sequences like code/LaTeX)
- Add explicit explanation of <<<token (ci:X, act:Y)>>> format
- Add "consider evidence critically" paragraph from dual_view

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Replaces sanitized single-line format with:
  <example>
  <raw>...unmodified text...</raw>
  <highlighted>...<<<token (ci:X, act:Y)>>>...</highlighted>
  </example>

Adds AppTokenizer.get_raw_spans for LLM prompt rendering where actual
whitespace (newlines, indentation) is meaningful.
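
A rough sketch of how such an example block could be assembled from per-token spans follows. The helper name render_example, the ci_threshold parameter, and the two-decimal number formatting are assumptions for illustration only; the actual rendering code in this PR and the signature of AppTokenizer.get_raw_spans are not shown here.

```python
# Hypothetical sketch; render_example and ci_threshold are illustrative names.
def render_example(spans, cis, acts, ci_threshold=0.5):
    """Build an <example> block from raw token spans.

    spans: token strings with literal whitespace (newlines, indentation) kept.
    cis, acts: one causal-importance and one activation value per token.
    """
    raw = "".join(spans)
    parts = []
    for tok, ci, act in zip(spans, cis, acts):
        if ci >= ci_threshold:
            # Mark only the tokens on which the component is considered active.
            parts.append(f"<<<{tok} (ci:{ci:.2f}, act:{act:.2f})>>>")
        else:
            parts.append(tok)
    highlighted = "".join(parts)
    return (
        "<example>\n"
        f"<raw>{raw}</raw>\n"
        f"<highlighted>{highlighted}</highlighted>\n"
        "</example>"
    )


spans = ["def", " f", "():", "\n", "    return"]
cis = [0.0, 0.9, 0.1, 0.0, 0.0]
acts = [0.0, -1.25, 0.3, 0.0, 0.0]
rendered = render_example(spans, cis, acts)
print(rendered)
```

Keeping the raw text separate from the annotated text lets a reader see the original layout of code or LaTeX before the markers interrupt it.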

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@ocg-goodfire ocg-goodfire changed the base branch from main to dev March 18, 2026 16:24
@ocg-goodfire ocg-goodfire merged commit ca8a9fb into dev Mar 18, 2026
2 checks passed