Improve rich_examples autointerp prompt + remove confidence field #458
Merged
ocg-goodfire merged 8 commits into dev on Mar 18, 2026
Adds explanation to the SPD decomposition description that component activation sign is arbitrary (inner product with read direction) and does not indicate suppression. Trims redundant legend text.

Also adds render_prompt.py script for iterating on prompt templates.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
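To illustrate why the activation sign carries no suppression meaning, here is a minimal sketch. It assumes a rank-one component of the form u vᵀ, where v is the read direction mentioned above and u is a hypothetical write direction. The function names and toy vectors are illustrative and not taken from the SPD codebase. Negating both directions leaves the component's output unchanged while flipping the sign of the activation.

```python
# Illustrative only: a rank-one component W = u v^T (assumed form).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def component_activation(v, x):
    """Inner product of the input x with the read direction v."""
    return dot(v, x)

def component_output(u, v, x):
    """Output of the rank-one map u v^T applied to x."""
    act = component_activation(v, x)
    return [ui * act for ui in u]

x = [1.0, 2.0, -1.0]
v = [0.5, -1.0, 2.0]   # read direction
u = [2.0, 0.0, 1.0]    # write direction (hypothetical)

# Flip the sign convention of both directions.
neg_v = [-vi for vi in v]
neg_u = [-ui for ui in u]

act = component_activation(v, x)
flipped_act = component_activation(neg_v, x)

out = component_output(u, v, x)
flipped_out = component_output(neg_u, neg_v, x)

print(act, flipped_act)      # the activation changes sign
print(out == flipped_out)    # the component's output does not change
```

Under this assumption, a negative `act` says only that the input lies on the opposite side of the chosen read direction, not that the component suppresses anything.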
- Show raw text before annotated version in examples (helps with dense token sequences like code/LaTeX)
- Add explicit explanation of <<<token (ci:X, act:Y)>>> format
- Add "consider evidence critically" paragraph from dual_view

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Replaces sanitized single-line format with:

<example>
<raw>...unmodified text...</raw>
<highlighted>...<<<token (ci:X, act:Y)>>>...</highlighted>
</example>

Adds AppTokenizer.get_raw_spans for LLM prompt rendering where actual whitespace (newlines, indentation) is meaningful.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
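The example layout described above could be rendered roughly as follows. This is a sketch under stated assumptions: `render_example`, its `ci_threshold` parameter, and the two-decimal number formatting are all hypothetical, and the real prompt-rendering code (including how `AppTokenizer.get_raw_spans` supplies the token strings) may differ.

```python
# Hypothetical helper; names, threshold, and formatting are assumptions.

def render_example(tokens, ci_values, act_values, ci_threshold=0.5):
    """Render one example as <raw> text plus a <highlighted> annotated copy.

    tokens: token strings with their original whitespace preserved.
    Tokens whose ci is at or above ci_threshold are wrapped in <<<...>>>.
    """
    raw = "".join(tokens)
    parts = []
    for tok, ci, act in zip(tokens, ci_values, act_values):
        if ci >= ci_threshold:
            parts.append(f"<<<{tok} (ci:{ci:.2f}, act:{act:.2f})>>>")
        else:
            parts.append(tok)
    highlighted = "".join(parts)
    return (
        "<example>\n"
        f"<raw>{raw}</raw>\n"
        f"<highlighted>{highlighted}</highlighted>\n"
        "</example>"
    )

rendered = render_example(
    tokens=["def", " f", "(x)", ":\n", "    return", " x"],
    ci_values=[0.1, 0.9, 0.0, 0.0, 0.8, 0.2],
    act_values=[0.0, 1.5, 0.0, 0.0, -0.7, 0.1],
)
print(rendered)
```

Keeping the newline and indentation literal in `<raw>` lets the reader of the prompt see the code's actual layout before the denser annotated version.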
Drops the confidence field entirely from InterpretationResult, all DB schemas, JSON output schemas, prompts, API responses, and frontend UI.

Expands the act legend in rich_examples to explain that sign is meaningful within a component's examples even though the global convention is arbitrary: polarity may indicate distinct input patterns.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
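One way to picture the polarity point is to split a component's examples by the sign of their activation and inspect each group separately. The sketch below is illustrative only: `group_by_polarity` is a hypothetical helper, and the example texts and activation values are invented toy data, not results from any run.

```python
from collections import defaultdict

def group_by_polarity(examples):
    """Split (text, act) pairs into groups by the sign of act."""
    groups = defaultdict(list)
    for text, act in examples:
        if act > 0:
            groups["positive"].append(text)
        elif act < 0:
            groups["negative"].append(text)
        else:
            groups["zero"].append(text)
    return dict(groups)

# Invented toy data for a single component.
examples = [
    ("opening bracket (", 1.2),
    ("closing bracket )", -0.9),
    ("opening brace {", 0.8),
    ("closing brace }", -1.1),
]

groups = group_by_polarity(examples)
for polarity, texts in groups.items():
    print(polarity, texts)
```

If the two groups look like different input patterns, that difference is worth describing in the interpretation, even though which group ends up "positive" depends on an arbitrary convention.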
Summary
rich_examples prompt improvements:
- `_DECOMPOSITION_DESCRIPTIONS` explains that `component_activation` sign is arbitrary (inner product with read direction v_i) and does not indicate suppression
- Expand `act` legend to explain that polarity is meaningful within a component: examples may cluster by sign, representing distinct input patterns
- Add `<<<token (ci:X, act:Y)>>>` format explanation
- Add `AppTokenizer.get_raw_spans` for LLM prompt rendering with literal whitespace (no control-char escaping)
- Add `render_prompt.py` script for iterating on prompt templates without loading a full run

Remove confidence field:
- Drop `confidence` from `InterpretationResult`, all DB schemas, JSON output schemas, prompts, API responses, and frontend UI (27 files, 229 deletions)
- `InterpretationBadge`, `GraphInterpBadge`, `SubrunInterpCard`, `EdgeAttributionList`, `ModelGraph`

Autointerp tooling:
- `--snapshot_branch` on `spd-autointerp` CLI so SLURM jobs run from a specific git branch
- `InterpRepo.open_subrun(run_id, subrun_id)` to open a specific subrun by ID
- `--autointerp_subrun_id` to scoring CLI to target a specific subrun
- `.done` marker

Test plan
- `python -m spd.autointerp.scripts.render_prompt` renders correctly
- `make check` passes (basedpyright + ruff)
- `make check-app` passes (svelte-check + eslint + prettier)

🤖 Generated with Claude Code