Mechanistic Interpretability

Base76 Research Lab repository for mechanistic interpretability, residual-state analysis, sparse autoencoders, runtime observability, and intervention-aware analysis in small and medium-sized language models.

Read this repository as a research repository first. It is an active lab workspace, but this README is meant to make the scientific object, current findings, and claim boundary immediately clear to external readers.

Public landing page: base76-research-lab.github.io/Mechanistic-Interpretability

Reviewer note

The current claims are scoped to the active GPT-2 Small setup. Read-only observer traces and write-back interventions are treated as distinct evidence classes throughout the repository.

Scientific focus

The current research program has two linked aims:

- study internal geometry and control-relevant structure in transformer models
- study whether state-candidate misalignment and related geometry signals can distinguish reasoning-like and hallucination-prone regimes before output collapse

Current status

Current evidence level in the active GPT-2 Small setup: Supported

| At a glance | Current state |
| --- | --- |
| Scientific object | Internal geometry, runtime observability, and hallucination-prone regime analysis |
| Main model | `gpt2` |
| Strongest current result | Residual-state misalignment is measurable; reconstruction acts as intervention |
| Observer rule | Read-only traces and write-back traces must not be merged into one claim surface |
| Best entry points | `STATUS.md`, `findings/README.md`, `REPRODUCIBILITY.md` |

Current claim boundary:

- the repository supports mechanism-oriented claims in the present GPT-2 Small setup
- it does not yet justify cross-model generalization or production-grade reliability claims
- read-only observer traces and write-back interventions must be treated as distinct evidence classes

See:

Main findings at a glance

- latent state-space structure is measurable through subspace projection
- four state regimes are observable on a controlled prompt panel
- state-candidate misalignment correlates with hallucination-prone behavior
- entropy alone is not sufficient as a hallucination signal
- reconstruction/write-back behaves as intervention rather than neutral observation
- read-only oscilloscope traces suggest a decision-transition zone around layers 6 to 9 (L6-L9)
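
To make the vocabulary in the list above concrete, here is a minimal pure-Python sketch of three generic quantities: next-token entropy, projection onto a subspace, and a cosine-based misalignment score. It is an illustration only. The function names are hypothetical, and the cosine definition of "misalignment" is an assumption chosen for illustration, not necessarily the metric this repository computes; see `findings/README.md` for the actual definitions.

```python
import math


def softmax_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)


def project_onto_subspace(vec, basis):
    """Project vec onto span(basis). Assumes the basis rows are orthonormal."""
    out = [0.0] * len(vec)
    for b in basis:
        coeff = sum(v * bi for v, bi in zip(vec, b))
        out = [o + coeff * bi for o, bi in zip(out, b)]
    return out


def cosine_misalignment(state, candidate):
    """Hypothetical score: 1 - cos(state, candidate); 0 means aligned."""
    dot = sum(s * c for s, c in zip(state, candidate))
    ns = math.sqrt(sum(s * s for s in state))
    nc = math.sqrt(sum(c * c for c in candidate))
    if ns == 0 or nc == 0:
        return 1.0  # convention chosen here: a zero vector counts as unaligned
    return 1.0 - dot / (ns * nc)


# Permuting the logits leaves entropy unchanged even though the preferred
# token differs, which is one simple reason entropy cannot by itself say
# *which* candidate the model is committing to.
h_a = softmax_entropy([2.0, 0.0, 0.0])
h_b = softmax_entropy([0.0, 2.0, 0.0])
print(math.isclose(h_a, h_b))
```

The design point is that entropy summarizes only the shape of the output distribution, while a geometric score compares an internal state against a specific candidate direction, so the two can disagree.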

Primary references:

Visual entry point

Figure: reviewer-facing triage view for the current microscopy surface. For more visual artifacts, see findings/figures/README.md.

Start here

For external scientific readers, the recommended reading order is:

For reviewers who want the strongest current visual artifacts first:

Repository map

- `findings/`: curated reviewer-facing findings surface
- `reports/`: protocols, plans, dated findings notes, and analysis documents
- `data/`: prompt panels and small research datasets
- `experiments/`: run artifacts, traces, metrics, and experiment-local outputs
- `transformer_oscilloscope/`: read-only tracing and visualization toolkit
- `scripts/`: executable research tooling
- `notebooks/`: exploratory notebooks; not a claims surface
- `paper/`: internal writing area
- `REPRODUCIBILITY.md`: exact commands and expected artifacts for the main results
- `CITATION.cff`: repository citation metadata
- `LICENSE`: repository use and permission boundary

Findings vs reports

The distinction is intentional:

- `findings/` is the reviewer-facing scientific surface for what currently matters most
- `reports/` is the broader working documentation layer, including protocols, plans, findings notes, and dated internal synthesis

This keeps the repository scientific and reviewable without deleting the active lab record.

Reproducibility

- large tensors such as `activations.pt` and `sae_weights.pt` are treated as build artifacts and are ignored by git
- reviewable outputs such as metrics, JSON artifacts, figures, and findings notes are retained
- `research_index.md` tracks current state, latest runs, evidence level, claim boundary, and next transition
- notebooks are exploratory surfaces; stable conclusions should be promoted into `reports/` and reflected in `findings/`

Quickstart

Set up the environment:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Read-only observability quickstart:

PYTHONPATH=. python3 -m transformer_oscilloscope.cli trace \
  --prompt-jsonl data/prompts_observability_panel_2026-03-07.jsonl \
  --model gpt2 --layers 1 6 9 11 \
  --out-dir experiments/exp_004_unified_observability_stack \
  --run-name transformer_oscilloscope_demo \
  --store-projections

PYTHONPATH=. python3 -m transformer_oscilloscope.cli report \
  --trace experiments/exp_004_unified_observability_stack/transformer_oscilloscope_demo/trace.jsonl \
  --out-dir experiments/exp_004_unified_observability_stack/transformer_oscilloscope_demo/plots \
  --report-name report.html
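
After the trace command finishes, it can be useful to look at the raw trace before opening the HTML report. The sketch below assumes only that `trace.jsonl` is line-delimited JSON, as the extension suggests. It does not assume any field names, since the trace schema is not documented here; `summarize_jsonl` is a hypothetical helper, not part of `transformer_oscilloscope`.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, Counter of top-level keys) for a JSONL file."""
    count = 0
    keys = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys


# Path taken from the quickstart commands above.
trace_path = Path(
    "experiments/exp_004_unified_observability_stack/"
    "transformer_oscilloscope_demo/trace.jsonl"
)
if trace_path.exists():
    n_records, key_counts = summarize_jsonl(trace_path)
    print(f"{n_records} records")
    for key, n in key_counts.most_common():
        print(f"  {key}: present in {n} records")
else:
    print(f"{trace_path} not found; run the trace command first")
```

Listing the top-level keys first is a cheap way to learn the schema without guessing at field names.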

Unified observability stack baseline:

python3 scripts/run_unified_observability_stack.py \
  --prompt-jsonl data/prompts_observability_panel_2026-03-07.jsonl \
  --sae-state experiments/exp_001_sae_v3/sae_weights.pt \
  --run-name baseline_stack_2026-03-09 \
  --device cpu
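
The baseline command reads `experiments/exp_001_sae_v3/sae_weights.pt`, and the Reproducibility section notes that `sae_weights.pt` files are git-ignored build artifacts, so a fresh clone may not contain it. A small preflight check along the following lines can catch that before a run starts. `missing_inputs` is a hypothetical helper written for this example and is not part of `scripts/`.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the paths that do not exist on disk, preserving input order."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


# Inputs named in the baseline command above.
required = [
    "data/prompts_observability_panel_2026-03-07.jsonl",
    "experiments/exp_001_sae_v3/sae_weights.pt",
]

missing = missing_inputs(required)
if missing:
    print("Missing inputs:")
    for path in missing:
        print(f"  {path}")
else:
    print("All inputs present.")
```

If the weights file is reported missing, `REPRODUCIBILITY.md` is the place to look for the commands that produce it.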

Research context

This repository is part of the Base76 ai_microscopy research track.

Operationally:

- `research_index.md` is the primary orientation file
- substantive claims should be labeled as Exploratory, Supported, or Replicated
- external communication should not bypass state tracking or claim boundaries
- GitHub Issues and the GitHub Project are operational lab surfaces, not substitutes for the scientific claims surface

Applied research — Epistemic Briefs

This repository's Field View reliability signal and epistemic variance collapse findings are used in published Epistemic Briefs from Base76 Research Intelligence.

Example: EB-001, "Small LLMs in Medical Decision Support". Question: can ≤7B-parameter models be used reliably in clinical settings? Verdict: REFINE. Confidence: 0.82.

→ base76-research-lab/sample-briefs
