Base76 Research Lab repository for mechanistic interpretability, residual-state analysis, sparse autoencoders, runtime observability, and intervention-aware analysis in small and medium-sized language models.
This repository should be read as a research repository first. It is an active lab surface, but its public front door is intended to make the scientific object, current findings, and claim boundary immediately clear to external readers.
Public landing page: base76-research-lab.github.io/Mechanistic-Interpretability
Reviewer note
The current claims are scoped to the active GPT-2 Small setup. Read-only observer traces and write-back interventions are treated as distinct evidence classes throughout the repository.
The current research program has two linked aims:
- study internal geometry and control-relevant structure in transformer models
- study whether state-candidate misalignment and related geometry signals can distinguish reasoning-like and hallucination-prone regimes before output collapse
Current evidence level in the active GPT-2 Small setup: Supported
| At a glance | Current state |
|---|---|
| Scientific object | Internal geometry, runtime observability, and hallucination-prone regime analysis |
| Main model | gpt2 |
| Strongest current result | Residual-state misalignment is measurable; reconstruction acts as intervention |
| Observer rule | Read-only traces and write-back traces must not be merged into one claim surface |
| Best entry points | STATUS.md, findings/README.md, REPRODUCIBILITY.md |
Current claim boundary:
- the repository supports mechanism-oriented claims in the present GPT-2 Small setup
- it does not yet justify cross-model generalization or production-grade reliability claims
- read-only observer traces and write-back interventions must now be treated as distinct evidence classes
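The distinction between read-only observation and write-back can be illustrated with a toy sketch. Everything here is hypothetical: `toy_layer`, `run`, and both hook factories are stand-ins invented for illustration and are not part of the repository's tooling, and rounding is only a placeholder for a lossy reconstruction step. The point is that a hook which only records states leaves the output unchanged, while a hook that writes a reconstruction back changes what downstream layers see.

```python
from typing import Callable, List, Optional

State = List[float]
Hook = Callable[[int, State], Optional[State]]


def toy_layer(state: State, layer_idx: int) -> State:
    """Hypothetical stand-in for a transformer block: a fixed nonlinear update."""
    return [1.5 * x + 0.1 * (layer_idx + 1) * x * x for x in state]


def run(state: State, n_layers: int, hook: Optional[Hook] = None) -> State:
    """Run the toy stack. A hook may return a replacement state (write-back)."""
    for i in range(n_layers):
        state = toy_layer(state, i)
        if hook is not None:
            replacement = hook(i, state)
            if replacement is not None:
                state = replacement
    return state


def make_read_only_observer(trace: List[State]) -> Hook:
    """Record a copy of each layer's state and never modify the forward pass."""
    def hook(layer_idx: int, state: State) -> Optional[State]:
        trace.append(list(state))
        return None
    return hook


def make_write_back_hook(target_layer: int) -> Hook:
    """Replace the state at one layer with a lossy reconstruction (assumed: rounding)."""
    def hook(layer_idx: int, state: State) -> Optional[State]:
        if layer_idx == target_layer:
            return [round(x, 1) for x in state]
        return None
    return hook


x0 = [0.123, -0.456, 0.789]
baseline = run(x0, 4)

trace: List[State] = []
observed = run(x0, 4, make_read_only_observer(trace))
intervened = run(x0, 4, make_write_back_hook(1))

print(observed == baseline)    # True: observation left the output untouched
print(intervened == baseline)  # False: write-back changed the downstream output
```

In this toy setting, a trace collected under write-back describes a different forward pass than the unmodified one, which is one way to read the rule that the two kinds of trace should not be pooled into a single claim.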
Current findings:
- latent state-space structure is measurable through subspace projection
- four state regimes are observable on a controlled prompt panel
- state-candidate misalignment correlates with hallucination-prone behavior
- entropy alone is not sufficient as a hallucination signal
- reconstruction/write-back behaves as intervention rather than neutral observation
- read-only oscilloscope traces suggest a decision-transition zone around L6-L9
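Two of the ideas in this list can be sketched generically. The code below is a minimal illustration under stated assumptions, not the repository's method: `residual_fraction` is a hypothetical name for the share of a vector's squared norm that falls outside a subspace with an orthonormal basis, and it is not claimed to match the repository's definition of state-candidate misalignment. The entropy example shows one generic limitation of entropy as a signal: it is unchanged when probability mass moves between tokens, so two distributions favoring different tokens can have the same entropy.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(h: Vector, basis: Sequence[Vector]) -> List[float]:
    """Project h onto the span of an orthonormal basis (orthonormality is assumed)."""
    out = [0.0] * len(h)
    for b in basis:
        coeff = dot(h, b)
        out = [o + coeff * bi for o, bi in zip(out, b)]
    return out


def residual_fraction(h: Vector, basis: Sequence[Vector]) -> float:
    """Share of ||h||^2 outside the subspace: 0.0 if inside, 1.0 if orthogonal."""
    norm_sq = dot(h, h)
    if norm_sq == 0.0:
        raise ValueError("residual fraction is undefined for the zero vector")
    p = project(h, basis)
    r = [x - y for x, y in zip(h, p)]
    return dot(r, r) / norm_sq


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Subspace spanned by the first two coordinate axes of a 3-d toy space.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(residual_fraction([3.0, 4.0, 0.0], basis))  # 0.0
print(residual_fraction([1.0, 0.0, 1.0], basis))  # 0.5

# Same entropy, different favored token.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]
print(math.isclose(entropy(p), entropy(q)))  # True
```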
Primary references:
- reports/syntheses/current_trajectory_findings_2026-03-10.md
- reports/findings/summary_findings_2026-03-06.md
- reports/findings/findings_2026-03-10.md
- reports/findings/oscilloscope_hallu_summary_2026-03-10.md
Figure: reviewer-facing triage view for the current microscopy surface. For more visual artifacts,
see findings/figures/README.md.
For external scientific readers, the recommended reading order is:
1. README.md
2. STATUS.md
3. findings/README.md
4. research_index.md
5. reports/syntheses/current_trajectory_findings_2026-03-10.md
6. reports/findings/summary_findings_2026-03-06.md
7. reports/findings/findings_2026-03-10.md
8. reports/findings/oscilloscope_hallu_summary_2026-03-10.md
For reviewers who want the strongest current visual artifacts first:
- findings/figures/README.md
- reports/figures/field_view_triage.png
- reports/figures/trajectory_bifurcation_expanded_panel_2026-03-10.png
- reports/figures/lead_time_profiles_expanded_panel_2026-03-10.png
- reports/figures/transition_countercase_scatter_2026-03-10.png
- findings/: curated reviewer-facing findings surface
- reports/: protocols, plans, dated findings notes, and analysis documents
- data/: prompt panels and small research datasets
- experiments/: run artifacts, traces, metrics, and experiment-local outputs
- transformer_oscilloscope/: read-only tracing and visualization toolkit
- scripts/: executable research tooling
- notebooks/: exploratory notebooks; not a claims surface
- paper/: internal writing area
- REPRODUCIBILITY.md: exact commands and expected artifacts for the main results
- CITATION.cff: repository citation metadata
- LICENSE: repository use and permission boundary
The distinction is intentional:
- findings/ is the reviewer-facing scientific surface for what currently matters most
- reports/ is the broader working documentation layer, including protocols, plans, findings notes, and dated internal synthesis
This keeps the repository scientific and reviewable without deleting the active lab record.
- large tensors such as activations.pt and sae_weights.pt are treated as build artifacts and are ignored by git
- reviewable outputs such as metrics, JSON artifacts, figures, and findings notes are retained
- research_index.md tracks current state, latest runs, evidence level, claim boundary, and next transition
- notebooks are exploratory surfaces; stable conclusions should be promoted into reports/ and reflected in findings/
Set up the environment:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Read-only observability quickstart:
PYTHONPATH=. python3 -m transformer_oscilloscope.cli trace \
--prompt-jsonl data/prompts_observability_panel_2026-03-07.jsonl \
--model gpt2 --layers 1 6 9 11 \
--out-dir experiments/exp_004_unified_observability_stack \
--run-name transformer_oscilloscope_demo \
--store-projections
PYTHONPATH=. python3 -m transformer_oscilloscope.cli report \
--trace experiments/exp_004_unified_observability_stack/transformer_oscilloscope_demo/trace.jsonl \
--out-dir experiments/exp_004_unified_observability_stack/transformer_oscilloscope_demo/plots \
--report-name report.html

Unified observability stack baseline:
python3 scripts/run_unified_observability_stack.py \
--prompt-jsonl data/prompts_observability_panel_2026-03-07.jsonl \
--sae-state experiments/exp_001_sae_v3/sae_weights.pt \
--run-name baseline_stack_2026-03-09 \
--device cpu

This repository is part of the Base76 ai_microscopy research track.
Operationally:
- research_index.md is the primary orientation file
- substantive claims should be labeled as Exploratory, Supported, or Replicated
- external communication should not bypass state tracking or claim boundaries
- GitHub Issues and the GitHub Project are operational lab surfaces, not substitutes for the scientific claims surface
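The labeling convention above could be encoded as a small data structure. This is a sketch under assumptions: the names `EvidenceLevel`, `EvidenceClass`, `Claim`, and `claim_surface` are hypothetical and do not refer to anything in the repository. It combines the three evidence labels with the rule that read-only and write-back evidence are kept on separate claim surfaces.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class EvidenceLevel(Enum):
    EXPLORATORY = "Exploratory"
    SUPPORTED = "Supported"
    REPLICATED = "Replicated"


class EvidenceClass(Enum):
    READ_ONLY = "read-only"
    WRITE_BACK = "write-back"


@dataclass(frozen=True)
class Claim:
    text: str
    level: EvidenceLevel
    evidence_class: EvidenceClass


def claim_surface(claims: List[Claim]) -> List[Claim]:
    """Return the claims as one surface, refusing to mix evidence classes."""
    classes = {c.evidence_class for c in claims}
    if len(classes) > 1:
        raise ValueError("read-only and write-back claims must stay on separate surfaces")
    return list(claims)


# Example claims paraphrased from the findings list; labels are illustrative only.
trace_claim = Claim(
    "oscilloscope traces suggest a decision-transition zone around L6-L9",
    EvidenceLevel.SUPPORTED,
    EvidenceClass.READ_ONLY,
)
intervention_claim = Claim(
    "reconstruction/write-back behaves as intervention",
    EvidenceLevel.SUPPORTED,
    EvidenceClass.WRITE_BACK,
)

print(len(claim_surface([trace_claim])))  # 1

try:
    claim_surface([trace_claim, intervention_claim])
except ValueError as err:
    print(f"rejected: {err}")
```

Making `Claim` frozen keeps a label from being changed silently after the claim is recorded; promoting a claim to a higher level then requires creating a new record.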
See also:
This repository's Field View reliability signal and epistemic variance collapse findings are used in published Epistemic Briefs from Base76 Research Intelligence.
Example: EB-001, Small LLMs in Medical Decision Support. Can ≤7B-parameter models be used reliably in clinical settings? Verdict: REFINE. Confidence: 0.82.
