ScrubID: Identifiability-Aware Auditing for Mechanistic Interpretability

Ali Uyar (Independent Researcher)

ScrubID is an identifiability-aware auditing pipeline for mechanistic interpretability claims in transformer language models.

It targets a concrete identifiability question:

If you can find one faithful circuit, how many other distinct circuits are also faithful under the same intervention family, and how sensitive are conclusions to discovery and implementation choices?

The pipeline produces (i) a scrubbed model construction, (ii) three diagnostics (RR, SSS, CC) that quantify redundancy, scrub-sensitivity, and complexity, and (iii) a non-identifiability certificate when multiple incompatible explanations satisfy the same validation criteria.
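The identifiability question can be made concrete with a toy example. The sketch below is not ScrubID's implementation and does not use its definitions of RR, SSS, or CC. The component names and the `is_faithful` criterion are hypothetical stand-ins for a real validation check. It enumerates every subset of a four-component toy model and shows that two distinct minimal circuits satisfy the same criterion.

```python
from itertools import combinations

# Toy "model": the behavior is preserved if component "a" is kept together
# with at least one of the redundant components "b1" or "b2".
# All names here are hypothetical and purely illustrative.
COMPONENTS = ("a", "b1", "b2", "c")


def is_faithful(circuit: frozenset) -> bool:
    """Hypothetical validation criterion for the toy model."""
    return "a" in circuit and bool(circuit & {"b1", "b2"})


def faithful_circuits(components):
    """Enumerate every non-empty subset of components that passes the criterion."""
    found = []
    for size in range(1, len(components) + 1):
        for subset in combinations(components, size):
            circuit = frozenset(subset)
            if is_faithful(circuit):
                found.append(circuit)
    return found


def minimal_circuits(circuits):
    """Keep only circuits that have no faithful proper subset."""
    return [c for c in circuits if not any(other < c for other in circuits)]


all_faithful = faithful_circuits(COMPONENTS)
minimal = minimal_circuits(all_faithful)
print(len(all_faithful), sorted(sorted(c) for c in minimal))
# prints: 6 [['a', 'b1'], ['a', 'b2']]
```

Six subsets pass the check, and two of them are minimal and differ in which redundant component they include. A single discovered circuit therefore does not pin down a unique explanation in this toy setting.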

Paper

Paper PDF: paper/scrubid_preprint.pdf
LaTeX source bundle: paper/scrubid_latex_sources.zip

This repository is a runnable, deterministic artifact pack: it includes the reference implementation, a spec (Single Definition Rule), and end-to-end reproduction commands. All constants, IDs, CLI literals, thresholds, and paths live in spec/00_CANONICAL.md.

Citation

See CITATION.cff (GitHub will also surface this under "Cite this repository").

Repository layout

paper/: preprint PDF and LaTeX source bundle.
paper.md: canonical manuscript source (Markdown).
src/: reference implementation (Python).
configs/: experiment configuration (YAML; canonical keys only).
spec/: formal definitions and the Single Definition Rule (single source of truth for constants and IDs).
outputs/: immutable run artifacts (large; used for provenance / verification).
tasks/, checklists/: phase plan and release checklists.

Quickstart (12 commands)

Each line below is a canonical command ID. The exact command string is defined once (and only once) in spec/00_CANONICAL.md under CLI.CANONICAL_COMMANDS.

CLI_CMD_VALIDATE_SPEC
CLI_CMD_SYNTH_GENERATE_SUITE
CLI_CMD_SYNTH_RUN_CANDIDATE_GENERATORS
CLI_CMD_SYNTH_RUN_DIAGNOSTICS
CLI_CMD_REAL_IOI_RUN
CLI_CMD_REAL_GREATERTHAN_YN_RUN
CLI_CMD_REAL_INDUCTION_RUN
CLI_CMD_AGGREGATE_RESULTS
CLI_CMD_BUILD_PAPER_ARTIFACTS
CLI_CMD_VALIDATE_PAPER_MANIFEST
CLI_CMD_DETERMINISM_SMOKE_TEST
CLI_CMD_REPRODUCE_PAPER
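
The "defined once (and only once)" requirement can be checked mechanically. The sketch below is an illustration, not the repository's CLI_CMD_VALIDATE_SPEC implementation. It assumes the definitions have already been extracted from spec/00_CANONICAL.md into (ID, command string) pairs. The extraction step is omitted because the file format is not described here, and `check_single_definition` is a hypothetical helper.

```python
from collections import Counter

# The twelve quickstart command IDs, in the order listed above.
QUICKSTART_IDS = [
    "CLI_CMD_VALIDATE_SPEC",
    "CLI_CMD_SYNTH_GENERATE_SUITE",
    "CLI_CMD_SYNTH_RUN_CANDIDATE_GENERATORS",
    "CLI_CMD_SYNTH_RUN_DIAGNOSTICS",
    "CLI_CMD_REAL_IOI_RUN",
    "CLI_CMD_REAL_GREATERTHAN_YN_RUN",
    "CLI_CMD_REAL_INDUCTION_RUN",
    "CLI_CMD_AGGREGATE_RESULTS",
    "CLI_CMD_BUILD_PAPER_ARTIFACTS",
    "CLI_CMD_VALIDATE_PAPER_MANIFEST",
    "CLI_CMD_DETERMINISM_SMOKE_TEST",
    "CLI_CMD_REPRODUCE_PAPER",
]


def check_single_definition(required_ids, definitions):
    """Return (missing, duplicated) IDs for a list of (id, command) pairs.

    Hypothetical helper: `definitions` stands in for whatever a real parser
    of the spec file would produce.
    """
    counts = Counter(command_id for command_id, _ in definitions)
    missing = [cid for cid in required_ids if counts[cid] == 0]
    duplicated = sorted(cid for cid, n in counts.items() if n > 1)
    return missing, duplicated


# Placeholder command strings; the real ones live only in the spec.
definitions = [(cid, f"<command for {cid}>") for cid in QUICKSTART_IDS]
missing, duplicated = check_single_definition(QUICKSTART_IDS, definitions)
print(missing, duplicated)
# prints: [] []
```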

Convenience helpers

scripts/resume_paper_bundle.py resumes missing paper-scope real runs into an existing --output_root (useful if a long GPU run was interrupted).

What "publishable-ready" means here

A run is considered publishable-ready when every quality gate defined in SPEC.md reports PASS:

G0 Spec coherence: Single Definition Rule holds; configs resolve; no undefined IDs.
G1 Determinism smoke: same seed and same resolved config produce identical run_record_hash.
G2 Synthetic ground-truth sanity: RR monotonically increases with planted redundancy; CC tracks planted minimal size.
G3 Real-model reproducibility: RR/SSS/CC are stable across seeds and across intervention families.
G4 Reproducibility: tables and plots are regenerated only from logs and configs.
G5 Paper evidence: every main claim maps to concrete run IDs and generated artifacts.
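
Gates G1 and G2 describe checks that are easy to sketch. The code below is a minimal illustration under stated assumptions, not the reference implementation. The fields that actually enter run_record_hash are defined in the spec. Here `record_hash` is a hypothetical function that hashes a canonical JSON encoding of the resolved config plus the seed, and `is_monotone` is a hypothetical helper for the G2 trend check. The config keys and RR values are made up.

```python
import hashlib
import json


def record_hash(resolved_config: dict, seed: int) -> str:
    """Hash a canonical JSON encoding of config and seed (assumed scheme)."""
    payload = json.dumps(
        {"config": resolved_config, "seed": seed},
        sort_keys=True,              # key order must not affect the hash
        separators=(",", ":"),       # fixed separators, no whitespace drift
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_monotone(values, strict: bool = False) -> bool:
    """Check that values never decrease (or strictly increase if strict)."""
    pairs = zip(values, values[1:])
    if strict:
        return all(a < b for a, b in pairs)
    return all(a <= b for a, b in pairs)


# G1-style check: same seed and same resolved config give the same hash,
# regardless of dictionary insertion order.
config_a = {"task": "toy", "n_layers": 2}
config_b = {"n_layers": 2, "task": "toy"}
assert record_hash(config_a, seed=0) == record_hash(config_b, seed=0)
assert record_hash(config_a, seed=0) != record_hash(config_a, seed=1)

# G2-style check on made-up RR values ordered by planted redundancy.
rr_by_planted_redundancy = [0.10, 0.25, 0.40]
print(is_monotone(rr_by_planted_redundancy, strict=True))
# prints: True
```

Sorting keys and fixing separators before hashing is one common way to make a hash independent of dictionary ordering. Whether G2 requires a strict or non-strict increase is left as a parameter here because the gate description does not say.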

Where to start reading

SPEC.md (entrypoint and invariants)
spec/00_CANONICAL.md (all constants and IDs)
spec/02_FORMAL_DEFINITIONS.md through spec/08_REAL_MODEL_CASE_STUDIES.md (the core method)
tasks/TASK_INDEX.md (implementation plan)
