A Mechanistic Interpretability Toolkit for Cross-Layer Transcoder Training and Attribution-Graph Visualization
Updated Mar 17, 2026 - Python
Dataset and official implementation for "Discursive Circuits: How Do Language Models Understand Discourse Relations?" (EMNLP 2025)
Does quantization kill interpretability? A scaling study across five models (124M-2.8B parameters): RTN destroys induction heads in small models, while GPTQ preserves them at all scales.
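For context on the study above: RTN (round-to-nearest) quantizes each weight independently against a fixed integer grid, with no calibration data, which is why it is the cheapest but lossiest baseline. A minimal sketch of symmetric per-tensor RTN (illustrative only; function names are not from any listed repo):

```python
import numpy as np

def rtn_quantize(w: np.ndarray, bits: int = 4):
    """Round-to-nearest quantization: scale weights onto a symmetric
    integer grid and round each value independently (no calibration)."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for int4
    scale = np.abs(w).max() / qmax        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map the integer grid back to floats."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.7, 0.33, 0.05], dtype=np.float32)
q, s = rtn_quantize(w, bits=4)
w_hat = dequantize(q, s)   # per-weight error is at most scale / 2
```

GPTQ, by contrast, rounds columns sequentially and compensates each rounding error using second-order (Hessian) information from calibration data, which is plausibly why it better preserves fragile structures like induction heads.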
EU AI Act Annex IV compliance-audit platform plus mechanistic interpretability toolkit: white-box circuit analysis, and black-box audits of any model via API. Open source under the MIT license.