A Mechanistic Interpretability Toolkit for Cross-Layer Transcoder Training and Attribution-Graph Visualization
Updated Mar 17, 2026 - Python
Dataset and official implementation for "Discursive Circuits: How Do Language Models Understand Discourse Relations?" (EMNLP 2025)
Does quantization kill interpretability? A scaling study across five models (124M-2.8B parameters): RTN destroys induction heads in small models, while GPTQ preserves them at all scales.
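For context on the study above: RTN (round-to-nearest) quantizes each weight independently against a fixed integer grid, with no calibration data, which is why it is the cheapest but lossiest baseline. A minimal sketch of symmetric per-tensor RTN (illustrative only; function names are not from any listed repo):

```python
import numpy as np

def rtn_quantize(w: np.ndarray, bits: int = 4):
    """Round-to-nearest quantization: scale weights onto a symmetric
    integer grid and round each value independently (no calibration)."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for int4
    scale = np.abs(w).max() / qmax        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map the integer grid back to floats."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.7, 0.33, 0.05], dtype=np.float32)
q, s = rtn_quantize(w, bits=4)
w_hat = dequantize(q, s)   # per-weight error is at most scale / 2
```

GPTQ, by contrast, rounds columns sequentially and compensates each rounding error using second-order (Hessian) information from calibration data, which is plausibly why it better preserves fragile structures like induction heads.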
EU AI Act Annex IV compliance-audit platform plus mechanistic interpretability toolkit: white-box circuit analysis, and black-box audits of any model via API. Open source under the MIT license.