Code for locating "critical neurons" in LLMs. We show that masking as few as 3 neurons can cripple a model's capabilities (ICLR 2026).
neural-networks ai-safety model-robustness interpretability-and-explainability llms mechanistic-interpretability iclr-2026 critical-neurons
Updated Mar 27, 2026 - Python
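
A minimal sketch of what such a masking intervention might look like in PyTorch, assuming a HuggingFace GPT-2 checkpoint; the layer index and neuron IDs below are hypothetical placeholders for illustration, not the critical neurons identified by this repository's method.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

LAYER = 6                      # hypothetical layer to intervene on
NEURON_IDS = [11, 404, 2041]   # hypothetical "critical" MLP neurons

def mask_neurons(module, inputs, output):
    # Zero the chosen neurons' activations at every token position.
    output[..., NEURON_IDS] = 0.0
    return output

# Hook the MLP activation function so the post-nonlinearity
# activations of the selected neurons are masked on every forward pass.
handle = model.transformer.h[LAYER].mlp.act.register_forward_hook(mask_neurons)

ids = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**ids, max_new_tokens=10)
print(tokenizer.decode(out[0]))

handle.remove()  # restore the unmodified model
```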