[ICLR 2025] General-purpose activation steering library
Updated Sep 18, 2025 - Python
KV Cache Steering for Inducing Reasoning in Small Language Models
Accepted at the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2026
Official repo for the paper: "Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection"
Mechanistic interpretability experiments detecting "Evaluation Awareness" in LLMs: identifying whether models internally represent being monitored
Official implementation of "Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization" (ICML 2025 R2FM Workshop).