Runtime control of LLM agent behaviors through activation steering vectors. More calibrated than prompting.
Updated Dec 19, 2025 (Python)
Iterative Sparse Matrix Steering: Closed-Form Subspace Alignment for Multi-Layer LLM Control (No SGD required).
Official implementation of "Beyond Multiple Choice: Evaluating Steering Vectors for Summarization" (Findings of EACL 2026).
Repository for paper "Understanding (Un)Reliability of Steering Vectors in Language Models" by Joschka Braun, Carsten Eickhoff, David Krueger, Seyed Ali Bahrainian, Dmitrii Krasheninnikov.
My first eval / interpretability project. Testing steering vectors for truthfulness and honesty on TruthfulQA and MASK.
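The projects listed here share one core mechanism: at inference time, a fixed vector is added to a model's hidden activations. The sketch below is minimal and assumes a difference-of-means steering vector built from contrastive activations. That is one common construction and not necessarily the method any listed repository uses. Synthetic NumPy arrays stand in for real layer activations, and the function names `steering_vector` and `apply_steering` are hypothetical.

```python
import numpy as np

def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Difference of mean activations between two contrastive prompt sets.

    pos_acts, neg_acts: arrays of shape (n_examples, hidden_dim) from the same
    layer. These are hypothetical inputs; real ones would be captured from a model.
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def apply_steering(hidden: np.ndarray, vec: np.ndarray, alpha: float) -> np.ndarray:
    """Add the scaled steering vector to the hidden state at every token position."""
    return hidden + alpha * vec

rng = np.random.default_rng(0)
hidden_dim = 8
direction = np.zeros(hidden_dim)
direction[0] = 1.0  # synthetic "behavior" direction used only for this toy data

# Toy activations: the two sets differ by 4 units along `direction` on average.
pos = rng.normal(size=(100, hidden_dim)) + 2.0 * direction
neg = rng.normal(size=(100, hidden_dim)) - 2.0 * direction
vec = steering_vector(pos, neg)

hidden = rng.normal(size=(5, hidden_dim))  # hidden states for 5 token positions
steered = apply_steering(hidden, vec, alpha=0.5)

# Shift along the behavior direction per token: about alpha * 4 = 2.0 (noisy).
print((steered - hidden) @ direction)
```

In a real model, the addition would happen inside a hook on a chosen layer instead of on standalone arrays. The scalar `alpha` sets the strength of the steering, and its sign sets the direction.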