steering-vectors

Here are 5 public repositories matching this topic...

bassrehab / steering-vectors-agents

Runtime control of LLM agent behaviors through activation steering vectors. More calibrated than prompting.

machine-learning transformers pytorch steering-behaviors ai-safety interpretability langchain llm-agents activation-engineering steering-vectors contrastive-activation-addition

Updated Dec 19, 2025
Python

G-Art / matrix_steering_vector_research

Iterative Sparse Matrix Steering: Closed-Form Subspace Alignment for Multi-Layer LLM Control (No SGD required).

pytorch alignment interpretability llm activation-engineering steering-vectors

Updated Jan 5, 2026
Jupyter Notebook

JoschkaCBraun / adaptive-steering

Official implementation of "Beyond Multiple Choice: Evaluating Steering Vectors for Summarization" (Findings of EACL 2026).

evaluation summarization language-model abstractive-text-summarization abstractive-summarization steering-vector steering-vectors

Updated Jan 21, 2026
Python

JoschkaCBraun / steering-vector-reliability

Repository for the paper "Understanding (Un)Reliability of Steering Vectors in Language Models" by Joschka Braun, Carsten Eickhoff, David Krueger, Seyed Ali Bahrainian, and Dmitrii Krasheninnikov.

machine-learning language-models unreliability steering-vector steering-vectors

Updated Jun 10, 2025
Jupyter Notebook

IgRoF / steering_trust

My first eval / interpretability project. Testing steering vectors for truthfulness and honesty on TruthfulQA and MASK.

ai-safety honesty truthfulness truthfulqa steering-vectors mask-benchmark

Updated Jan 29, 2026
Python
