Runtime control of LLM agent behaviors through activation steering vectors. More calibrated than prompting.
Updated Dec 19, 2025 - Python
Iterative Sparse Matrix Steering: Closed-Form Subspace Alignment for Multi-Layer LLM Control (No SGD required).
Repository for paper "Understanding (Un)Reliability of Steering Vectors in Language Models" by Joschka Braun, Carsten Eickhoff, David Krueger, Seyed Ali Bahrainian, Dmitrii Krasheninnikov.
Official implementation of "Beyond Multiple Choice: Evaluating Steering Vectors for Summarization" (Findings of EACL 2026).
Qwen3-0.6B activation steering: style vectors, lens contamination eval, CPRR methodology
Mechanistic interpretability experiments: raw GPT-2 inference, activation steering, and manual backprop MLP.
Investigating honesty, deception and steering in large language models. Replicating and extending the MASK honesty benchmark on frontier models, working toward internal representation analysis and activation steering for honesty.
Functional emotional architecture for LLMs — 42 systems, 1994 tests, 27 psychological theories. Emergent emotions via 7 ANIMA pillars: predictive processing, global workspace, autobiographical memory, ontogenic development, motivational drives, emotional discovery, computational phenomenology.