steering-vectors

Here are 8 public repositories matching this topic...

bassrehab / steering-vectors-agents

Runtime control of LLM agent behaviors through activation steering vectors. More calibrated than prompting.

machine-learning transformers pytorch steering-behaviors ai-safety interpretability langchain llm-agents activation-engineering steering-vectors contrastive-activation-addition

Updated Dec 19, 2025
Python

G-Art / matrix_steering_vector_research

Iterative Sparse Matrix Steering: Closed-Form Subspace Alignment for Multi-Layer LLM Control (No SGD required).

pytorch alignment interpretability llm activation-engineering steering-vectors

Updated Jan 5, 2026
Jupyter Notebook

JoschkaCBraun / steering-vector-reliability

Repository for paper "Understanding (Un)Reliability of Steering Vectors in Language Models" by Joschka Braun, Carsten Eickhoff, David Krueger, Seyed Ali Bahrainian, Dmitrii Krasheninnikov.

machine-learning language-models unreliability steering-vector steering-vectors

Updated Jun 10, 2025
Jupyter Notebook

JoschkaCBraun / adaptive-steering

Official implementation of "Beyond Multiple Choice: Evaluating Steering Vectors for Summarization" (Findings of EACL 2026).

evaluation summarization language-model abstractive-text-summarization abstractive-summarization steering-vector steering-vectors

Updated Jan 21, 2026
Python

aygp-dr / qwen3-steering

Qwen3-0.6B activation steering: style vectors, lens contamination eval, CPRR methodology

transformer style-transfer property-based-testing literate-programming superposition mechanistic-interpretability llm-evaluation small-language-models representation-engineering activation-steering qwen3 steering-vectors actadd cprr conceptual-lens-drift

Updated Mar 26, 2026
Python

uncoded0123 / mechanistic_interpretability

Mechanistic interpretability experiments: raw GPT-2 inference, activation steering, and manual backprop MLP.

pytorch from-scratch gpt2 mechanistic-interpretability steering-vectors

Updated Apr 15, 2026
Jupyter Notebook

IgRoF / steering_trust

Investigating honesty, deception, and steering in large language models. Replicating and extending the MASK honesty benchmark on frontier models, working toward internal representation analysis and activation steering for honesty.

ai-safety honesty truthfulness steering-vectors mask-benchmark

Updated Apr 13, 2026
Python

VicBa2000 / pathos-engine

Functional emotional architecture for LLMs — 42 systems, 1994 tests, 27 psychological theories. Emergent emotions via 7 ANIMA pillars: predictive processing, global workspace, autobiographical memory, ontogenic development, motivational drives, emotional discovery, computational phenomenology.

react python typescript emotion emotions affective-computing fastapi psicology emotional-ai llm big-five representation-engineering steering-vectors appraisal-theory

Updated Apr 14, 2026
Python
