Tag: Behavior Control

All the articles with the tag "Behavior Control".

Patterns and Mechanisms of Contrastive Activation Engineering

Published: 13 May, 2025 at 11:12 AM

71.25 🤔

This paper systematically investigates Contrastive Activation Engineering (CAE) for steering LLM behavior at inference time. It finds reliable in-distribution performance, with optimal sample sizes of around 80-100, but significant challenges remain: poor out-of-distribution generalization, degraded model perplexity, and vulnerability to adversarial inputs.
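For readers unfamiliar with the technique, the core idea of contrastive activation steering can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `steering_vector` and `steer`, the scaling factor `alpha`, the hidden size of 16, and the synthetic activations are all assumptions made for the example. A real setup would read activations from a chosen layer of an LLM rather than from a random generator.

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Mean difference between activations of desired and undesired examples."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(hidden, vector, alpha=1.0):
    """Add the scaled steering vector to every token position of a hidden state."""
    return hidden + alpha * vector

# Synthetic stand-ins for layer activations (assumption: 80 contrastive
# examples per side, hidden size 16). The two sets differ along dimension 0.
rng = np.random.default_rng(0)
direction = np.zeros(16)
direction[0] = 1.0
pos = rng.normal(size=(80, 16)) + direction
neg = rng.normal(size=(80, 16)) - direction

v = steering_vector(pos, neg)

# Apply the vector to a hypothetical hidden state with 5 token positions.
hidden = rng.normal(size=(5, 16))
steered = steer(hidden, v, alpha=2.0)
```

In this sketch the steering vector is the difference of the two class means, and `alpha` controls how strongly the hidden state is pushed along it. Larger values of `alpha` steer harder but move the hidden state further from what the model normally sees.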