Tag: Probing

All the articles with the tag "Probing".

Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models

Published: 25 May, 2025 at 11:24 AM

85.90 🤔

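This paper uses probing and activation-editing experiments to systematically study the emergence, structure, robustness, and enhanceability of belief representations inside language models. It finds that these representations improve with model scale and fine-tuning, that they are structured yet brittle to prompt variations, and that contrastive activation addition (CAA) can significantly improve Theory of Mind (ToM) performance.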
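As a rough illustration of the idea behind contrastive activation addition, the sketch below builds a steering vector as the difference between the mean activations of two contrasting example sets and adds a scaled copy of it to a hidden state. It is a minimal toy under stated assumptions, not the paper's implementation: the function names (`mean_vector`, `steering_vector`, `add_steering`), the scale `alpha`, and the three-dimensional activations are all hypothetical, and a real setup would read and write activations at a chosen layer of an actual model.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(pos: List[Vector], neg: List[Vector]) -> Vector:
    """Difference of mean activations: positive examples minus negative ones."""
    mean_pos = mean_vector(pos)
    mean_neg = mean_vector(neg)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


def add_steering(hidden: Vector, steer: Vector, alpha: float = 1.0) -> Vector:
    """Shift a hidden state along the steering direction, scaled by alpha."""
    return [h + alpha * s for h, s in zip(hidden, steer)]


# Toy activations (hypothetical values), e.g. for prompts completed with a
# correct belief attribution (pos) versus an incorrect one (neg).
pos_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
neg_acts = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]

steer = steering_vector(pos_acts, neg_acts)
steered = add_steering([0.5, 0.5, 0.5], steer, alpha=2.0)
print(steer, steered)
```

In practice the scale and the layer at which the vector is added are tuned empirically; the toy values here only show the arithmetic.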