Posts
All the articles I've posted.
-
Block Circulant Adapter for Large Language Models
This paper proposes the Block Circulant Adapter, which uses block-circulant matrices and the FFT to streamline LLM fine-tuning, substantially cutting storage and compute costs while keeping training stable through learning-rate adjustment.
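The savings come from the structure itself: an n×n circulant block is fully determined by its first column, and multiplying by it is a circular convolution, computable with the FFT in O(n log n) instead of O(n²). A minimal NumPy sketch of that identity (my illustration, not the paper's adapter code or block layout):

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by vector x.

    Uses the identity C @ x = IFFT(FFT(c) * FFT(x)): the dense O(n^2)
    product becomes an O(n log n) circular convolution.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Dense reference: column i of the circulant matrix is c rolled by i.
n = 8
c, x = np.random.randn(n), np.random.randn(n)
C = np.stack([np.roll(c, i) for i in range(n)], axis=1)
assert np.allclose(C @ x, circulant_matvec(c, x))
```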
-
Test-time regression: a unifying framework for designing sequence models with associative memory
This paper presents a unifying framework based on test-time regression: by formalizing associative recall as a regression problem, it derives a range of sequence models (such as linear attention, state-space models, and softmax attention), validates their recall ability with synthetic experiments, and proposes a higher-order generalization of attention.
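One concrete instance of this viewpoint is (unnormalized) causal linear attention, which can be read as an associative memory updated by one regression-style step per token. This is the standard reading, sketched below, rather than code from the paper:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Causal linear attention as an online associative memory.

    Each step does a rank-1 write M += v k^T (one gradient-like step of
    the regression "store value v at key k"), then reads out o = M q.
    """
    M = np.zeros((V.shape[1], K.shape[1]))  # memory / regression weights
    outputs = []
    for q, k, v in zip(Q, K, V):
        M += np.outer(v, k)   # write: accumulate the key-value association
        outputs.append(M @ q)  # read: recall against the current query
    return np.stack(outputs)
```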
-
SEM: Reinforcement Learning for Search-Efficient Large Language Models
This paper proposes the *SEM* framework, which uses reinforcement learning to optimize the search behavior of large language models, reducing redundant searches while improving answer accuracy and markedly boosting inference efficiency.
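Setting the paper's details aside, the core trade-off can be captured by a reward that pays for correctness and charges for each search call. The function below is a hypothetical illustration of that shaping; `search_penalty` is an invented hyperparameter, not SEM's actual objective:

```python
def search_efficiency_reward(answer_correct: bool, num_searches: int,
                             search_penalty: float = 0.1) -> float:
    """Hypothetical reward in the spirit of SEM: reward correct answers
    and charge a small cost per search call, so the policy learns to
    search only when it actually needs external information."""
    correctness = 1.0 if answer_correct else 0.0
    return correctness - search_penalty * num_searches
```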
-
Patterns and Mechanisms of Contrastive Activation Engineering
This paper systematically investigates Contrastive Activation Engineering (CAE) for steering LLM behavior at inference time, revealing reliable in-distribution performance with optimal sample sizes around 80-100, but significant challenges in out-of-distribution generalization, model perplexity degradation, and vulnerability to adversarial inputs.
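The basic CAE construction is a difference of mean activations over contrastive prompt sets, added to the residual stream at inference to steer behavior. A hedged sketch assuming a Hugging Face-style model interface (names here are illustrative, not the paper's code):

```python
import torch

def contrastive_steering_vector(model, tokenize, positive_prompts,
                                negative_prompts, layer: int) -> torch.Tensor:
    """Steering vector = mean activation over positive prompts minus mean
    activation over negative prompts, taken at one layer's last token.

    Assumes `model` returns per-layer hidden states when called with
    output_hidden_states=True, as Hugging Face transformers models do.
    """
    def mean_activation(prompts):
        acts = []
        for p in prompts:
            out = model(**tokenize(p), output_hidden_states=True)
            acts.append(out.hidden_states[layer][0, -1])  # last-token state
        return torch.stack(acts).mean(dim=0)

    return mean_activation(positive_prompts) - mean_activation(negative_prompts)
```

The summary's 80-100 figure refers to the number of such contrastive samples averaged here: too few gives a noisy vector, and more yields diminishing returns.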
-
Large Language Models Think Too Fast To Explore Effectively
This paper evaluates the exploration ability of large language models (LLMs) with the game Little Alchemy 2, finding that most LLMs underperform humans because they commit to decisions too early and over-rely on uncertainty-driven strategies, while o1 and DeepSeek-R1 markedly surpass humans by balancing empowerment with deeper reasoning, highlighting the importance of reasoning depth and architecture design for open-ended exploration.