Posts
All the articles I've posted.
-
HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization
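This paper proposes HYPEROFA, a hypernetwork-based method for initializing token embeddings for new languages. It improves the adaptation of PLMs to low-resource languages, outperforming random initialization and matching or exceeding the OFA method.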
-
Evidence of conceptual mastery in the application of rules by Large Language Models
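This paper uses psychological experiments to show that large language models exhibit conceptual mastery in applying rules: they generalize to novel situations and partially mimic human sensitivity to contextual factors such as time pressure.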
-
Adaptive Layer-skipping in Pre-trained LLMs
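This paper proposes FlexiDepth, which uses plug-in routers and adapters to enable adaptive layer-skipping in pre-trained LLMs, improving computational efficiency while preserving generation performance. Its experiments also reveal how token type affects computational demand.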
-
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics
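This paper proposes an LLM-based, agent-orchestrated robotic system that uses modular task planning and RAG-based memory retrieval to autonomously execute long-horizon tasks in household environments. Across three scenarios, it demonstrates high task-planning accuracy and improved memory recall.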
-
Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
This paper uses Sparse Autoencoders to identify and manipulate language-specific features in Large Language Models. It introduces a monolinguality metric, demonstrates context dependency via code-switching, and enhances steering vectors for better control over multilingual generation, while ablation studies reveal significant language-specific effects.