Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs
This paper proposes LiSTEN, a framework that efficiently adapts large language models to audio tasks via a dynamic prompt selection strategy, achieving competitive multi-task performance and greater interpretability while reducing reliance on large-scale datasets and the number of trained parameters.
-
Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models
By analyzing the activation patterns underlying long chain-of-thought ability in large language models, this paper proposes a training-free activation control method (EELo-CoT) and a parameter-efficient fine-tuning strategy that dynamically adjust activation values at inference time, significantly improving the self-reflection rate and accuracy.
-
Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs
Using layer-wise context masking and cross-task patching, this paper provides evidence of an "internal chain-of-thought" in large language models, in which the subtasks of a composite task are learned at different network depths and executed in sequence, improving model transparency and opening new paths for instruction-level behavioral control.
-
The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation
Through systematic experiments, this paper shows that increasing language diversity when fine-tuning large language models significantly improves translation performance across all categories of language pairs, and an analysis of mid-layer representations reveals the underlying cross-lingual transfer mechanism; the gains from diversity, however, plateau beyond a threshold.
-
When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator
This paper demonstrates that a 1.5B-parameter reasoning model (Distill-R1) outperforms larger non-reasoning LLMs as a discriminator in a text-to-SQL planning framework by leveraging a novel method for extracting soft scores from chain-of-thought outputs, though it struggles significantly as a generator.