Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
This paper trains LLMs of varying scales with RL and SFT, finding that RL promotes explicit ToM reasoning in larger models but causes reasoning collapse in smaller ones, while SFT unexpectedly achieves high performance — revealing that current ToM benchmarks may be solvable without explicit human-like reasoning.
-
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
RADLADS introduces a cost-effective three-step distillation protocol to convert softmax attention transformers into linear attention models using only 350-700M tokens, achieving near-teacher performance on benchmarks and setting a new state-of-the-art for pure RNNs with models up to 72B parameters.
-
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning
This paper proposes the R1-Code-Interpreter framework, which trains large language models to dynamically generate and execute code via supervised fine-tuning and reinforcement learning, significantly improving accuracy across 144 reasoning and planning tasks; R1-CI-14B reaches 64.1%, approaching the performance of GPT-4o with Code Interpreter.
-
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
This paper proposes the Rodimus and Rodimus+ models, which use data-dependent tempered selection (DDTS) and sliding-window shared-key attention (SW-SKA) to significantly reduce the computational and memory complexity of large language models while preserving performance, challenging the accuracy-efficiency trade-off.
-
MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability
This paper proposes the MASKSEARCH framework, which combines the Retrieval-Augmented Mask Prediction (RAMP) pre-training task with supervised fine-tuning and reinforcement learning to significantly improve the agentic search capability of large language models on open-domain multi-hop question answering tasks.