Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
Thought calibration: Efficient and confident test-time scaling
This article proposes a "thought calibration" method that uses a reasoning-tree abstraction and lightweight probes to dynamically decide when a language model should stop reasoning, cutting thinking tokens by up to 60% on in-distribution data with no loss in performance, and by 20% on out-of-distribution data.
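A minimal sketch of probe-based early stopping in this spirit: a small classifier (a hypothetical `probe`, not the paper's reasoning-tree abstraction) reads the model's hidden state every few tokens and halts the thinking phase once it predicts the answer is already determined. The model/tokenizer interface, threshold, and check interval are assumptions for illustration only.

```python
import torch

def generate_with_thought_exit(model, tokenizer, prompt, probe,
                               threshold=0.9, check_every=64,
                               max_think_tokens=4096):
    """Greedy decoding that stops emitting 'thinking' tokens early when a
    lightweight probe on the last hidden state predicts further reasoning
    will not change the answer (illustrative sketch only)."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for step in range(max_think_tokens):
        out = model(ids, output_hidden_states=True)
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if step % check_every == 0:
            h = out.hidden_states[-1][:, -1]          # last-layer hidden state
            p_stop = torch.sigmoid(probe(h)).item()   # P(answer already determined)
            if p_stop > threshold:
                break                                  # terminate the thinking phase
    return ids
```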
-
Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection
This article proposes the Route-To-Reason (RTR) framework, which uses a dynamic routing mechanism to jointly select the best model and reasoning strategy, achieving higher accuracy with over 60% less token usage across multiple reasoning tasks and markedly improving the performance-cost trade-off.
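As a rough illustration of joint model-and-strategy routing, the sketch below scores hypothetical (model, strategy) options by predicted accuracy minus a token-cost penalty and picks the best one. RTR's actual router is learned and conditions its predictions on the input query; the option list, estimates, and cost weight here are made up.

```python
from dataclasses import dataclass

@dataclass
class Option:
    model: str           # e.g. "small-llm", "large-llm"
    strategy: str        # e.g. "direct", "chain-of-thought", "program-aided"
    est_accuracy: float  # predicted probability of a correct answer
    est_tokens: int      # predicted token usage

def route(options, cost_weight=1e-4):
    """Pick the (model, strategy) pair maximizing predicted accuracy minus a
    token-cost penalty -- a utility-based stand-in for a learned router."""
    return max(options, key=lambda o: o.est_accuracy - cost_weight * o.est_tokens)

options = [
    Option("small-llm", "direct",           0.62,  150),
    Option("small-llm", "chain-of-thought", 0.74,  900),
    Option("large-llm", "direct",           0.81,  200),
    Option("large-llm", "chain-of-thought", 0.90, 1400),
]
best = route(options)
print(best.model, best.strategy)  # -> "large-llm direct" under this cost weighting
```

Lowering `cost_weight` toward zero shifts the choice to chain-of-thought on the large model, which is the trade-off the router is meant to manage.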
-
Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs
This article proposes SASR, a dynamically adaptive hybrid training framework that combines SFT and RL through an adjustment mechanism based on gradient norms and KL divergence, substantially improving LLM performance on mathematical and logical reasoning tasks and outperforming standalone SFT, standalone RL, and static hybrid schedules.
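A hedged sketch of step-wise SFT/RL blending: the mixing weight below grows with the KL divergence from the reference (SFT) policy and with the relative SFT gradient norm, then linearly combines the two losses. The exact functional form is an assumption and only approximates SASR's published schedule in spirit.

```python
import math
import torch
import torch.nn.functional as F

def adaptive_sft_weight(policy_logits, ref_logits, sft_to_rl_grad_ratio, kl_scale=1.0):
    """Weight in (0, 1) for blending the SFT loss with the RL loss.
    The functional form is assumed; SASR drives its schedule with gradient
    norms and KL divergence, but not necessarily with this exact rule."""
    kl = F.kl_div(F.log_softmax(policy_logits, dim=-1),
                  F.softmax(ref_logits, dim=-1), reduction="batchmean")
    # Large drift from the reference policy (KL) or a still-dominant SFT
    # gradient pushes the weight back toward supervised training.
    return torch.sigmoid(kl_scale * kl + math.log(sft_to_rl_grad_ratio)).item()

def hybrid_loss(sft_loss, rl_loss, w_sft):
    # Step-wise adaptive blend of the supervised and RL objectives.
    return w_sft * sft_loss + (1.0 - w_sft) * rl_loss
```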
-
Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
By training LLMs of different scales with RL and SFT, this article finds that RL elicits explicit theory-of-mind (ToM) reasoning in larger models but causes reasoning collapse in smaller ones, while SFT unexpectedly reaches high performance, suggesting that current ToM benchmarks may be solvable without explicit human-like reasoning.
-
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
RADLADS introduces a cost-effective three-step distillation protocol to convert softmax attention transformers into linear attention models using only 350-700M tokens, achieving near-teacher performance on benchmarks and setting a new state-of-the-art for pure RNNs with models up to 72B parameters.
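The sketch below shows one generic distillation step of the kind such a conversion relies on: the linear-attention student is trained to match the softmax-attention teacher's logits and hidden states. It assumes HF-style model outputs, matching layer counts, and made-up loss weights; RADLADS' actual three-step recipe and hyperparameters are not reproduced here.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, batch, alpha=1.0, beta=1.0):
    """One generic distillation step: make the linear-attention student match
    the frozen softmax-attention teacher (illustrative sketch only)."""
    with torch.no_grad():
        t_out = teacher(**batch, output_hidden_states=True)
    s_out = student(**batch, output_hidden_states=True)

    # Token-level logit distillation (KL to the teacher's distribution).
    kd = F.kl_div(F.log_softmax(s_out.logits, dim=-1),
                  F.softmax(t_out.logits, dim=-1), reduction="batchmean")
    # Hidden-state matching so each linear-attention block learns to
    # reproduce the output of the corresponding softmax-attention block.
    hidden = sum(F.mse_loss(s, t) for s, t in
                 zip(s_out.hidden_states, t_out.hidden_states))
    return alpha * kd + beta * hidden
```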