Tag: Reasoning
All articles tagged "Reasoning".
-   RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
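Through theoretical and empirical analysis, this paper shows that the MDP structural assumptions behind current RL methods for LLM post-training (such as GRPO) make them degenerate into filtered iterative supervised fine-tuning. It also argues that the observed growth in response length comes from a bias in how rewards are assigned, not from improved reasoning ability.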
-   Self-Data Distillation for Recovering Quality in Pruned Large Language Models
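This paper proposes self-data distillation fine-tuning, which recovers the quality of pruned large language models by using the unpruned model to generate a distilled fine-tuning dataset. The method significantly outperforms standard supervised fine-tuning on the HuggingFace OpenLLM Leaderboard v1, and model merging and speculative decoding further improve performance and efficiency.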
-   Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
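This paper proposes the Log-Augmented Generation (LAG) framework, which directly reuses past reasoning computation through KV caches. LAG significantly improves the accuracy and efficiency of large language models on knowledge- and reasoning-intensive tasks, outperforming standard agentic systems as well as existing reflection-based and KV-cache methods.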
-   The Mosaic Memory of Large Language Models
This paper introduces the concept of "mosaic memory" in large language models. Experiments with canaries and real-world datasets such as SlimPajama show that LLMs memorize training data through fuzzy duplicates that only partially overlap, and that this memorization is predominantly syntactic. The findings challenge existing deduplication practices and raise concerns about privacy, model utility, and benchmark fairness.
-   Do Language Models Use Their Depth Efficiently?
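Using residual-stream analyses and intervention experiments on Llama 3.1 and Qwen 3 models, this paper finds that large language models do not use their depth efficiently. The layers in the second half mainly refine the output probability distribution rather than performing new computation, and processing depth does not vary with input complexity. The authors conclude that current architectures and training objectives need improvement.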