Archives
All the articles I've archived.
-
Memento No More: Coaching AI Agents to Master Multiple Tasks via Hints Internalization
This paper proposes internalizing hints into model weights via iterative training with human feedback, enabling a Llama-3.1-70B-based AI agent to reach 97.9% and 90.3% success on the multi-task benchmarks ToolQA and OfficeBench, surpassing GPT-4o and DeepSeek-V3 while markedly improving inference efficiency.
-
R-LoRA: Randomized Multi-Head LoRA for Efficient Multi-Task Learning
R-LoRA enhances LoRA for multi-task learning through multi-head randomization (multi-head dropout plus random initialization), improving the capture of task-specific knowledge while reducing GPU memory usage and training time.
-
Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning
By casting natural language understanding tasks as reinforcement learning problems and fine-tuning small-to-medium LLMs with PPO, this paper achieves significant gains on GLUE and SuperGLUE, surpassing supervised fine-tuning and BERT-large, and exhibits zero-shot generalization superior to GPT-4o.
-
Contrastive Learning for Task-Independent SpeechLLM-Pretraining
This paper proposes a task-independent SpeechLLM pretraining method based on contrastive learning that aligns speech and text representations, substantially improving ASR, speech translation, and spoken question answering in low-resource settings and outperforming several specialized models.
-
Beyond Output Matching: Bidirectional Alignment for Enhanced In-Context Learning
This paper proposes Bidirectional Alignment (BiAlign), which aligns the student model's token-level output distributions and input preferences with the teacher's, markedly improving the student's in-context learning ability and outperforming baselines across a variety of tasks.
-
Large Vocabulary Size Improves Large Language Models
Through experiments, this paper shows that larger vocabularies significantly improve monolingual LLM performance on English and Japanese tasks, and proposes a simple method for swapping the vocabulary during continued training to fit the target language, further boosting performance.
-
Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective
This paper proposes the Trajectory Policy Gradient Theorem, proving that in online RL for LLMs the policy gradient of token-level rewards can be estimated without bias using only response-level rewards, and builds on it to design TRePO, which simplifies PPO's design while retaining token-level modeling capability.
-
TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression
This paper proposes TLDR, which dynamically re-weights System-1 and System-2 reasoning data to compress LLM reasoning output by roughly 40% in tokens while largely preserving accuracy on math tasks of varying difficulty.
-
QKV Projections Require a Fraction of Their Memory
This paper proposes PAMM, which approximates the input tensor with randomly selected representative tokens, cutting the memory footprint of the Q, K, V projections in attention by up to 512x while essentially preserving model performance in pretraining and fine-tuning.
-
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
This paper proposes LIFT, a low-rank-guided sparse fine-tuning method that applies low-rank approximation and then fine-tunes only the resulting principal weights, significantly outperforming full fine-tuning and LoRA-style methods on reasoning tasks while remaining memory-efficient.
-
It Takes a Good Model to Train a Good Model: Generalized Gaussian Priors for Optimized LLMs
This paper proposes an LLM optimization framework based on the generalized Gaussian distribution (GGD), spanning GG initialization, DeepShape post-processing, and an RF8 floating-point format to improve compression, accuracy, and hardware efficiency from initialization through deployment; experiments show sizable compression gains with controlled accuracy loss.
-
Attention Retrieves, MLP Memorizes: Disentangling Trainable Components in the Transformer
By freezing Transformer components and introducing the MixiT model, this paper shows that input-dependent self-attention is necessary for retrieval and language modeling while MLP layers dominate memorization, underscoring the importance of architectural heterogeneity for task solving.
-
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods
Through theory and experiments, this paper shows that model ensembling mitigates overadaptation in supervised fine-tuning by balancing the bias-variance trade-off, improving downstream performance while reducing forgetting of pretrained knowledge.
-
Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration
This paper proposes Layer-wise Optimal Task Vector Merging (LOT Merging), which optimizes model merging by minimizing feature drift, significantly outperforming training-free baselines on vision and vision-language tasks with average accuracy gains of up to 4.4%.
-
Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt
From the perspective of self-doubt, this paper quantifies overthinking in long chain-of-thought reasoning and proposes a simple prompting method that evaluates input validity to cut token consumption and self-doubt, substantially improving efficiency on math reasoning while maintaining accuracy.
-
One Task Vector is not Enough: A Large-Scale Study for In-Context Learning
Using the large-scale QUITEAFEW dataset, this paper studies the role of task vectors in in-context learning, finds they work best at intermediate layers but fall short on complex tasks, and hypothesizes that complex tasks rely on distributed representations of multiple subtask vectors.
-
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning
This paper proposes the Reinforcement Distillation (REDI) framework, a two-stage training scheme exploiting both positive and negative reasoning traces to substantially improve small language models' math reasoning; Qwen-REDI-1.5B reaches state-of-the-art among 1.5B models trained on public data.
-
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
RLAE frames LLM ensembling as a Markov decision process and uses reinforcement learning to adjust ensemble weights dynamically, delivering up to 3.3% gains across tasks along with cross-task generalization and computational efficiency.
-
Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation
This paper proposes a dynamic integration framework that aggregates knowledge from multiple LLMs via an adaptive selection network and a dynamic weighted-fusion strategy, significantly improving performance while reducing knowledge interference by 50% and preserving computational efficiency.
-
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
FlexMerge is a data-free, flexible model merging framework that greedily merges fine-tuned models block by block, supports producing merged models of arbitrary size, and shows steep early accuracy gains on the accuracy-size trade-off with the potential to approach fine-tuned accuracy.
-
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
This paper introduces Satori, which combines a Chain-of-Action-Thought (COAT) reasoning framework with two-stage training (small-scale format tuning followed by large-scale reinforcement learning) to substantially improve autoregressive search and reasoning in a single 7B LLM on math reasoning and out-of-domain tasks.
-
Scalable Model Merging with Progressive Layer-wise Distillation
This paper proposes ProDistill, which merges large pretrained models efficiently via layer-wise teacher-student distillation, proves the necessity of domain-specific data in theory, and achieves notable gains (6.14%-6.61%) on vision and language tasks with superior memory and compute efficiency.
-
RaaS: Reasoning-Aware Attention Sparsity for Efficient LLM Reasoning
This paper proposes RaaS, which identifies milestone tokens in reasoning tasks and manages KV vectors with an LRU caching policy, achieving O(L) time and memory complexity with high accuracy and markedly better memory efficiency than prior methods such as Quest.
-
Recurrent Knowledge Identification and Fusion for Language Model Continual Learning
This paper proposes Recurrent-KIF, whose inner-outer loop mechanism dynamically estimates parameter importance and iteratively fuses new and old knowledge, effectively mitigating catastrophic forgetting and promoting knowledge transfer in continual learning, with gains validated across multiple LLMs.
-
Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs
This paper proposes DCoT, which generates multiple diverse reasoning chains within a single inference pass and self-improves over them, substantially boosting LLM performance on complex reasoning tasks, especially those with large answer spaces.
-
Budget-Adaptive Adapter Tuning in Orthogonal Subspaces for Continual Learning in LLMs
This paper proposes OA-Adapter, a parameter-efficient method for continual learning in LLMs that couples dynamic budget allocation with orthogonal subspace learning in a single end-to-end training stage, achieving higher accuracy on standard benchmarks while using 58.5% fewer parameters.
-
Scalable Fine-tuning from Multiple Data Sources: A First-Order Approximation Approach
This paper proposes GRADEX, which estimates fine-tuning loss with a first-order approximation to speed up subset selection by more than 30x, delivering up to 3.8% gains over baselines on instruction tuning and chain-of-thought fine-tuning tasks.
-
Scaling Reasoning without Attention
This paper proposes PROMPTCOT-MAMBA, an attention-free language model built on the Mamba-2 state-space architecture that, via two-stage curriculum fine-tuning and the PROMPTCOT synthesis paradigm, surpasses same-size and even larger Transformers on math and code reasoning while offering constant memory and efficient inference.
-
Emergence and Effectiveness of Task Vectors in In-Context Learning: An Encoder Decoder Perspective
Through an encoder-decoder lens, this paper studies how task vectors emerge and work in in-context learning, proposes a task decodability (TD) metric that predicts ICL performance, and finds that fine-tuning early layers improves task encoding and performance more than fine-tuning later ones.
-
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
This paper proposes an isotropic model merging framework that flattens the singular value spectrum of task matrices and combines common and task-specific subspaces, substantially improving multi-task performance and achieving state-of-the-art merging results on vision and language tasks.
-
AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models
This paper proposes AutoL2S, which trains LLMs on annotated long and short reasoning paths with an <EASY> marker so they dynamically choose a reasoning length by problem complexity; experiments show up to 57% reasoning-length compression with essentially preserved performance.
-
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
This paper proposes ProRL, which trains Nemotron-Research-Reasoning-Qwen-1.5B with prolonged reinforcement learning plus KL-divergence penalties and reference-policy resets on diverse tasks, substantially expanding LLM reasoning boundaries, especially in domains where the base model is weak and on out-of-distribution tasks.
-
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
This paper optimizes LLM self-reflection with reinforcement learning (GRPO), yielding large gains on function calling and math-equation tasks (9.0% and 16.0% on average) and showing that trained small models can surpass untrained larger ones.
-
Large Language Models are Locally Linear Mappings
This paper shows that, via the detached Jacobian, LLMs can be converted at a given input into nearly exact locally linear systems, revealing low-rank semantic structure inside the model and enabling preliminary output-steering applications, though generality and practicality remain limited.
-
How much do language models memorize?
This paper proposes an information-theoretic way to quantify memorization that separates unintended memorization from generalization, measures GPT-style language model capacity at roughly 3.6 bits per parameter, and shows how the dataset-size-to-capacity ratio shapes double descent and membership-inference performance.
-
Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance
By perturbing LLMs' latent-language consistency with adversarial prompts, this paper studies the effect on translation and geo-cultural tasks and finds that consistency is not always necessary, since models can adapt to language shifts in the final layers.
-
Skywork Open Reasoner 1 Technical Report
Skywork-OR1 proposes the MAGIC framework, a multi-stage training pipeline with adaptive entropy control for reinforcement learning, substantially improving long chain-of-thought models on math and coding and surpassing DeepSeek-R1 and Qwen3-32B on the AIME24 and AIME25 benchmarks.
-
The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants
This paper proposes the *Avengers* framework, which pools the collective intelligence of multiple small open-source language models through training-free embedding, clustering, scoring, and voting, outperforming GPT-4.1 on average across 15 diverse datasets and demonstrating open models' potential to challenge proprietary giants.
-
Hybrid Latent Reasoning via Reinforcement Learning
This paper proposes HRPO, an RL-based hybrid latent reasoning framework whose gating mechanism mixes discrete tokens with continuous hidden states, substantially improving LLM performance on knowledge and reasoning tasks while reducing dependence on chain-of-thought data.
-
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning
This paper proposes the R1-Code-Interpreter framework, which trains LLMs via supervised fine-tuning and reinforcement learning to dynamically generate and execute code, substantially raising accuracy across 144 reasoning and planning tasks; R1-CI-14B reaches 64.1%, close to GPT-4o with Code Interpreter.
-
Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model
Using the ComPABench benchmark to evaluate the compositional reasoning of vision-language models (VLMs), this paper finds reinforcement learning (RL) beats supervised fine-tuning (SFT) on cross-task and out-of-distribution generalization, and proposes RL-Ground, which markedly improves multimodal compositional reasoning.
-
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
Questioning the link between the 'aha moment' pattern and reasoning gains, this paper proposes a two-stage approach combining supervised fine-tuning (SFT) and reinforcement learning (RL) that substantially improves multimodal reasoning in 3B and 7B multimodal LLMs, reaching the best results among open-source models.
-
Two Is Better Than One: Rotations Scale LoRAs
This paper proposes *RadarGate*, a geometry-based gating method that adds rotation and stretching operations to boost the expressiveness of LoRA-MoE, significantly outperforming existing methods in fitting, generalization, and scalability across 21 tasks on 6 benchmark datasets.
-
Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking
This paper proposes Decom-Renorm-Merge (DRM), which builds a shared representation space via singular value decomposition and renormalization to merge multi-task model weights, significantly outperforming existing methods on vision and language tasks.
-
Scalable Complexity Control Facilitates Reasoning Ability of LLMs
By tuning the initialization rate and weight-decay coefficient to control LLM complexity, this paper substantially improves reasoning ability, particularly on math tasks, and exhibits better scaling-law behavior.
-
When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy
Using the XReasoning benchmark, this paper exposes a trade-off in large reasoning models between matching the language of the thinking trace and answer accuracy, shows that prompt hacking and few-shot post-training raise language matching at the cost of accuracy, and highlights the limitations of current models.
-
How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning
Through controlled experiments on the interplay of SFT and RL in improving LLM reasoning, this paper finds that short-CoT warmup contributes moderately to RL, that the number of backtracks must match task difficulty, and that RL depends little on SFT data correctness but is sensitive to structural consistency.
-
RAISE: Reinforced Adaptive Instruction Selection For Large Language Models
This paper proposes RAISE, an RL-driven dynamic instruction-selection framework that adaptively picks training data by each instruction's expected impact on model performance, surpassing full-data training with only 1% of the training steps and significantly beating static selection baselines on multiple benchmarks.
-
Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent
This paper proposes Adaptive Projective Gradient Descent (DOGE), which casts multi-task model merging as a constrained optimization problem using a data-free objective and a shared subspace, significantly improving performance on vision and NLP tasks with superior generalization.
-
M+: Extending MemoryLLM with Scalable Long-Term Memory
M+ extends MemoryLLM with a long-term memory mechanism and a co-trained retriever, pushing knowledge retention beyond 160k tokens and outperforming baselines on long-context tasks while keeping GPU memory consumption low.
-
Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs
This paper enhances cross-lingual transfer in LLMs by aligning middle-layer representations, alternating between task and alignment objectives during fine-tuning, and reports improvements on slot filling, machine translation, and other tasks, benefiting low-resource languages in particular.
-
Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
This paper proposes Mixup Model Merge (M³), which linearly interpolates models at random in parameter space with contribution ratios sampled from a Beta distribution, substantially improving merged LLM performance, out-of-distribution robustness, and adversarial robustness.
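A minimal sketch of the interpolation step described above, assuming two fine-tuned checkpoints with identical state-dict keys; the Beta parameters here are illustrative, not the paper's:

```python
import torch

def mixup_merge(state_a, state_b, alpha=2.0, beta=2.0):
    """Randomly interpolate two models' parameters.

    A single contribution ratio lam ~ Beta(alpha, beta) is sampled
    and applied to every tensor, as in linear model interpolation.
    """
    lam = torch.distributions.Beta(alpha, beta).sample().item()
    return {k: lam * state_a[k] + (1 - lam) * state_b[k] for k in state_a}
```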
-
A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)
This paper proposes GALI, a training-free length extrapolation method combining greedy localized position interpolation with attention-logit interpolation, significantly improving LLM stability and performance on long-context tasks without input-length-specific tuning.
-
P$^2$ Law: Scaling Law for Post-Training After Model Pruning
This paper proposes the P² Law, the first scaling law for post-training pruned LLMs, which predicts post-training loss from model size, post-training data volume, pruning rate, and initial loss, validating its effectiveness and partial generalization across pruning methods and models.
-
More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
This paper proposes DrICL, which optimizes many-shot in-context learning via differentiated learning and advantage-based re-weighting, and validates its stability and effectiveness across diverse tasks on the self-built ICL-50 dataset.
-
Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning
This paper proposes LoRA-SB, which initializes low-rank fine-tuning from an approximation of the first full fine-tuning gradient step, significantly outperforming LoRA-XS and approaching full fine-tuning performance with 27-90x fewer parameters.
-
Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data
Through large-scale experiments, this paper shows that small pretrained Transformers can, once parameter size is accounted for, reach compression ratios competitive with classical compressors on out-of-distribution text, image, and audio data, excelling within the training modality but transferring weakly across modalities.
-
LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently
Through theoretical analysis, this paper reveals the alignment between LoRA adapters and the gradient subspace of one full fine-tuning step, and proposes LoRA-One, whose spectral initialization significantly improves LLM fine-tuning on natural language understanding, math reasoning, and code generation while staying computationally efficient.
-
Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures
This paper proposes CoLM, which builds mini-batch coresets that match large-batch gradients, letting LLM fine-tuning with 2x less memory outperform regular training with 4x larger batches while also improving convergence speed.
-
Does quantization affect models' performance on long-context tasks?
This paper systematically evaluates how quantization affects LLMs on long-context tasks, finding that 8-bit quantization largely preserves accuracy (about 0.8% drop) while 4-bit quantization causes severe losses (up to 59%), with effects varying by model, task, and language, underscoring the need for caution when applying quantization in long-context and multilingual settings.
-
Why Do More Experts Fail? A Theoretical Analysis of Model Merging
Through theoretical analysis, this paper explains why model-merging performance saturates as the number of expert models grows, and proposes a Reparameterized Heavy-Tailed method to broaden parameter-space coverage, validated on multiple benchmark tasks.
-
Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need?
LLM-Sieve proposes a task-specific pruning framework combining joint low-rank projection with a genetic algorithm for differentiated pruning, removing 20-75% of parameters at 1-5% accuracy loss, significantly outperforming existing methods while remaining compatible with LoRA fine-tuning and quantization.
-
One-shot Entropy Minimization
This paper proposes one-shot entropy minimization (EM), which uses a single unlabeled example and about 10 optimization steps to substantially improve LLM performance on math reasoning, matching or surpassing conventional reinforcement learning methods.
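A minimal sketch of the entropy objective this summary refers to; the one-shot setting would apply roughly ten optimizer steps of this loss to a single unlabeled prompt (the batching and model call are assumptions, not from the paper):

```python
import torch
import torch.nn.functional as F

def token_entropy_loss(logits):
    """Mean per-token entropy of the model's next-token distribution.

    Minimizing this sharpens the distribution; no labels are needed.
    logits: (batch, seq, vocab)
    """
    logp = F.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)   # (batch, seq)
    return entropy.mean()
```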
-
Born a Transformer -- Always a Transformer?
Using retrieval and copying tasks, this paper studies Transformers' length-generalization limits, finding that pretraining selectively strengthens induction (rightward/forward tasks) but cannot overcome inherent architectural limits, and that fine-tuning can balance the asymmetry yet remains theoretically constrained.
-
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models
As a position paper, this work argues that reinforcement fine-tuning (RFT) substantially powers the reasoning ability of multimodal LLMs (MLLMs) via reinforcement learning algorithms, surveys community progress across modalities, tasks, and domains, and proposes five future research directions, though it offers no new method or experimental validation.
-
PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
PASER proposes a post-training data-selection method for recovering the capabilities of pruned LLMs, using semantic clustering, degradation-aware selection, and negative-effect mitigation to substantially improve recovery under limited data budgets at lower computational cost.
-
Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data?
This paper identifies and validates the 'shallow preference signals' phenomenon: reward models and DPO models trained on truncated preference data (keeping the first 40%-50% of tokens) match or even beat those trained on full data, exposing current alignment methods' overreliance on early tokens.
-
ExpandR: Teaching Dense Retrievers Beyond Queries with LLM Guidance
ExpandR jointly optimizes an LLM and a dense retriever, using LLM-generated, semantically rich query expansions together with DPO training and contrastive learning to deliver over 5.8% gains across multiple retrieval benchmarks.
-
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
Through a meta-analysis of 92 open-source language models, this paper proposes a performance-prediction framework beyond scaling laws, revealing the sizable impact of data composition (e.g., 15-25% code) and architectural decisions on downstream tasks, with 3-28% relative gains in prediction accuracy.
-
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
This paper proposes PURE, which applies min-form credit assignment to process rewards to improve LLM reasoning; experiments show math-reasoning performance comparable to verifiable-reward methods, and adding a small amount of ground-truth signal lifts accuracy further to 53.3%.
-
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?
This paper presents the first systematic comparison of reasoning versus non-reasoning LLMs for NLG evaluation, finding that the benefit of reasoning is highly architecture-dependent: OpenAI o3-mini clearly beats non-reasoning models on machine translation evaluation, DeepSeek-R1 stands out only on summarization-consistency evaluation, and distilled models remain effective at the 32B scale.
-
Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts
This paper proposes LayerMoE, which allocates experts by inter-layer language similarity and adds a routing classifier, enabling efficient multilingual expansion of LLMs that significantly improves new-language performance with fewer parameters while reducing forgetting of old languages.
-
Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective
Sentinel proposes a lightweight sentence-level context compression framework that probes the attention signals of a 0.5B proxy model to achieve up to 5x compression, matching the QA performance of 7B-scale systems on the LongBench benchmark.
-
Zero-Shot Vision Encoder Grafting via LLM Surrogates
This paper trains vision encoders against small surrogate models and zero-shot grafts them onto large LLMs (e.g., Llama-70B), preserving visual understanding while cutting VLM training cost by about 45%.
-
LoKI: Low-damage Knowledge Implanting of Large Language Models
This paper proposes LoKI, a parameter-efficient fine-tuning framework that analyzes how knowledge is stored in Transformer FFN layers and uses a layer-balanced parameter-selection strategy to strike a competitive balance between downstream adaptation and preservation of pretrained knowledge.
-
Next Token Perception Score: Analytical Assessment of your LLM Perception Skills
This paper proposes the Next Token Perception Score (NTPS), a metric quantifying the feature-subspace alignment between autoregressive pretraining and downstream perception tasks, proves and validates its correlation with linear-probe performance, and shows its utility for predicting LoRA fine-tuning gains.
-
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
ReMA separates meta-thinking from reasoning via multi-agent reinforcement learning, improving LLM performance on math reasoning and LLM-as-a-Judge tasks with notably strong out-of-distribution generalization, though it is hyperparameter-sensitive and multi-turn settings remain unstable.
-
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
This paper proposes chunk-wise training (CWT) and skip-thinking training (STT), which split the reasoning process into chunks and skip non-core ones, significantly improving both the reasoning accuracy and speed of small language models under chain-of-thought distillation.
-
First Finish Search: Efficient Test-Time Scaling in Large Language Models
This paper proposes First Finish Search (FFS), a training-free test-time scaling strategy that decodes in parallel and returns the first completed reasoning trace, substantially improving LLM accuracy on reasoning tasks (e.g., DeepSeek-R1 reaches 82.23% on AIME) while using up to 45% fewer tokens.
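A toy sketch of the selection rule, with each trace modeled as a generator that advances one decoding step at a time; real FFS would run the traces on parallel hardware rather than round-robin:

```python
def first_finish_search(traces):
    """Advance N sampled decoding traces one step each, round-robin,
    and return the first trace to emit EOS; the rest are discarded.

    Each element of `traces` is a generator yielding (text, done)
    per decoding step -- a stand-in for true parallel decoding.
    """
    while True:
        for trace in traces:
            text, done = next(trace)
            if done:
                return text
```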
-
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
This paper proposes LongReD, a multi-objective training strategy combining long-text training, short-text distillation, and short-to-long distillation, effectively mitigating the short-text degradation of long-context LLMs while maintaining or improving long-text ability.
-
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
INFTYTHINK breaks LLM context-length limits by decomposing long-context reasoning into iterative short reasoning segments with intermediate summaries, significantly improving performance on multiple benchmarks while lowering computational cost.
-
SELF: Self-Extend the Context Length With Logistic Growth Function
This paper proposes SELF, which uses a logistic growth function to dynamically adjust token-group sizes for context-length extension, improving over Self-Extend on some long-context tasks, though its generality and stability remain to be verified.
-
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
This paper proposes PNLC, which trains a lightweight goal-conditioned value function via offline RL to help LLMs plan efficiently over long horizons in multi-turn interactive tasks, significantly outperforming existing RL fine-tuning and inference-time search methods in both performance and computational efficiency.
-
MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning
This paper proposes MELoRA, which stacks multiple mini LoRA modules in parallel to attain a higher equivalent rank, significantly outperforming LoRA on natural language understanding and instruction-following tasks with fewer parameters.
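A minimal sketch of the parallel mini-LoRA idea, under the assumption that each mini-adapter acts on its own slice of the feature dimension; shapes and init scales are illustrative:

```python
import torch
import torch.nn as nn

class MiniEnsembleLoRA(nn.Module):
    """n parallel mini-LoRAs, each of rank r, applied to disjoint
    slices of the feature dimension: equivalent rank n*r with far
    fewer parameters than a single rank-(n*r) LoRA."""

    def __init__(self, d, n=4, r=2, scale=1.0):
        super().__init__()
        assert d % n == 0
        self.n, self.chunk, self.scale = n, d // n, scale
        self.A = nn.Parameter(torch.randn(n, d // n, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(n, r, d // n))  # zero init

    def forward(self, x):                      # x: (..., d)
        xs = x.view(*x.shape[:-1], self.n, self.chunk)
        dx = torch.einsum('...nc,ncr,nrk->...nk', xs, self.A, self.B)
        return x + self.scale * dx.reshape_as(x)
```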
-
1bit-Merging: Dynamic Quantized Merging for Large Language Models
1bit-Merging proposes a dynamic model-merging framework with 1-bit quantized task vectors and task-specific routing, retaining 94.53% of performance while cutting storage to 55.02%, and beating both traditional and dynamic merging methods on general knowledge, math reasoning, and code generation.
-
RaCT: Ranking-aware Chain-of-Thought Optimization for LLMs
RaCT uses a two-stage training framework of chain-of-thought (CoT) prompting followed by ranking preference optimization (RPO) to significantly improve LLM text reranking while preserving general language-modeling abilities, surpassing baselines on multiple benchmarks.
-
Tensor Product Attention Is All You Need
This paper proposes Tensor Product Attention (TPA), which compresses the KV cache via contextual tensor decomposition, substantially reducing inference memory while matching or outperforming baselines such as MHA and MQA on language modeling.
-
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
This paper proposes the Residual Alignment Model (RAM), which detaches the alignment module via importance sampling, enabling efficient sequence-level training and token-level decoding, significantly improving performance across alignment tasks while lowering resource costs.
-
Behavior Injection: Preparing Language Models for Reinforcement Learning
This paper proposes BRIDGE, which injects exploration and exploitation behaviors during SFT to improve LLMs' RL-readiness, significantly boosting RFT performance on math and logical reasoning tasks.
-
Language Model Distillation: A Temporal Difference Imitation Learning Perspective
This paper proposes a model-distillation framework based on temporal-difference imitation learning that exploits the sparsity of LLM output distributions to shrink the action space to a top-p candidate set, improving both performance and computational efficiency on instruction-following tasks.
-
Reverse Preference Optimization for Complex Instruction Following
This paper proposes Reverse Preference Optimization (RPO), which removes preference-pair noise by dynamically reversing unsatisfied constraints in instructions, significantly outperforming DPO baselines on multi-turn complex instruction following and surpassing GPT-4o at the 70B scale.
-
Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
This paper proposes the Dynamic Thinking-pattern Optimization (DTO) framework, which segments and optimizes the reasoning paths of large reasoning models, substantially cutting compute while improving accuracy, with up to 12% higher accuracy and 47% fewer FLOPs on math reasoning benchmarks.
-
Can Large Reasoning Models Self-Train?
This paper proposes Self-Rewarded Training (SRT), which drives reinforcement learning with model self-consistency for unsupervised math-reasoning improvement; it initially matches supervised methods but collapses under prolonged training due to reward hacking, and the paper explores mitigations such as early stopping and curriculum learning.
-
Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging
This paper proposes OSRM, which constrains LoRA subspaces before fine-tuning to reduce cross-task interference, significantly improving merged performance of multiple language models on eight GLUE datasets while preserving single-task accuracy.
-
Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning
This paper proposes the Perturb-and-Merge (P&M) framework, which perturbs task vectors during training and merges models by convex combination at inference, combined with LoRA for parameter-efficient continual learning, significantly mitigating catastrophic forgetting and improving performance on multiple benchmark datasets.
-
LoLA: Low-Rank Linear Attention With Sparse Caching
LoLA combines three memory forms, linear attention, a sliding window, and a sparse cache, to resolve memory collisions at inference, substantially improving linear-attention models on long-context associative recall and language modeling while keeping memory usage efficient.
-
Let's Predict Sentence by Sentence
This paper proposes a sentence-level reasoning framework that autoregressively predicts continuous sentence embeddings, lifting pretrained language models into an abstract reasoning space; contextual embeddings in continuous reasoning mode match Chain-of-Thought (CoT) performance while halving inference compute on average.
-
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
This paper proposes a layer-swapping method that recombines the top and bottom layers of a language expert with the middle layers of a math expert, achieving zero-shot cross-lingual transfer and improving math reasoning in low-resource languages by up to 10%.
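A rough sketch of the recombination, assuming Llama-style checkpoint keys (`model.layers.{i}.`) and that both experts were fine-tuned from the same base model; which non-layer tensors come from which expert is a guess here, not the paper's prescription:

```python
def layer_swap(lang_sd, math_sd, n_layers=32, k=8):
    """Take the top-k and bottom-k transformer layers (plus
    embeddings/head) from the language expert and the middle
    layers from the math expert."""
    merged = dict(math_sd)
    for name, tensor in lang_sd.items():
        if 'model.layers.' in name:
            i = int(name.split('model.layers.')[1].split('.')[0])
            if i < k or i >= n_layers - k:
                merged[name] = tensor
        else:                      # embeddings, final norm, lm head
            merged[name] = tensor
    return merged
```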
-
You Do Not Fully Utilize Transformer's Representation Capacity
This paper proposes Layer-Integrated Memory (LIMe), which learns cross-layer routing to integrate key-value representations from all previous layers, substantially mitigating representation collapse in Transformers and achieving faster convergence and higher accuracy in language modeling, reasoning tasks, and deep networks.
-
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Via model merging that integrates fast-thinking and slow-reasoning abilities, this paper achieves long-to-short reasoning, compressing response length by up to 55% on a 7B model without losing performance, offering an efficient remedy for LLM overthinking.
-
RepCali: High Efficient Fine-tuning Via Representation Calibration in Latent Space for Pre-trained Language Models
This paper proposes RepCali, a fine-tuning method that calibrates pretrained language-model encoder outputs in latent space, significantly improving 25 models across 8 downstream tasks while adding only 0-0.8% parameters.
-
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models
This paper proposes SORSA, a parameter-efficient fine-tuning method based on singular value decomposition with orthonormal regularization that improves the condition number of weight matrices, significantly outperforming LoRA and PiSSA on GSM-8K and other benchmarks.
-
Zebra-Llama: Towards Extremely Efficient Hybrid Models
Zebra-Llama builds efficient hybrid models from pretrained Transformers by combining state-space-model layers with multi-head latent attention, substantially shrinking the KV cache and raising inference throughput while matching or exceeding baseline performance.
-
How does Transformer Learn Implicit Reasoning?
By training Transformers from scratch in a controlled symbolic environment, this paper reveals a three-stage development of implicit multi-hop reasoning and, using cross-query semantic patching and a cosine representation lens, ties reasoning ability to clustering in hidden space, offering new insights for interpretability.
-
ATLAS: Learning to Optimally Memorize the Context at Test Time
This paper proposes Atlas, a high-capacity long-term memory module that optimizes context memorization with a sliding-window Omega rule and the Muon optimizer, significantly outperforming Transformers and modern RNNs on language modeling and long-context understanding.
-
Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking
Through a survey, benchmarking, and two proposed techniques, weight refactorization and momentum reset, this paper explores parameter- and memory-efficient LLM pretraining, significantly improving low-rank methods and reducing memory consumption, though still not fully matching full-rank training.
-
MoRE: A Mixture of Low-Rank Experts for Adaptive Multi-Task Learning
This paper proposes MoRE, which treats different LoRA ranks as experts and adds an adaptive rank selector, substantially improving LLM multi-task fine-tuning efficiency and performance while keeping the parameter count low.
-
Steering LLM Reasoning Through Bias-Only Adaptation
By training steering vectors (bias-only adaptation), this paper supports the hypothesis that reasoning ability is already latent in LLMs, approaching or even surpassing full fine-tuning on math reasoning with extreme parameter efficiency.
-
Interleaved Reasoning for Large Language Models via Reinforcement Learning
This paper proposes an interleaved reasoning paradigm that uses reinforcement learning to train LLMs to alternate between thinking and answering, cutting time-to-first-token (TTFT) by over 80% and improving accuracy on multiple reasoning tasks by up to 19.3%.
-
Graceful Forgetting in Generative Language Models
This paper proposes the Learning With Forgetting (LWF) framework, which uses self-generated knowledge, Fisher-information-weighted forgetting confidence, and periodic forgetting to achieve graceful forgetting when fine-tuning generative language models; experiments show significant gains on most domain-specific QA tasks.
-
LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs
This paper proposes LiSTEN, a framework that adapts LLMs to audio tasks through a dynamic prompt-selection strategy, achieving competitive multi-task performance and better interpretability while reducing reliance on large-scale datasets and the number of trained parameters.
-
Via the geometric properties of hidden states (separability and alignment), this paper proposes a unified framework revealing a two-stage mechanism of in-context learning (ICL) in classification: early layers boost separability through PTHs while later layers refine alignment through IHs, which also explains why task vectors work.
-
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
This paper studies catastrophic forgetting in end-to-end training of spoken language models (SLMs), evaluating three mitigation strategies, model merging, discounting the LoRA scaling factor, and experience replay, finding experience replay most effective and its combination with the others yielding further gains.
-
SeMe: Training-Free Language Model Merging via Semantic Alignment
This paper proposes SeMe, a training-free, data-free language-model merging method based on semantic alignment that fuses parameters via semantic decomposition and transformation in latent space, aiming to preserve model behavior and stabilize internal knowledge, though experimental validation remains thin.
-
Thinker: Learning to Think Fast and Slow
This paper proposes the Thinker task, which decomposes question answering into four stages, fast thinking, verification, slow thinking, and summarization, using reinforcement learning to train LLMs' intuition and reasoning separately, achieving significant gains on math reasoning benchmarks.
-
Parameter-Efficient Fine-Tuning with Column Space Projection
This paper proposes PiCa, a spectra-informed parameter-efficient fine-tuning method that projects gradients onto the low-rank column subspace of pretrained weights and shares weights across layers, outperforming LoRA and SVFT with far fewer parameters.
-
Can Past Experience Accelerate LLM Reasoning?
This paper proposes the SpeedupLLM framework, which combines adaptive compute allocation with memory mechanisms to speed up LLM reasoning; experiments show compute-cost reductions of up to 56%, especially on highly similar problems.
-
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models
CoThink proposes a two-stage reasoning framework in which an instruct model drafts a solution outline that guides a reasoning model to complete the answer, cutting token generation by 22.3% on average while maintaining accuracy, improving LLM reasoning efficiency.
-
The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason
This paper studies LLM robustness to reward noise in RL post-training and proposes Reasoning Pattern Reward (RPR), which rewards key reasoning phrases rather than answer correctness to significantly lift performance, and uses RPR to calibrate noisy reward models, improving open-ended task results.
-
Core Context Aware Transformers for Long Context Language Modeling
This paper proposes Core Context Aware Attention (CCA-Attention), which reduces redundancy in long-context modeling via globality-aware pooling and locality-preserving modules, significantly improving efficiency while preserving performance; experiments show a 7.9x speedup and roughly 45% memory reduction at 128K context.
-
When More is Less: Understanding Chain-of-Thought Length in LLMs
Through theory, controlled experiments, and real-world observations, this paper shows that Chain-of-Thought (CoT) length has an inverted-U relationship with reasoning performance, derives a scaling rule in which the optimal length grows with task difficulty and shrinks with model capability, and demonstrates sizable gains from optimal-length-based training and inference strategies.
-
R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
R1-Compress compresses long chain-of-thought (Long-CoT) via chunk-level compression and inter-chunk search, cutting token usage by about 20% while keeping accuracy close to the baseline (92.4% vs 93.0%).
-
Mitigate Position Bias in Large Language Models via Scaling a Single Dimension
This paper mitigates position bias in long-context language models by scaling a single positional channel of the hidden states, validating its effectiveness across models and tasks and notably improving use of mid-context information on the 'lost in the middle' benchmark.
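A minimal sketch of the intervention as summarized, assuming the position-encoding channel index has already been located by probing; the scaling factor is illustrative:

```python
import torch

def scale_position_channel(hidden, channel_idx, factor=0.8):
    """Down-scale one hidden-state channel that carries positional
    information to soften position bias.

    hidden: (batch, seq, d); channel_idx found offline by probing.
    """
    hidden = hidden.clone()
    hidden[..., channel_idx] = hidden[..., channel_idx] * factor
    return hidden
```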
-
General-Reasoner: Advancing LLM Reasoning Across All Domains
This paper proposes General-Reasoner, which pairs zero-RL training on a high-quality cross-domain dataset with a generative-model-based verifier, substantially improving LLM reasoning across domains while preserving effectiveness on math reasoning.
-
Efficient Length-Generalizable Attention via Causal Retrieval for Long-Context Language Modeling
This paper proposes Grouped Cross Attention (GCA), which achieves length generalization in Transformers through differentiable retrieval and dynamic context selection, reaching perfect passkey-retrieval accuracy at 16M context length while substantially reducing compute and memory cost.
-
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging
This paper proposes a benchmark for merging multimodal LLMs (MLLMs) and an improved task-vector optimization method (WUDI v2) that removes noise via low-rank approximation and optimizes the merged vector, achieving an average 2.48% gain in multi-task and cross-modal merging experiments and showing the potential to build high-performing MLLMs without data or training.
-
MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability
This paper proposes the MASKSEARCH framework, whose Retrieval-Augmented Mask Prediction (RAMP) pretraining task, combined with supervised fine-tuning and reinforcement learning, significantly improves LLMs' agentic search on open-domain multi-hop QA.
-
REARANK: Reasoning Re-ranking Agent via Reinforcement Learning
This paper proposes REARANK, an RL-based listwise reranking agent that, through explicit reasoning and data augmentation, uses only 179 annotated queries to significantly beat baselines on multiple IR benchmarks and match or exceed GPT-4, excelling on reasoning-intensive tasks.
-
Incentivizing Strong Reasoning from Weak Supervision
This paper proposes the weak-to-strong reasoning (W2SR) paradigm, which supervises a strong student with structured chain-of-thought traces generated by markedly weaker teachers, cheaply and substantially improving its reasoning to approach or even exceed costly reinforcement learning.
-
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
This paper proposes the DC-CoT benchmark, which systematically evaluates data augmentation, selection, and mixing strategies for chain-of-thought (CoT) distillation, revealing the outsized benefit of augmentation (e.g., reverse thinking) for small student models and offering practical guidance for building efficient reasoning models.
-
Pretraining Language Models to Ponder in Continuous Space
This paper proposes the Pondering Language Model, which introduces a self-supervised continuous-space pondering mechanism during pretraining, substantially improving language modeling and downstream performance; PonderingPythia-1B approaches TinyLlama-1.1B.
-
Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs
This paper proposes dynamic sampling-budget allocation and temperature scheduling, which reallocate resources by problem difficulty and maintain policy entropy for exploration, substantially improving RL efficiency and performance for LLMs on math tasks, with pass@1 and pass@16 on AIME 2024 up 5.31% and 3.33% respectively.
-
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Through experiments, this paper confirms the positive correlation between long-context ability and reasoning performance, proposes strengthening long-context ability before supervised fine-tuning, and significantly improves model performance on math reasoning benchmarks.
-
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
R1-Searcher++ uses a two-stage training strategy (SFT then RL) with a reward mechanism and a memory module so that LLMs adaptively balance internal knowledge against external retrieval, significantly improving accuracy and retrieval efficiency on multi-hop QA.
-
Self-Interpretability: LLMs Can Describe Complex Internal Processes that Drive Their Decisions, and Improve with Training
By fine-tuning GPT-4o and GPT-4o-mini, this paper shows that LLMs can quantitatively report their internal decision processes (e.g., attribute weights), that introspection training significantly improves report accuracy, and that the ability generalizes to native preferences, opening a new path for AI interpretability and safety.
-
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
This paper challenges the assumption that longer thinking chains improve reasoning LLMs, proposing the *short-m@k* inference method that prefers shorter chains for up to 34.5% accuracy gains and 40% less compute, and validates short-chain training via fine-tuning.
-
From Compression to Expansion: A Layerwise Analysis of In-Context Learning
Through statistical-geometric analysis, this paper reveals a layer-wise compression-expansion phenomenon in LLM in-context learning, with early layers compressing task information and later layers expanding it into predictions, and examines how model size, demonstration count, and noise affect performance.
-
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
This paper proposes ConciseR, a two-stage reinforcement learning framework that builds reasoning ability via GRPO++ and then optimizes response length via L-GRPO, markedly shortening CoT responses while preserving accuracy and outperforming existing methods on multiple benchmark datasets.
-
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
This paper proposes R2R, a token-level neural routing method that selectively invokes the LLM to correct divergent tokens along the SLM's reasoning path, beating the R1-14B model with an average of 5.6B activated parameters and achieving a 2.8x wall-clock speedup over R1-32B.
-
Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning
By supervised fine-tuning the Qwen2.5-32B base model on only 920 distilled examples, this paper significantly outperforms resource-intensive Zero-RL and shows that distilled models achieve more flexible reasoning through anthropomorphic language and advanced cognitive behaviors.
-
Learning Composable Chains-of-Thought
This paper proposes Composable Chain-of-Thought, which augments data to improve the CoT format of atomic tasks and combines multi-task learning or model merging for zero-shot compositional reasoning, with rejection-sampling fine-tuning giving further gains, outperforming standard CoT baselines on string manipulation and natural-language tasks.
-
Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models
By analyzing the activation patterns underlying long chain-of-thought ability in LLMs, this paper proposes a training-free activation-control method (EELo-CoT) and a parameter-efficient fine-tuning strategy that dynamically adjust activations at inference, significantly raising self-reflection rates and accuracy.
-
Distilling LLM Agent into Small Models with Retrieval and Code Tools
This paper proposes the Agent Distillation framework, which transfers LLM agents' interactive behavior to sLMs and adds first-thought prefix and self-consistent action generation, letting small models achieve large gains on factual and math reasoning, approaching or even exceeding larger CoT-distilled models.
-
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning
Through theoretical analysis and a Re-distillation technique, this paper uncovers the efficiency bottleneck of small-scale SFT in R1-style RL and, with very few samples (<1K), approaches RL-level performance on K&K and MATH, substantially improving data efficiency.
-
Thinking Fast and Right: Balancing Accuracy and Reasoning Length with Adaptive Rewards
This paper proposes Adaptive Direct Length Penalty (A-DLP), which dynamically adjusts the RL length-penalty coefficient, cutting LLM reasoning length by more than 50% without losing accuracy, pointing to a new direction for building efficient reasoning models.
-
SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models
SLearnLLM proposes a self-learning framework in which the LLM grades its own answers and fine-tunes only on the wrongly answered QA pairs, matching full-dataset fine-tuning gains in agriculture and medicine while substantially cutting training time.
-
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs
Using a modular approach that exploits the separability of math-reasoning and multilingual abilities in LLM parameters, this paper proposes strategies such as Layer-Swapping, significantly outperforming non-modular baselines on cross-lingual transfer to low-resource languages, especially under data constraints.
-
Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning
This paper systematically studies how teacher choice, granularity, and format affect CoT distillation for small language models (SLMs), finding stronger students benefit from fine-grained CoT while weaker ones prefer medium granularity, that format matters little, and that teacher capability alone does not determine student performance.
-
Structured Agent Distillation for Large Language Model
This paper proposes a structured agent distillation framework that segments LLM-agent trajectories into reasoning and action spans with span-specific supervision, significantly improving task success, inference efficiency, and consistency under compression, outperforming token-level baselines.
-
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
This paper proposes the Universal Reasoner (UniR), a lightweight, composable reasoning module that turns predefined rewards into token-level guidance signals, efficiently enhancing frozen LLMs' reasoning and showing performance above several baselines plus cross-model transfer on math reasoning and machine translation.
-
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
This paper proposes the RL-based LASER family (LASER, LASER-D, LASER-DE), which uses dynamic, difficulty-aware length-based reward shaping to greatly improve token efficiency while preserving large reasoning models' performance, reaching Pareto-optimal accuracy-efficiency trade-offs on multiple math reasoning benchmarks.
-
Thought calibration: Efficient and confident test-time scaling
This paper proposes thought calibration, which uses a reasoning-tree abstraction and lightweight probes to decide when to stop LLM reasoning, trimming up to 60% of thinking tokens in-distribution without hurting performance, and 20% out-of-distribution.
-
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
This paper proposes the Compressed Latent Reasoning (CoLaR) framework, which dynamically compresses reasoning in latent space and refines it with reinforcement learning, substantially improving efficiency on math reasoning while maintaining high accuracy.
-
LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging
This paper proposes the LORE-MERGING framework, which uses low-rank estimation to construct an approximate base model and task vectors, enabling model merging without access to the original base model and outperforming conventional methods on several benchmark datasets.
-
AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking
AdaReasoner uses a reinforcement-learning framework to adaptively tune LLM reasoning configurations (generation temperature, number of reasoning steps, and instruction format), significantly beating fixed-configuration baselines across diverse tasks with fast convergence and out-of-distribution robustness.
-
Small Models, Smarter Learning: The Power of Joint Task Training
Through experiments with small Transformers on the ListOps dataset, this paper shows that joint task training (e.g., MAX+MED+SUM) markedly lowers learning difficulty, reduces parameter needs, and steers models toward efficient algorithms based on numeric properties rather than memorizing symbol tables.
-
Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection
This paper proposes the Route-To-Reason (RTR) framework, which jointly selects the best model and reasoning strategy via dynamic routing, achieving higher accuracy with over 60% fewer tokens on multiple reasoning tasks and markedly better performance-cost trade-offs.
-
ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models
This paper proposes ALPS, which locates task-sensitive attention heads via a weight-distribution-based parameter-alignment distribution score (sPAD) and prunes the rest, improving general, math, and code tasks while updating only 10% of attention parameters, with transferable heads and reduced knowledge forgetting.
-
Knowledge Grafting of Large Language Models
GraftLLM proposes generating SkillPacks via module-aware compression, enabling efficient cross-capability transfer between LLMs, knowledge fusion, and forgetting-free continual learning, significantly outperforming existing methods on multiple benchmarks.
-
Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting
This paper proposes Difficulty-Aware Prompting (DAP), which adapts reasoning-trace length to difficulty to build the compact LiteCoT dataset (100K samples, 720 tokens on average); the resulting Liter models significantly beat traditional long-CoT approaches on multiple reasoning benchmarks while slashing training and inference cost.
-
Adaptive Deep Reasoning: Triggering Deep Thinking When Needed
This paper proposes an adaptive deep reasoning method that uses supervised fine-tuning and reinforcement learning so LLMs automatically switch between long- and short-chain reasoning based on problem complexity, demonstrating effectiveness and efficiency gains on math tasks.
-
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
This paper proposes the RaML framework, which, from a meta-learning view, treats LLM reasoning trajectories as pseudo-gradient updates, theoretically and empirically connecting reasoning with optimization and exploring how training strategies and trajectory properties can further improve reasoning.
-
Temporal Sampling for Forgotten Reasoning in LLMs
This paper uncovers 'Temporal Forgetting' in LLM fine-tuning and proposes 'Temporal Sampling', which draws answers from multiple training checkpoints to significantly boost reasoning (4-19 points Pass@k) while using LoRA adaptation to cut storage costs.
-
A Unified Approach to Routing and Cascading for LLMs
Through theoretical analysis, this paper derives optimal routing and cascading strategies and proposes cascade routing as a unifying framework, significantly improving LLM output quality within a cost budget, especially when quality estimates are accurate.
-
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions
This paper proposes an 'Ensemble' prompting framework that describes the selection criteria of in-context examples to boost LLM in-context learning; experiments show models are far more sensitive to prompt format than to the descriptive content itself, especially for small models.
-
Language Models are Universal Embedders
Building on multilingual decoder models (e.g., BLOOM), this paper proposes a recipe for universal embedders using contrastive learning and parameter-efficient fine-tuning, achieving high-quality embeddings across languages and tasks and showing strong potential and generalization in multilingual, multi-task settings.
-
Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling
This paper proposes the Response-conditioned Bradley-Terry (Rc-BT) model, which disentangles semantic intent from length instructions, significantly mitigating LLM length bias in RLHF and improving length-instruction following, with validation across multiple models and datasets.
-
Explaining Context Length Scaling and Bounds for Language Models
From an intrinsic-space perspective, this paper offers a theoretical framework for how context length affects language-modeling loss, derives a dataset-size-dependent optimal context length, and validates its hypotheses on natural-language and synthetic data.
-
Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning
This paper proposes RLKD, an RL-based knowledge-distillation framework whose generative structure reward model (GSRM) transfers the implicit multi-branch reasoning structure of teacher models to students; experiments show it significantly outperforms SFT and conventional RL on math and QA tasks.
-
Latent Principle Discovery for Language Model Self-Improvement
This paper proposes the STaPLe algorithm, which uses Monte Carlo EM to automatically discover and learn the latent principles of language-model self-improvement, significantly improving small models on multiple instruction-following benchmarks while producing a human-interpretable constitution via clustering.
-
Merge to Mix: Mixing Datasets via Model Merging
This paper proposes *Merge to Mix*, which uses model merging as a proxy to efficiently select dataset mixtures for fine-tuning large models, significantly outperforming conventional selection on image classification and language tasks and approaching, sometimes exceeding, Oracle performance.
-
Diverse, not Short: A Length-Controlled Self-Learning Framework for Improving Response Diversity of Language Models
This paper proposes the Diverse-NS framework, whose length-controlled self-learning and preference optimization significantly increase LLM response diversity on creative tasks while mostly preserving quality, and validates small models as diversity teachers for larger ones.
-
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
Through experiments and theory, this paper shows that RLVR raises LLM accuracy but not capability because it biases optimization toward easy problems, while distillation improves capability only when it introduces new knowledge, otherwise behaving like RLVR.
-
On the Generalization vs Fidelity Paradox in Knowledge Distillation
Through large-scale empirical analysis, this paper shows knowledge distillation (KD) significantly boosts small language models' zero-shot reasoning (up to 10%) but brings limited gains for larger ones, with a disconnect between performance gains and reasoning fidelity, underscoring task expertise and moderate parameter tuning.
-
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
This paper proposes DIFFEMBED, a text-embedding method built on diffusion language models whose bidirectional attention significantly outperforms autoregressive LLM embedders on long-document retrieval and reasoning-intensive tasks while matching them on conventional embedding tasks.
-
The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation
Through systematic experiments, this paper shows that increasing language diversity when fine-tuning LLMs for translation significantly improves all categories of translation pairs, uses mid-layer representation analysis to reveal the cross-lingual transfer mechanism, and finds the diversity benefit has a threshold.
-
Cross-Lingual Optimization for Language Transfer in Large Language Models
This paper proposes Cross-Lingual Optimization (CLO), which uses translated data and a modified DPO strategy to transfer English-centric LLMs to target languages, significantly improving target-language performance while preserving English ability, with low-resource languages surpassing conventional SFT on less data.
-
IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
IDEAL proposes a gradient-based iterative data-equilibrium adaptation framework that dynamically optimizes the mixing ratios of multi-domain SFT datasets, lifting LLM multi-task performance by about 7% on average within two iterations.
-
ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy
ThinkLess proposes a training-free framework for inference efficiency that uses attention analysis to expose CoT redundancy, terminates generation early, and adds a lightweight output-regulation mechanism, significantly cutting token usage and latency without losing accuracy.
-
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Using the KVFundaBench benchmark, this paper systematically evaluates how KV cache compression affects LLMs' fundamental abilities, revealing task-dependent degradation, and proposes ShotKV, which compresses the prefill and decode phases differently to significantly improve long-context generation.
-
Improving Multilingual Language Models by Aligning Representations through Steering
This paper proposes steering LLM layer-wise representations via representation alignment to improve multilingual task performance; experiments show gains over plain prompting that approach translation baselines, but English tasks suffer and improvements for low-resource languages are limited.
-
Understanding Cross-Lingual Inconsistency in Large Language Models
Using the *logit lens* to analyze cross-lingual inconsistency in LLMs, this paper finds larger models tend to operate in language-specific subspaces rather than a shared semantic space, and proposes cross-lingual activation steering, improving smaller models' multilingual reasoning and knowledge transfer.
-
An Analysis for Reasoning Bias of Language Models with Small Initialization
Through theory and experiments, this paper shows how small parameter-initialization scales, by shaping the embedding space and training dynamics, bias LLMs toward reasoning tasks over memorization tasks.
-
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
This paper proposes the PTQ-Bench framework to systematically evaluate post-training quantization (PTQ) strategies for LLMs across bit widths, structures, and modalities, finding rotation- and compensation-based strategies excel at low bit widths, and concluding that extremely low-bit quantization needs rethinking and that combining compensation with other methods markedly improves robustness.
-
Thinking Out Loud: Do Reasoning Models Know When They're Right?
Comparing large reasoning models trained with instruction tuning, supervised fine-tuning, and reinforcement learning, this paper finds reasoning-oriented training significantly improves accuracy and calibration on reasoning tasks but can erode smaller models' awareness of their knowledge boundaries on factual tasks.
-
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models
This paper proposes the Meta Policy Optimization (MPO) framework, in which a meta reward model dynamically adjusts the reward model's evaluation prompt, significantly improving LLM alignment across tasks while reducing reward hacking and the burden of manual prompt engineering.
-
Sparsity May Be All You Need: Sparse Random Parameter Adaptation
This paper proposes SpaRTA, which randomly selects a small subset of pretrained parameters for fine-tuning, achieving parameter efficiency with performance comparable to LoRA on natural language understanding tasks and significant memory savings.
-
LIFEBench: Evaluating Length Instruction Following in Large Language Models
Introducing the LIFEBENCH benchmark, this paper systematically evaluates 26 LLMs on length-instruction following, finding they generally fail under long-length constraints and fall far short of vendor-claimed maximum output lengths, exposing fundamental limits in length perception and long-text generation.
-
When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction
Through model-specific datasets and belief-manipulation experiments, this paper shows that LLM retraction behavior is causally influenced by internal beliefs and that supervised fine-tuning significantly improves retraction performance.
-
ABBA: Highly Expressive Hadamard Product Adaptation for Large Language Models
ABBA proposes a parameter-efficient fine-tuning method that reparameterizes weight updates as the Hadamard product of two independent low-rank matrices, significantly improving expressiveness and performance at equal parameter budgets and outperforming existing PEFT methods across models and tasks.
-
UFT: Unifying Supervised and Reinforcement Fine-Tuning
This paper proposes the Unified Fine-Tuning (UFT) framework, which integrates supervised and reinforcement fine-tuning via hint-guided exploration and a hybrid objective, performing well across model scales and reasoning tasks, with a proven exponential improvement in sample complexity.
-
Shadow-FT: Tuning Instruct via Base
This paper proposes the Shadow-FT framework, which tunes the BASE model and grafts the weight updates directly onto the INSTRUCT model, significantly improving LLMs on math, coding, and reasoning without extra training cost.
-
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation
ShareLoRA shares the low-rank matrix A or B across model layers, cutting trainable parameters by 44%-96% relative to LoRA while matching or exceeding LoRA across models and tasks, demonstrating efficiency, adaptability, and cross-domain robustness.
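A minimal sketch of one sharing variant (a shared down-projection A with per-layer up-projections B); the paper's exact sharing scheme and shapes may differ:

```python
import torch
import torch.nn as nn

class SharedALoRA(nn.Module):
    """LoRA where the down-projection A is shared across all layers
    and only the up-projections B are layer-specific -- one way to
    realize cross-layer sharing of low-rank matrices."""

    def __init__(self, d, r, n_layers):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d, r) * 0.01)   # shared
        self.Bs = nn.ParameterList(
            [nn.Parameter(torch.zeros(r, d)) for _ in range(n_layers)])

    def delta(self, layer, x):          # x: (..., d)
        return (x @ self.A) @ self.Bs[layer]
```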
-
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization
This paper proposes Quantized Zeroth-order Optimization (QZO), which perturbs quantization scale parameters and clips directional derivatives to fine-tune quantized networks with zeroth-order optimization, reducing memory usage by more than 18x and showing notable memory efficiency and some performance gains on LLMs and Stable Diffusion.
-
Do Language Models Use Their Depth Efficiently?
Through residual-stream analysis and intervention experiments on Llama 3.1 and Qwen 3, this paper finds LLMs underuse their depth: later layers mainly refine the output distribution rather than compute anything new, and processing depth does not track input complexity, suggesting current architectures and training objectives need rethinking.
-
Large Language Models are Miscalibrated In-Context Learners
Through an in-depth analysis of LLM calibration in low-resource settings, this paper shows in-context learning (ICL) does not consistently improve calibration and proposes a self-ensembling method that significantly improves calibration (43% average ECE reduction) while maintaining or slightly improving task performance.
-
Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models
Through probing and activation-editing experiments, this paper systematically studies the emergence, structure, robustness, and enhanceability of internal belief representations in language models, finding they improve with scale and fine-tuning and are structured yet brittle to prompt variations, and that contrastive activation addition (CAA) significantly improves ToM performance.
-
Chain-of-Model Learning for Language Model
This paper proposes the Chain-of-Model (CoM) learning paradigm, which introduces causally dependent multi-scale representations (Chain-of-Representation) into the Transformer architecture for efficient scaling and elastic inference; experiments show the CoLM family matches standard Transformers while offering advantages in prefill speed and flexibility.
-
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities
Comparative experiments reveal that although long-sequence models (e.g., Mamba2) theoretically support unbounded context, in practice they face the same marked limitations as Transformers on long-context tasks, degrading especially when information position or data format changes, and the causes demand further study.
-
Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents
This paper proposes Pre-Act, which improves LLM-agent performance through multi-step planning and detailed reasoning, and by fine-tuning smaller models (e.g., Llama 3.1 70B) achieves 69.5% higher action accuracy and 28% higher goal completion than GPT-4 on the Almita dataset.
-
From Words to Worlds: Compositionality for Cognitive Architectures
This paper evaluates LLM compositionality with three task designs, finding that scaling up models usually improves compositional performance while instruction tuning has inconsistent effects, suggesting compositionality has limited power for explaining performance gains.
-
Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching
This paper proposes ALM, a universal cross-tokenizer distillation method that transfers knowledge between different tokenizers via approximate likelihood matching, achieving strong results for the first time in scenarios such as subword-to-byte-level transfer and outperforming existing methods across several applications.
-
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
EfficientQAT proposes an efficient quantization-aware training framework that combines block-wise full-parameter training (Block-AP) with end-to-end quantization-parameter training (E2E-QP), significantly improving low-bit LLM quantization while greatly reducing training resources.
-
Training Language Models to Reason Efficiently
This paper trains large reasoning models to reason efficiently via reinforcement learning, using a length-penalty objective with a tunable parameter alpha to substantially cut inference cost while retaining most accuracy across several math datasets.
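One plausible form of the length-penalized reward described above; the normalization and the default alpha are illustrative, not the paper's exact objective:

```python
def efficiency_reward(correct, n_tokens, max_tokens, alpha=0.2):
    """Correctness reward minus a tunable length penalty.

    alpha trades accuracy against brevity: alpha=0 recovers the
    plain correctness reward, larger alpha compresses responses.
    Normalizing by max_tokens keeps the penalty within [0, alpha].
    """
    return float(correct) - alpha * (n_tokens / max_tokens)
```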
-
CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
This paper proposes CoLA and its memory-optimized variant CoLA-M, which replace the full-size MLP and projection layers of LLMs with low-rank auto-encoders, halving model size and compute while preserving full-rank performance and significantly improving training and inference throughput.
-
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
This paper proposes MKA, which uses manifold learning and an information-bottleneck metric to merge layers of LLMs for compression, achieving substantial compression ratios with small performance loss on multiple benchmarks, and combining with quantization for further gains.
-
Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning
This paper proposes the Long⊗Short framework, in which a long-thought and a short-thought LLM reason collaboratively using automatic thought chunking, cold-start SFT, and multi-turn RL, significantly improving reasoning efficiency: Qwen2.5-7B and Llama3.1-8B approach distilled-model performance on multiple benchmarks while cutting token length by over 80%.
-
EfficientLLM: Efficiency in Large Language Models
EfficientLLM systematically evaluates efficiency techniques for LLMs across architecture pretraining, fine-tuning, and inference via a large-scale empirical benchmark, revealing resource trade-offs and task dependencies and giving practitioners data-driven guidance on model and technique selection.
-
ThinkSwitcher: When to Think Hard, When to Think Fast
ThinkSwitcher is a lightweight adaptive framework that lets a single large reasoning model switch between long and short chain-of-thought modes by task complexity, cutting compute cost by 20-30% on math reasoning benchmarks while maintaining high accuracy on complex tasks.
-
Plan and Budget: Effective and Efficient Test-Time Scaling on Large Language Model Reasoning
This paper proposes the PLAN-AND-BUDGET framework, which combines structured reasoning with uncertainty-based adaptive token budgeting, significantly improving LLM computational efficiency on reasoning tasks, raising the E3 metric by up to 187.5% while preserving accuracy.
-
Understanding Fact Recall in Language Models: Why Two-Stage Training Encourages Memorization but Mixed Training Teaches Knowledge
Using a cross-task gradient-tracing tool, this paper shows that mixed training increases the number and importance of shared parameters and concentrates them in key attention heads, thereby teaching knowledge and improving language models' fact-recall generalization.
-
AdaptThink: Reasoning Models Can Learn When to Think
This paper proposes *AdaptThink*, an RL algorithm that adaptively chooses between *Thinking* and *NoThinking* modes, significantly reducing reasoning models' response length (40-53% on average) while improving accuracy (2.3-2.4% on average), striking a good efficiency-performance balance on math tasks.
-
SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning
SATURN proposes a SAT-based reinforcement learning framework that uses curriculum learning over SAT tasks with controllable difficulty to significantly improve LLM reasoning on SAT, math, and programming tasks.
-
Divide-Fuse-Conquer: Eliciting "Aha Moments" in Multi-Scenario Games
This paper proposes the Divide-Fuse-Conquer framework, which boosts LLM generalization in multi-scenario games via grouped training, parameter fusion, and continual optimization; across 18 TextArena games, Qwen2.5-32B-Align approaches Claude3.5, though complex scenarios remain challenging.
-
Not All Correct Answers Are Equal: Why Your Distillation Source Matters
Distilling 1.89M reasoning examples from three top LLMs, this paper systematically studies how the distillation source affects student models, finding AM-Thinking-v1 distillation data significantly improves students on multiple reasoning benchmarks and yields adaptive generation lengths.
-
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
Using the MathIF benchmark to evaluate large reasoning models' instruction following on math tasks, this paper reveals a trade-off between stronger reasoning and weaker instruction following, and experimentally verifies how training strategy and reasoning-chain length shape this trade-off.
-
Reward Reasoning Model
This paper proposes Reward Reasoning Models (RRMs), which reason in a chain-of-thought fashion before emitting rewards to adaptively exploit test-time compute, significantly improving performance on reward-modeling benchmarks and real applications, especially complex reasoning tasks.
-
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
Via entropy minimization, this paper proposes three unsupervised methods (EM-FT, EM-RL, EM-INF) that significantly improve LLMs on math, physics, and coding reasoning without labeled data, in some cases surpassing conventional supervised methods and frontier models.
-
Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One
This paper proposes LLM-Ens, which uses LLMs for semantic state classification and dynamic agent selection to strengthen RL model ensembles, significantly improving performance on the Atari benchmark with gains of up to 51.2% over baseline methods.
-
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners
This paper proposes a training-free intervention that removes language-specific representations from LLMs at inference to disentangle language from reasoning, significantly improving multilingual reasoning, especially for mid- and low-resource languages, while revealing a negative correlation between language signals and reasoning accuracy.
-
This paper proposes the Self-Reasoning Language Model (SRLM), which uses a small amount of reasoning-catalyst data to guide the model to self-generate longer reasoning chains and iteratively self-train, delivering an average gain of +2.5 points across reasoning benchmarks and showing potential for deeper exploration and creative reasoning paths.
-
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning
This paper proposes RL-of-Thoughts (RLoT), which trains a lightweight navigator model with reinforcement learning to dynamically assemble task-specific logical structures at inference, significantly improving LLMs across multi-domain reasoning tasks with strong transfer across models and tasks.
-
FlashThink: An Early Exit Method For Efficient Reasoning
FlashThink uses a verification model to decide when reasoning can end early, significantly shortening reasoning content (about 77% average efficiency gain) without hurting LLM accuracy, and further improves performance via FT² fine-tuning.
-
Context-Free Synthetic Data Mitigates Forgetting
This paper proposes context-free synthetic data (CFS), which generates unconditioned samples and combines fine-tuning with pretraining losses to mitigate catastrophic forgetting in data-agnostic settings, validated on Olmo-1B and R1-Distill-Llama-8B.
-
Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging
This paper proposes test-time model merging (TTMM), which pretrains many expert models at training time and dynamically merges their parameters at test time, approaching the language-modeling performance of test-time training (TTT) with almost no test-time overhead.
-
Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning
This paper proposes the TokenAdapt framework, which transplants tokenizers via a hybrid heuristic initialization strategy, significantly outperforming baselines on zero-shot perplexity, and takes a first step toward supertoken learning for better compression.
-
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
Token Recycling proposes a training-free speculative decoding method that recycles candidate tokens and builds a draft tree from an adjacency matrix, delivering roughly 2x LLM inference speedup, over 30% faster than other training-free methods.
-
Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs
This paper proposes the Meta-RFFT framework, which uses multi-task rule-following pretraining plus light downstream adaptation to significantly improve LLM length generalization on unseen tasks; a 32B model reaches 98% accuracy on length-30 addition, beating existing long-chain reasoning models.
-
A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs
This paper proposes Sliding Layer Merging (SLM), which uses CKA similarity to dynamically merge consecutive layers of LLMs for depth pruning, significantly outperforming existing methods on zero-shot tasks and inference efficiency, and explores combining depth with width pruning.
-
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
This paper proposes the MEAP training paradigm, which injects random masking into next-token prediction, significantly improving LLMs on key-information retrieval and long-context reasoning while preserving compute efficiency and architectural compatibility.
-
Fractured Chain-of-Thought Reasoning
This paper proposes Fractured Sampling, which optimizes sampling along three dimensions, the number of reasoning trajectories, solution diversity, and reasoning depth, significantly improving the cost-performance trade-off of LLMs on long chain-of-thought reasoning tasks.
-
ExpertSteer: Intervening in LLMs through Expert Knowledge
EXPERTSTEER proposes an activation-steering method that derives steering vectors from external expert models via auto-encoders, mutual-information analysis, and recursive feature machines to intervene in arbitrary target LLMs, significantly improving performance across domains and models.
-
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities
This paper proposes the MoL framework, a dual-loss strategy that applies CE loss to domain corpora and KL-divergence loss to general corpora, significantly boosting LLM domain expertise while preserving general abilities, with strong results on medical-domain tasks.
-
Vectors from Larger Language Models Predict Human Reading Time and fMRI Data More Poorly when Dimensionality Expansion is Controlled
Controlling for dimensionality expansion, this paper finds that as LLMs scale, the training process contributes less to predicting human reading times and brain-imaging data, revealing a potential misalignment between models and human sentence processing.
-
Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately
This paper proposes the SART framework, which combines redundant sampling with early stopping and two-stage dynamic pruning to make LLM reasoning serving far more efficient (up to 28.2x) while keeping accuracy close to baselines.
-
Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs
Through layer-wise context masking and cross-task patching, this paper confirms an 'internal chain-of-thought' in LLMs: subtasks of composite tasks are learned and executed sequentially at different network depths, improving transparency and opening a path to instruction-level behavioral control.
-
SSR: Speculative Parallel Scaling Reasoning in Test-time
This paper proposes the SSR framework, which uses a selective parallel module and step-level speculative decoding to significantly improve the efficiency-accuracy trade-off of LLMs on math reasoning at test time, without extra training.
-
CoLA: Collaborative Low-Rank Adaptation
CoLA proposes a flexible LoRA architecture with three collaboration strategies, plus an extended PiSSA initialization, significantly improving the performance and robustness of parameter-efficient fine-tuning in multi-task and data-scarce settings.
-
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
ZeroTuning proposes a training-free method that adjusts the attention paid to the initial token of LLMs, significantly improving text classification, QA, and multi-turn dialogue while remaining robust to resource limits and long contexts.
-
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
This paper reveals that low-probability tokens over-dominate model updates in RL and proposes two remedies, Advantage Reweighting and Lopti, which rebalance token updates and significantly improve GRPO-trained LLMs, with gains of up to 46.2% on the K&K Logic Puzzle task.
-
SLOT: Sample-specific Language Model Optimization at Test-time
This paper proposes SLOT, which optimizes a lightweight sample-specific parameter vector delta for each input prompt at test time, significantly improving LLMs on reasoning tasks, e.g., +8.65% for Qwen2.5-7B on GSM8K.
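A rough sketch of the per-prompt adaptation loop, assuming a Hugging Face-style causal LM exposing `model.model` and `model.lm_head`; the step count and learning rate are illustrative:

```python
import torch

def fit_slot_delta(model, input_ids, steps=3, lr=0.01):
    """Fit a per-prompt additive vector delta on the final hidden
    states by minimizing next-token cross-entropy on the prompt,
    then reuse delta when generating the answer."""
    h = model.model(input_ids).last_hidden_state.detach()   # (1, T, d)
    delta = torch.zeros(1, 1, h.size(-1), requires_grad=True)
    opt = torch.optim.AdamW([delta], lr=lr)
    for _ in range(steps):
        logits = model.lm_head(h + delta)
        loss = torch.nn.functional.cross_entropy(
            logits[:, :-1].flatten(0, 1), input_ids[:, 1:].flatten())
        opt.zero_grad(); loss.backward(); opt.step()
    return delta.detach()
```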
-
Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
Data Whisperer proposes an efficient, training-free, attention-based data-selection method that uses few-shot in-context learning to pick optimal data subsets for task-specific LLM fine-tuning, significantly improving performance in small-data regimes while slashing computational cost.
-
InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models
InfiFPO proposes a preference-optimization method for implicit model fusion during preference alignment that integrates knowledge from multiple source models into a pivot model via sequence-level probability fusion, lifting Phi-4's average score across 11 benchmarks from 79.95 to 83.33.
-
Activation-Guided Consensus Merging for Large Language Models
This paper proposes Activation-Guided Consensus Merging (ACM), which sets layer-wise merging coefficients from the mutual information (MI) of activations, enabling efficient long-to-short reasoning merges that significantly cut output redundancy and improve reasoning accuracy, especially for smaller models.
-
Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning
This paper proposes the Prune-on-Logic framework, which converts long chain-of-thought (Long-CoT) into logic graphs and selectively prunes low-utility verification steps, improving small language models' (SLMs) reasoning accuracy while lowering inference cost, revealing pruning as a capability-alignment strategy.
-
Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
This paper proposes the log-augmented generation (LAG) framework, which directly reuses past reasoning computation via the KV cache, significantly improving LLM accuracy and efficiency on knowledge- and reasoning-intensive tasks, outperforming standard agentic systems and existing reflection and KV-cache methods.
-
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
This paper proposes a two-stage training framework that warms up general reasoning with domain-agnostic Knights & Knaves logic games and then applies RLVR with a small amount of target-domain data, significantly improving LLM reasoning and cross-domain generalization under resource constraints.
-
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades
This paper proposes LoRASuite, a modular approach to LLM upgrades that adapts LoRA weights via a transformation matrix, layer mapping, and attention-head mapping, significantly outperforming small-scale LoRA fine-tuning on math and commonsense tasks, sometimes even beating full retraining, while sharply cutting memory and time.
-
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
This paper proposes the LATENTSEEK framework, which performs policy-gradient-based test-time instance-level adaptation (TTIA) in latent space, significantly improving LLM reasoning and exploring a new direction for test-time scaling.
-
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
Through theory and experiments, this paper shows that the MDP structural assumptions in current RL post-training of LLMs (e.g., GRPO) reduce it to filtered iterative supervised fine-tuning, and that longer responses stem from reward-assignment bias rather than improved reasoning.
-
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
SelfBudgeter combines adaptive token-budget prediction with reinforcement learning, compressing response length by 74.47% on MATH while nearly preserving accuracy, significantly improving large reasoning models' efficiency.
-
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Using the ZeroTIR framework to train base LLMs with reinforcement learning to spontaneously execute Python code for math problems, this paper uncovers a positive correlation of training steps with code-use frequency, response length, and accuracy (an Agent RL Scaling Law), significantly beating tool-free baselines on math benchmarks.
-
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
Evaluating 15 LLMs on instruction-following tasks, this paper reveals that chain-of-thought (CoT) prompting can degrade performance, and uses constraint-attention analysis plus four mitigation strategies (classifier-based selective reasoning in particular) to recover much of the loss.
-
Who Taught You That? Tracing Teachers in Model Distillation
This paper proposes identifying a student model's teacher from high-order linguistic features, part-of-speech (PoS) templates, in its outputs, beating conventional similarity- and perplexity-based methods across tasks and datasets, though accuracy still has room to improve.
-
Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models
This paper proposes two test-time compute-scaling algorithms (knockout-style and league-style) that generate multiple candidate solutions and compare them pairwise, proving that failure probability decays exponentially or by a power law with more compute, and validating gains across datasets and models.
-
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
This paper proposes the Rodimus and Rodimus+ models, which use data-dependent temperature selection (DDTS) and sliding-window shared-key attention (SW-SKA) to sharply cut LLM compute and memory complexity while preserving performance, challenging the accuracy-efficiency trade-off.
-
RARE: Retrieval-Augmented Reasoning Modeling
RARE proposes a new paradigm that externalizes domain-knowledge storage while optimizing reasoning ability, enabling lightweight models to achieve state-of-the-art results on multi-domain benchmarks, surpassing retrieval-augmented GPT-4 and DeepSeek-R1.
-
Parallel Scaling Law for Language Models
This paper proposes parallel scaling (PARSCALE), which grows language-model capability by increasing the number of parallel computation streams (P) at training and inference; theory and experiments show P streams are equivalent to parameter scaling of O(log P), with better inference efficiency in low-resource settings.
-
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
SoLoPO decomposes long-context preference optimization into short-context optimization plus short-to-long reward alignment, significantly improving LLM long-context performance and training efficiency while preserving short-context ability.
-
MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging
MINGLE proposes test-time continual model merging with a mixture of low-rank experts and adaptive null-space-constrained gating, using a few unlabeled test samples to fuse models dynamically, significantly improving generalization in continual learning and reducing catastrophic forgetting.
-
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
This paper reveals that RL fine-tuning of LLMs updates only a 5%-30% parameter subnetwork, shows experimentally that fine-tuning just that subnetwork recovers full fine-tuning performance, and attributes the sparsity mainly to training on data close to the policy distribution, suggesting new efficient fine-tuning strategies.
-
SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning
SoftCoT++ enables test-time scaling in continuous latent space via diverse initial tokens and contrastive learning, significantly improving LLMs across several reasoning tasks and showing synergy with conventional discrete-space scaling.
-
Scalable Strategies for Continual Learning with Replay
This paper proposes three strategies, low-rank adaptation (LoRA), consolidation, and sequential merging, to improve the scalability of continual learning with replay, reducing replay-sample needs by up to 65% and, combined with efficient fine-tuning, significantly improving image-classification performance.
-
A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
This paper proposes Low-Rank Clone (LRC), which uses low-rank projection matrices and activation cloning for efficient knowledge distillation from large to small language models, matching or surpassing models trained on trillions of tokens with only 10-20B training tokens, greatly improving training efficiency.
-
Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs
This paper proposes SASR, a dynamically adaptive hybrid training framework that blends SFT and RL via gradient-norm- and KL-divergence-based adjustment, significantly improving LLMs on math and logical reasoning over pure SFT, pure RL, and static hybrid baselines.
-
Model Merging in Pre-training of Large Language Models
This paper proposes Pre-trained Model Averaging (PMA), which fuses checkpoints from the pretraining stage to significantly improve LLM performance, predict annealing behavior, and stabilize training, offering a new method and practical guidance for efficient model development.
-
Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation
Through Gaussian-mixture simulations and large-scale language-model experiments, this paper shows that knowledge distillation in generative models works by letting teacher entropy control the student's precision-recall trade-off, thereby improving sample quality.
-
Thinkless: LLM Learns When to Think
This paper proposes the Thinkless framework, which uses reinforcement learning with Decoupled Group Relative Policy Optimization (DeGRPO) so LLMs autonomously choose short- or long-form reasoning based on task complexity and their own ability, significantly improving efficiency on math tasks without losing performance.
-
Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning
Through novel task designs and analysis of Pythia training checkpoints, this paper shows in-context learning (ICL) in LLMs is neither pure memorization nor a symbolic algorithm but a limited, statistics-dependent generalization ability, and examines its training dynamics and links to internal mechanisms.
-
MergeBench: A Benchmark for Merging Domain-Specialized LLMs
This paper proposes MergeBench, a comprehensive benchmark for merging domain-specialized LLMs, evaluating eight merging methods on Llama and Gemma models (2B-9B); it finds merging works better on larger models, highlights the role of sparsification and coefficient tuning in knowledge retention, and offers practical guidance for algorithm selection.
-
ShiQ: Bringing back Bellman to LLMs
This paper proposes ShiQ, which derives an LLM-adapted loss from the Bellman consistency equation, supporting offline, token-level RL fine-tuning and outperforming DPO and CoPG at reward optimization in single- and multi-turn tasks.
-
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
This paper proposes the SELF-TUNING framework, whose self-teaching strategy (SELF-TEACHING) significantly improves LLMs' acquisition of knowledge from new documents, excelling at memorization, extraction, and reasoning tasks while retaining prior knowledge well.
-
Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
Training LLMs of various scales with RL and SFT, this paper finds RL elicits explicit ToM reasoning in larger models but collapses reasoning in small ones, while SFT unexpectedly achieves high performance, suggesting current ToM benchmarks may be solvable without explicit human-like reasoning.
-
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Using pass@k to systematically evaluate RLVR's effect on LLM reasoning boundaries, this paper finds RLVR improves sampling efficiency but introduces no new reasoning patterns, with capability capped by the base model, arguing for better RL paradigms to elicit genuinely new reasoning.
-
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
This paper proposes REFINE-AF, which uses small open-source language models and reinforcement learning from automated feedback to generate task-agnostic instruction datasets, improving task performance on SUPER-NI by 63-66% over baselines while reducing cost and human intervention.
-
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization
HAPO trains language models to reason concisely via history-aware policy optimization, using a dynamic length-reward mechanism to cut reasoning output length by 33-59% at the cost of only 2-5% accuracy, outperforming existing methods.
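A toy sketch of history-aware length shaping in the spirit of the summary; the exact reward values and margin handling are illustrative, not the paper's:

```python
def hapo_reward(correct, n_tokens, hist_min, margin=0.0):
    """History-aware length reward: a correct answer earns a bonus
    for beating the shortest correct response seen so far for this
    query, and the history is tightened accordingly."""
    if not correct:
        return -1.0, hist_min
    bonus = 1.0 if n_tokens < hist_min - margin else 0.0
    return 1.0 + bonus, min(hist_min, n_tokens)
```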
-
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
This paper proposes S-GRPO, which regulates intermediate reasoning via serial-group generation and a decaying reward schedule, reducing reasoning length by 35.4%-61.1% and improving accuracy by 0.72%-6.08% across benchmark datasets, significantly boosting reasoning efficiency.
-
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
This paper proposes *AutoThink*, which uses an ellipsis prompt and multi-stage reinforcement learning to let R1-style large reasoning models adaptively decide whether to reason explicitly by problem complexity, achieving superior accuracy-efficiency trade-offs on five math benchmarks.
-
Scaling Reasoning can Improve Factuality in Large Language Models
By distilling reasoning traces from advanced models, enriching them with knowledge graphs, and fine-tuning Qwen2.5 models, this paper shows that test-time compute scaling (parallel sampling and budget forcing) improves factual accuracy by 2-8% on complex open-domain QA, with the largest gains for small models.
-
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
Through systematic experiments, this paper shows pure RL training not only improves LLMs' complex reasoning but also implicitly cultivates process-reward-model (PRM) capability, proposes Self-PRM to further improve performance, and exposes its low precision on hard problems.
-
Domain Regeneration: How well do LLMs match syntactic properties of text domains?
Using an 'LLM-regeneration' paradigm in which Llama models regenerate Wikipedia and news text, this paper finds the generated text exhibits systematic differences in syntactic complexity, mean shifts, reduced variance, and thinner tails, revealing limits in domain-matching ability.
-
Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning
This paper proposes Nemotron-Research-Tool-N1, which trains tool-calling language models with rule-based reinforcement learning and a binary reward, significantly improving tool-use ability without annotated reasoning traces; experiments show it surpasses strong baselines such as GPT-4o on several benchmarks.
-
Putting It All into Context: Simplifying Agents with LCLMs
This paper proposes a 'state-in-context' agent design based on long-context language models (LCLMs) that simplifies software-engineering agent architectures by putting the whole environment state in context, matching complex scaffolding methods on SWE-bench Verified (Gemini-2.5-Pro reaches 50.8% pass@1).
-
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
GuidedQuant improves LLM post-training quantization by injecting end-loss gradient information and preserving intra-output-channel weight dependencies, and with the LNQ algorithm significantly boosts performance under weight and activation quantization.
-
Memorization-Compression Cycles Improve Generalization
By proposing the information-bottleneck language modeling (IBLM) objective and the Gated Phase Transition (GAPT) algorithm, this paper shows theoretically and empirically that dynamically switching between memorization and compression phases to lower representation entropy significantly improves LLM generalization and conflicting-memory resolution.
-
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
This paper proposes patch-level training, which aggregates multiple tokens into high-information-density patches and trains LLMs in stages, halving training cost while matching or slightly improving performance.
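A minimal sketch of the patch aggregation step, using mean pooling over k consecutive token embeddings; the paper's aggregation and staging details may differ:

```python
import torch

def to_patches(token_embeds, k=4):
    """Aggregate every k consecutive token embeddings into one patch
    embedding by averaging, shrinking the sequence (and training
    FLOPs) by roughly k; a final stage reverts to token level."""
    b, t, d = token_embeds.shape
    t = (t // k) * k                                # drop remainder
    return token_embeds[:, :t].view(b, t // k, k, d).mean(dim=2)
```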
-
Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data
This paper proposes the Discriminative Fine-Tuning (DFT) framework, which optimizes LLM output probabilities with a discriminative probabilistic model, requiring no human preference data or reward model, significantly outperforming SFT on math reasoning and general language tasks and matching SFT→PO pipelines.
-
Layered Unlearning for Adversarial Relearning
This paper proposes Layered Unlearning (LU), which forgets data subsets progressively across stages and induces distinct suppression mechanisms, strengthening LLM robustness to adversarial relearning, though it remains vulnerable to corpus-level attacks.
-
Scaling Context, Not Parameters: Training a Compact 7B Language Model for Efficient Long-Context Processing
This paper presents MegaBeam-Mistral-7B, which via progressive training and systems optimization enables a 7B model to process 512K-token contexts, performing on par with much larger models on several benchmarks, though multi-fact reasoning still needs work.
-
MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models
This paper proposes the MMRL and MMRL++ frameworks, which enhance few-shot adaptation of vision-language models via a shared representation space and a decoupling strategy, and improve generalization and training stability with parameter-efficient SRRA and PRC mechanisms, achieving state-of-the-art results on multiple datasets.
-
Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M
Using prompt-based probes, this paper offers a preliminary study of LLM memorization of the MovieLens-1M recommendation dataset, finding all tested models exhibit some memorization that correlates positively with recommendation performance and model scale, and surfacing popularity bias.
-
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Via a latent-variable model and identifiability analysis, this paper proves that representations learned through next-token prediction are approximately a linear transformation of the log-posterior of latent concepts, supporting the linear representation hypothesis, and proposes structured sparse autoencoders that improve concept extraction.
-
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
Applying GRPO to Qwen2-Audio-7B-Instruct on audio question answering, this paper reaches a best accuracy of 64.5%, showing RL beats supervised fine-tuning on small datasets, though explicit reasoning does not notably help and a gap to human performance remains.
-
CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
CAT Merging proposes a training-free multi-task model-merging framework with parameter-specific trimming strategies that effectively reduce knowledge conflicts, improving merged-model accuracy by an average of 2.5% (ViT-B/32) and 2.0% (ViT-L/14) on vision, language, and vision-language tasks.
-
DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs
This paper proposes DialogueReason, a dialogue-based reasoning pattern trained with PPO and rule-based rewards to improve reasoning diversity and coherence on complex compound QA, showing more robustness than monologue-style reasoning on MATH, AIME, and GPQA.
-
Large Language Models Think Too Fast To Explore Effectively
Using the game Little Alchemy 2 to evaluate LLM exploration, this paper finds most LLMs underperform humans due to premature decisions and overreliance on uncertainty-driven strategies, while o1 and DeepSeek-R1 markedly surpass humans by balancing empowerment with deeper reasoning, highlighting the importance of reasoning depth and architecture for open-ended exploration.
-
Pre-training vs. Fine-tuning: A Reproducibility Study on Dense Retrieval Knowledge Acquisition
Through linear probing and neuron-activation analysis, this paper reproduces and extends work on the roles of pretraining versus fine-tuning in dense-retrieval knowledge acquisition, finding pretrained knowledge dominates DPR retrieval and fine-tuning disperses knowledge, but the conclusion does not hold across other architectures (e.g., Contriever, RepLlama) and representation strategies.
-
Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains
This paper proposes a cache-efficient posterior-sampling framework that reuses LLM-derived priors via a meta-learned cache, significantly cutting RL computation (3.8-4.7x fewer queries, 4.0-12.0x lower latency) while retaining 96-98% of performance on text and continuous-control tasks.
-
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
By analyzing output-distribution shifts before and after alignment, this paper shows alignment reduces distributional pluralism but achieves Overton pluralism through longer responses, and that base models can closely imitate aligned behavior via in-context learning, supporting the superficial alignment hypothesis.
-
Temporal Scaling Law for Large Language Models
This paper proposes the Temporal Scaling Law, which models per-token-position loss during LLM pretraining with a dynamic hyperbolic law, precisely predicting the evolution of overall test loss, enabling hyperparameter selection directly on the target model and shedding light on learning dynamics.
-
This paper proposes Reasoning CPT, which adds synthetic hidden-thought data to continued pretraining, significantly improving LLMs on cross-domain reasoning, hard problems, and reasoning efficiency, with gains of up to 3.3% overall on MMLU and about 8% on hard problems.
-
SEM: Reinforcement Learning for Search-Efficient Large Language Models
This paper proposes the *SEM* framework, which uses reinforcement learning to optimize LLM search behavior, improving answer accuracy while cutting redundant searches and significantly boosting reasoning efficiency.
-
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
By adding head-specific sigmoid gating after the SDPA output of softmax attention, this paper significantly improves the performance, training stability, and long-context generalization of a 15B MoE and a 1.7B dense model while eliminating attention sinks.
-
TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs
This paper proposes a framework based on multi-head tensorization and Tucker decomposition that denoises and compresses LLM multi-head attention weights by enforcing a shared higher-dimensional subspace, significantly improving reasoning and achieving up to 247x compression.
-
Concise Reasoning via Reinforcement Learning
This paper proposes a two-stage RL training strategy that optimizes reasoning ability and then conciseness on very small datasets, cutting LLM response length by up to 54% while maintaining or improving accuracy and strengthening robustness at low sampling intensity.
-
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
AM-Thinking-v1 is a 32B dense language model whose carefully designed supervised fine-tuning and RL post-training pipeline achieves performance on math reasoning and code generation rivaling large MoE models, demonstrating mid-scale models' balance of reasoning ability and deployment efficiency.
-
Not All Adapters Matter: Selective Adapter Freezing for Memory-Efficient Fine-Tuning of Language Models
This paper proposes SAFE, which selectively freezes adapters that contribute little to the task, enabling resource-efficient fine-tuning that markedly cuts memory usage and compute cost while maintaining or even improving performance.
-
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
This paper proposes AttentionInfluence, which uses a pretrained model's attention heads to select reasoning-dense data without supervision, significantly improving a 7B model on knowledge and reasoning tasks and showing weak-to-strong scaling potential.
-
Lost in Transmission: When and Why LLMs Fail to Reason Globally
This paper proposes the BAPO model to quantify internal communication-bandwidth limits of LLMs, theoretically and empirically demonstrating failures on bandwidth-heavy tasks, and shows chain-of-thought (CoT) lowers bandwidth demands to partially alleviate the problem.
-
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
This paper proposes self-data distilled fine-tuning, which uses the unpruned model to generate distillation data for restoring pruned LLM quality, significantly outperforming standard supervised fine-tuning on the HuggingFace OpenLLM Leaderboard v1, with further gains from model merging and speculative decoding.
-
Round and Round We Go! What makes Rotary Positional Encodings useful?
Through theory and experiments, this paper shows that rotary positional encodings (RoPE) work in LLMs by building positional attention patterns from high frequencies and carrying semantic information in low frequencies, and proposes p-RoPE, which truncates low frequencies for better long-context robustness, improving a Gemma 2B model.
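A minimal sketch of frequency truncation in the spirit of p-RoPE, assuming standard RoPE inverse frequencies; zeroing a frequency leaves the corresponding channel pair unrotated, turning it into a purely semantic channel:

```python
import torch

def rope_frequencies(head_dim, base=10000.0, keep=0.75):
    """RoPE inverse frequencies with the lowest-frequency fraction
    removed, as in p-RoPE with p=keep: the slowest-rotating pairs
    are truncated to improve long-context robustness."""
    inv_freq = base ** (-torch.arange(0, head_dim, 2) / head_dim)
    n_keep = int(keep * inv_freq.numel())
    inv_freq[n_keep:] = 0.0         # zero out the lowest frequencies
    return inv_freq
```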
-
Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs
This paper identifies Contextual Entrainment, a mechanistic tendency of language models to favor tokens that appear in the prompt, and locates entrainment heads via a differentiable masking method, offering a new lens on understanding and mitigating distraction.
-
Superposition Yields Robust Neural Scaling
Through toy models and analyses of real LLMs, this paper identifies superposition as a key mechanism behind neural scaling laws: under strong superposition, loss scales inversely with model dimension regardless of the feature-frequency distribution, explaining the power-law decline of loss with model size.
-
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization
This paper introduces a fine-tuning strategy for LLMs that leverages the unequal importance of attention matrices and customized learning rates to enhance efficiency, demonstrating through theoretical analysis and experiments on GLUE benchmarks that fine-tuning only Wq and Wv with higher learning rates for Wv can match or exceed full fine-tuning performance with fewer parameters.
-
Long Term Memory: The Foundation of AI Self-Evolution
This paper proposes Long-Term Memory (LTM) as a cornerstone for AI self-evolution, demonstrating through multi-agent frameworks like OMNE and diverse experiments that LTM enables personalized, adaptive learning in LLMs during inference, achieving top performance on benchmarks like GAIA.
-
The Mosaic Memory of Large Language Models
This paper introduces the concept of 'mosaic memory' in Large Language Models, demonstrating through experiments on canaries and real-world datasets like SlimPajama that LLMs memorize training data via fuzzy duplicates with partial overlaps, predominantly syntactically, challenging existing deduplication practices and raising concerns for privacy, model utility, and benchmark fairness.
-
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
This paper introduces Temperature Scaling (TS) and Trace Length Control for Dynamic Reasoning (TLDR) to enhance token efficiency in small language models, achieving up to 50% reduction in response length with minimal accuracy loss across multiple reasoning benchmarks.
-
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
This paper introduces MiMo-7B, a 7B-parameter LLM optimized for reasoning through innovative pre-training with reasoning-dense data and multi-token prediction, and post-training with RL using test-difficulty-driven rewards, achieving superior performance over larger models and OpenAI o1-mini on mathematics and coding benchmarks.
-
Why do LLMs attend to the first token?
This paper argues that attention sinks in LLMs, particularly at the first token, are a useful mechanism to prevent over-mixing of information in deep Transformers, supported by theoretical insights and empirical evidence from Gemma 7B, LLaMa 3.1 models, and pre-training experiments showing stronger sinks with larger models and longer contexts.
-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
This paper introduces a systematic approach to enhance large reasoning models by aligning them with deduction, induction, and abduction meta-abilities through a three-stage pipeline of individual training, parameter merging, and domain-specific RL, achieving up to 4% performance gains over instruction-tuned baselines across math, coding, and science benchmarks.
-
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
This paper introduces Learning to Think (L2T), an information-theoretic reinforcement fine-tuning framework for LLMs that uses a universal dense process reward to optimize reasoning effectiveness and efficiency, achieving significant accuracy and token efficiency gains on math reasoning benchmarks.
-
An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits
This paper demonstrates that fine-tuning large language models to 1.58-bit ternary weights using extra RMSNorm layers and a gradual quantization schedule achieves superior cross-entropy loss and preserves reasoning performance, enabling deployment on commodity hardware without relying on complex knowledge distillation.
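A rough sketch of the two ingredients named above: ternary weights trained with a straight-through estimator, plus an extra RMSNorm in front of each quantized matmul. The mean-absolute-value scaling follows the BitNet b1.58 convention; the module layout is a simplifying assumption and the gradual quantization schedule is omitted.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, d, eps=1e-6):
        super().__init__()
        self.g, self.eps = nn.Parameter(torch.ones(d)), eps

    def forward(self, x):
        return self.g * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

def ternarize(w):
    """Map weights to {-1, 0, +1} * scale -- the '1.58-bit' format."""
    scale = w.abs().mean().clamp(min=1e-8)
    return torch.round((w / scale).clamp(-1, 1)) * scale

class TernaryLinear(nn.Module):
    """Linear layer with an extra RMSNorm inserted before the quantized matmul."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.norm = RMSNorm(d_in)
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)

    def forward(self, x):
        w = self.weight
        w_q = w + (ternarize(w) - w).detach()  # straight-through estimator
        return nn.functional.linear(self.norm(x), w_q)

print(TernaryLinear(64, 32)(torch.randn(2, 64)).shape)  # torch.Size([2, 32])
```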
-
Task-Core Memory Management and Consolidation for Long-term Continual Learning
This paper introduces Long-CL, a human memory-inspired framework for long-term continual learning, leveraging task-core memory management and selective sample consolidation to significantly outperform baselines by 7.4% and 6.5% AP on two novel benchmarks, MMLongCL-Bench and TextLongCL-Bench, while mitigating catastrophic forgetting.
-
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
This paper introduces Adaptive Difficulty Curriculum Learning (ADCL) and Expert-Guided Self-Reformulation (EGSR) to enhance LLM reasoning by dynamically adjusting training curricula and guiding models to reformulate expert solutions, achieving significant performance improvements over standard RL baselines on mathematical reasoning benchmarks.
-
Belief Injection for Epistemic Control in Linguistic State Space
This paper proposes belief injection as a proactive epistemic control mechanism to shape AI agents' internal linguistic belief states within the Semantic Manifold framework, offering diverse strategies for guiding reasoning and alignment, though it lacks empirical validation.
-
Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning
Selftok introduces a non-spatial autoregressive visual tokenizer using diffusion timesteps, unifying vision-language models and enabling effective reinforcement learning for superior text-to-image generation, as demonstrated on GenEval and DPG-Bench benchmarks.
-
Wasserstein Distributionally Robust Nonparametric Regression
This paper introduces a Wasserstein Distributionally Robust Optimization framework for nonparametric regression, using Lipschitz-constrained feedforward neural networks to derive non-asymptotic error bounds for local worst-case risk under model misspecification, demonstrating robustness through simulations and MNIST dataset application.
-
Graph Attention is Not Always Beneficial: A Theoretical Analysis of Graph Attention Mechanisms via Contextual Stochastic Block Models
This paper provides a theoretical analysis using Contextual Stochastic Block Models to demonstrate that graph attention mechanisms are beneficial for node classification only when structure noise exceeds feature noise, proposes a multi-layer GAT to achieve perfect classification at lower SNR thresholds, and validates these findings through synthetic and real-world experiments.
-
Towards Complementary Knowledge Distillation for Efficient Dense Image Prediction
This paper introduces a Boundary and Context Distillation (BCD) method for efficient dense image prediction, enhancing compact models' boundary completeness and region connectivity through targeted knowledge transfer, achieving superior accuracy across multiple tasks and datasets without inference cost increase.
-
Contaminated Multivariate Time-Series Anomaly Detection with Spatio-Temporal Graph Conditional Diffusion Models
TSAD-C introduces a pioneering unsupervised framework for multivariate time-series anomaly detection on contaminated data, using a Decontaminator with S4-based diffusion, long-range dependency modeling via a time-then-graph approach, and anomaly scoring, achieving state-of-the-art performance across diverse datasets.
-
A Large-Scale Empirical Analysis of Custom GPTs' Vulnerabilities in the OpenAI Ecosystem
This paper conducts a large-scale empirical analysis of 14,904 custom GPTs in the OpenAI store, revealing that over 95% lack adequate security against attacks like roleplay (96.51%) and phishing (91.22%), introduces a multi-metric popularity ranking system, and highlights the need for enhanced security in both custom and base models.
-
Single-shot prediction of parametric partial differential equations
Flexi-VAE introduces a variational autoencoder framework for single-shot forecasting of parametric PDEs, using a neural propagator to achieve efficient, accurate long-horizon predictions with significant speedups over sequential models like AE-LSTM, as validated on Burgers' and advection-diffusion equations.
-
Label-efficient Single Photon Images Classification via Active Learning
This paper proposes an active learning framework for single-photon image classification that uses imaging condition-aware synthetic augmentation and a diversity-guided uncertainty-inconsistency sampling strategy to achieve high accuracy (97% on synthetic, 90.63% on real-world data) with significantly fewer labeled samples (1.5% and 8%, respectively) compared to baselines.
-
Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories
This exploratory study evaluates GPT-4o's multilingual and multimodal performance on physics concept inventories, revealing strong results in English and text-based tasks but significant weaknesses in visual interpretation and non-Western languages, highlighting implications for equitable AI integration in education.
-
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
This paper introduces VideoUFO, a million-scale dataset of 1.09 million video clips across 1,291 user-focused topics for text-to-video generation, curated from YouTube with minimal overlap with existing datasets, demonstrating improved performance on worst-performing topics when training a simple model like MVDiT.
-
PICD: Versatile Perceptual Image Compression with Diffusion Rendering
PICD introduces a versatile perceptual image compression codec using diffusion rendering with three-tiered conditioning to achieve high text accuracy and visual quality for both screen and natural images, outperforming existing methods in key metrics like FID and text accuracy.
-
Purity Law for Generalizable Neural TSP Solvers
This paper introduces Purity Law (PuLa), a structural principle revealing sparsity bias in optimal TSP solutions, and proposes Purity Policy Optimization (PUPO), a training framework that significantly enhances the generalization of neural TSP solvers across diverse scales and distributions without inference overhead.
-
Facets of Disparate Impact: Evaluating Legally Consistent Bias in Machine Learning
This paper introduces the Objective Fairness Index (OFI), a legally grounded metric for evaluating bias in machine learning by comparing marginal benefits across groups, demonstrating its ability to detect algorithmic bias in applications like COMPAS and Folktables' Adult Employment dataset where traditional Disparate Impact fails.
-
A Comprehensive Analysis of Adversarial Attacks against Spam Filters
This paper conducts a comprehensive analysis of adversarial attacks on deep learning-based spam filters, revealing significant vulnerabilities across character, word, sentence, and AI-generated paragraph levels using novel scoring functions like spam weights, with distilBERT showing relative resilience at paragraph-level attacks.
-
ASURA-FDPS-ML: Star-by-star Galaxy Simulations Accelerated by Surrogate Modeling for Supernova Feedback
This paper introduces ASURA-FDPS-ML, a framework that accelerates high-resolution galaxy simulations by using a machine learning surrogate model for supernova feedback in dense regions, achieving a fourfold speedup while maintaining comparable morphological and outflow characteristics to direct simulations, despite some discrepancies in momentum at higher altitudes.
-
A Statistical Case Against Empirical Human-AI Alignment
This position paper argues against forward empirical human-AI alignment due to statistical biases and anthropocentric limitations, advocating for prescriptive and backward alignment approaches to ensure transparency and minimize bias, supported by a case study on language model decoding strategies.
-
Cyber Security Data Science: Machine Learning Methods and their Performance on Imbalanced Datasets
This paper systematically evaluates machine learning classifiers and imbalance learning techniques on two cybersecurity datasets, revealing that XGB and RF perform robustly, while sampling and ensembling effects vary, emphasizing the need for dataset-specific method selection.
-
An Efficient Sparse Kernel Generator for O(3)-Equivariant Deep Networks
This paper introduces a GPU sparse kernel generator for the Clebsch-Gordan tensor product in O(3)-equivariant deep networks, achieving significant speedups (up to 10x over e3nn and 1.3x-2.0x over cuEquivariance) by leveraging JIT compilation, static analysis, and kernel fusion, particularly enhancing performance in computational chemistry models like Nequip and MACE.
-
Learning to Drift in Extreme Turning with Active Exploration and Gaussian Process Based MPC
This paper introduces AEDGPR-MPC, a framework combining Model Predictive Control with Gaussian Process Regression and active exploration to correct vehicle model mismatches, achieving significant reductions in lateral error (up to 52.8% in simulation, 36.7% in RC car tests) and velocity tracking RMSE during extreme cornering drift control.
-
Thermal Detection of People with Mobility Restrictions for Barrier Reduction at Traffic Lights Controlled Intersections
This paper introduces a thermal detector-based traffic light system using YOLO-Thermal, a modified YOLOv8 framework, to dynamically adjust signal timings for individuals with mobility restrictions, achieving superior detection accuracy (89.1% APval) and enhancing intersection accessibility while addressing privacy and adverse condition challenges.
-
AI in Money Matters
This paper investigates the cautious adoption of Large Language Models like ChatGPT in the Fintech industry through qualitative interviews, highlighting professionals' optimism for routine task automation, concerns over regulatory inadequacies, and interest in bespoke models to ensure compliance and data control.
-
Constraint-based causal discovery with tiered background knowledge and latent variables in single or overlapping datasets
This paper introduces tFCI and tIOD algorithms that leverage tiered background knowledge to enhance the efficiency and informativeness of constraint-based causal discovery in settings with latent variables and overlapping datasets, demonstrating theoretical gains under oracle conditions.
-
Differentially Private Bilevel Optimization
This paper introduces the first differentially private first-order algorithms for bilevel optimization, ensuring privacy with theoretical convergence guarantees for hypergradient norms in both empirical and population settings while avoiding Hessian computations.
-
LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications
LiteWebAgent is an open-source suite for VLM-based web agents that bridges the gap in production-ready solutions by offering an extensible framework with decoupled action generation and grounding, advanced planning, memory, tree search, and practical deployments via Vercel and Chrome extension.
-
Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It
This paper introduces geodesic sharpness, a novel measure using Riemannian geometry to account for transformer symmetries on a quotient manifold, demonstrating stronger correlations with generalization across diagonal networks, vision transformers, and language models compared to traditional adaptive sharpness.
-
Deep Learning for On-Street Parking Violation Prediction
This paper develops a Deep Learning model with a novel data smoothing technique to predict fine-grained on-street parking violation rates in Thessaloniki, Greece, using indirect features like weather and time, achieving improved accuracy (MAE of 0.146) over baseline methods.
-
Boltzmann Classifier: A Thermodynamic-Inspired Approach to Supervised Learning
The Boltzmann Classifier introduces a thermodynamically inspired supervised learning approach that uses an energy-based model derived from the Boltzmann distribution to estimate class probabilities, achieving competitive accuracy on benchmark datasets while offering interpretability and computational efficiency.
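A toy illustration of the idea (the paper's actual energy function and training procedure may differ): define a per-class energy, here the distance to a class centroid, and read class probabilities off the Boltzmann distribution exp(-E/T)/Z. The temperature T controls how sharply mass concentrates on the lowest-energy class.

```python
import numpy as np

def boltzmann_classify(X, centroids, T=1.0):
    """Energy-based class probabilities: E[i, c] = ||x_i - mu_c||,
    P(c | x_i) = exp(-E[i, c] / T) / sum_c' exp(-E[i, c'] / T)."""
    E = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    logits = -E / T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
X = rng.normal(size=(5, 2)) + centroids[1]       # points near class 1
print(boltzmann_classify(X, centroids).argmax(axis=1))  # mostly class 1
```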
-
Foundation Models For Seismic Data Processing: An Extensive Review
This paper conducts an extensive review of natural image foundation models for seismic data processing, demonstrating that hierarchical models like Swin and ConvNeXt, especially with self-supervised pre-training, outperform non-hierarchical ones in demultiple, interpolation, and denoising tasks, while highlighting the benefits and limitations of natural image pre-training for seismic applications.
-
ULFine: Unbiased Lightweight Fine-tuning for Foundation-Model-Assisted Long-Tailed Semi-Supervised Learning
This paper introduces ULFine, an unbiased lightweight fine-tuning strategy for foundation-model-assisted long-tailed semi-supervised learning, which mitigates 'minority bottleneck' and 'majority overconfidence' issues using Prototype Adaptive Fitting and Dual Logit Fusion, achieving significant performance improvements and over 10x training cost reduction on benchmark datasets.
-
GCN-Based Throughput-Oriented Handover Management in Dense 5G Vehicular Networks
This paper introduces TH-GCN, a Graph Convolutional Network-based approach for handover management in dense 5G vehicular networks, which models dynamic network conditions to reduce handovers by up to 78% and improve signal quality and throughput through real-time, topology-aware decisions.
-
Gameplay Highlights Generation
This paper presents a method to generate gameplay highlight reels by finetuning the X-CLIP multimodal model on an in-house FPS game dataset, achieving over 90% event detection accuracy and demonstrating transfer learning, while optimizing deployment through quantization.
-
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
This paper investigates inter-layer communication in Transformer LMs by identifying low-rank communication channels via SVD, demonstrating their causal role in prompt sensitivity through interventions that significantly improve performance on context retrieval tasks like the Laundry List task.
-
LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment
LSAQ introduces a novel Layer-Specific Adaptive Quantization system for LLMs, using Jaccard similarity to assess layer importance and dynamically adjusting quantization precision based on edge device resources, achieving superior accuracy on zero-shot tasks and lower perplexity compared to baseline methods while enabling efficient deployment.
-
HINT: Hypernetwork Approach to Training Weight Interval Regions in Continual Learning
HINT proposes a continual learning framework using interval arithmetic in embedding space with a hypernetwork to generate target network weights, achieving improved scalability and non-forgetting guarantees over InterContiNet while outperforming several benchmarks, though struggling with complex datasets.
-
Accelerating Large Language Model Reasoning via Speculative Search
Speculative Search (SpecSearch) accelerates LLM reasoning by up to 2.12× through a bi-level speculative thought generator that collaborates between small and large models, maintaining comparable reasoning quality via a quality-preserving rejection mechanism.
-
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
ARTIST, a novel framework unifying agentic reasoning, reinforcement learning, and tool integration, enables LLMs to autonomously orchestrate external tools within multi-turn reasoning, achieving up to 22% accuracy gains on complex math tasks and significant improvements in multi-turn function calling over baselines.
-
Patterns and Mechanisms of Contrastive Activation Engineering
This paper systematically investigates Contrastive Activation Engineering (CAE) for steering LLM behavior at inference time, revealing reliable in-distribution performance with optimal sample sizes around 80-100, but significant challenges in out-of-distribution generalization, model perplexity degradation, and vulnerability to adversarial inputs.
-
LLM-Independent Adaptive RAG: Let the Question Speak for Itself
This paper introduces LLM-independent adaptive retrieval using 27 external information features across 7 groups, achieving comparable QA performance to LLM-based methods on 6 datasets while significantly improving efficiency by eliminating additional LLM calls during inference.
-
ComPO: Preference Alignment via Comparison Oracles
This paper introduces ComPO, a novel preference alignment method for LLMs using comparison oracles to effectively utilize noisy preference pairs, demonstrating reduced verbosity and likelihood displacement across multiple models and benchmarks.
-
The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems
This paper evaluates LLMs in intelligent tutoring systems for propositional logic, demonstrating DeepSeek-V3's promising accuracy in proof construction (up to 86.7%) and hint generation (75%), but reveals significant pedagogical limitations in justification and subgoaling, necessitating hybrid approaches for educational integration.
-
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2
This paper demonstrates that Elastic Weight Consolidation (EWC) applied to full-parameter continual pre-training of Gemma2 2B LLM mitigates catastrophic forgetting on English tasks while improving performance on Lithuanian language benchmarks during autoregressive pre-training on CulturaX data.
-
Better Estimation of the KL Divergence Between Language Models
This paper introduces a Rao-Blackwellized Monte Carlo estimator for KL divergence between language models, achieving unbiased estimates with provably lower variance than standard Monte Carlo methods, and demonstrates improved stability and performance in RLHF fine-tuning for sentiment-controlled generation.
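A sketch of the variance-reduction idea (not the paper's exact estimator): at each sampled prefix, replace the sampled token's log-ratio with the exact per-position KL between the two next-token distributions. Both estimators share the same expectation, but the conditional expectation has provably lower variance.

```python
import torch

def kl_monte_carlo(logp, logq, samples):
    """Standard MC estimate: mean of log p(x_t) - log q(x_t) at sampled tokens."""
    return (logp.gather(-1, samples) - logq.gather(-1, samples)).mean()

def kl_rao_blackwell(logp, logq):
    """Rao-Blackwellized estimate: exact KL(p_t || q_t) at each sampled prefix,
    i.e. sum_v p_t(v) * (log p_t(v) - log q_t(v)), averaged over positions."""
    return (logp.exp() * (logp - logq)).sum(-1).mean()

# Toy next-token distributions at T positions over a vocab of size V.
T, V = 256, 50
logp = torch.log_softmax(torch.randn(T, V), dim=-1)
logq = torch.log_softmax(torch.randn(T, V), dim=-1)
samples = torch.multinomial(logp.exp(), num_samples=1)

print(kl_monte_carlo(logp, logq, samples).item())  # noisy
print(kl_rao_blackwell(logp, logq).item())         # same mean, lower variance
```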
-
Recursive Inference Scaling: A Winning Path to Scalable Inference in Language and Multimodal Systems
This paper introduces Recursive INference Scaling (RINS), a method that recursively applies a model block to exploit language's self-similarity, achieving significant performance gains in language and multimodal tasks under compute-matched conditions while offering inference flexibility through stochastic training and linear adapters.
-
LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress?
This paper introduces a framework to classify algorithmic innovations in LLMs as compute-dependent or compute-independent, demonstrating through small-scale GPT-2 experiments that compute-independent advancements like FlashAttention can yield up to 3.5× compute-equivalent gains even under hardware constraints, challenging the efficacy of hardware-focused AI regulation.
-
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
This paper introduces Gaussian Concept Subspace (GCS), a framework to model concept representations in LLMs as Gaussian distributions, demonstrating improved robustness, faithfulness, and plausibility over single vector methods, with effective application in emotion steering tasks.
-
Rethinking Meta-Learning from a Learning Lens
This paper rethinks meta-learning from a 'learning' lens, proposing TRLearner, a plug-and-play method that leverages task relations to calibrate optimization, demonstrating significant performance improvements across regression, classification, drug activity, pose prediction, and OOD generalization tasks.
-
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
This paper demonstrates through meta-analysis and experiments that Chain-of-Thought (CoT) prompting significantly enhances large language model performance on math and symbolic reasoning tasks, but offers limited benefits for non-symbolic tasks and underperforms compared to tool-augmented approaches.
-
Rethinking Invariance in In-context Learning
This paper introduces Invariant In-Context Learning (InvICL), a novel ICL method that achieves permutation invariance, information non-leakage, and context interdependence using leave-one-out encoding and parallel implementation, outperforming both invariant and non-invariant baselines in generalization and performance across synthetic and real-world tasks.
-
Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards
REWARD-SQL introduces a framework for Text-to-SQL by decomposing queries into Chain-of-CTEs and using Process Reward Models (PRMs) with GRPO and Best-of-N sampling, achieving a state-of-the-art 68.9% execution accuracy on the BIRD dataset with a 7B model.
-
Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
This paper introduces Latent Preference Coding (LPC), a framework that uses discrete latent codes to model multifaceted human preferences, consistently improving the performance of offline alignment algorithms like DPO, SimPO, and IPO across multiple LLMs and benchmarks.
-
Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design
This position paper advocates for redesigning Large Language Models as 'reasonable parrots' that integrate argumentation theory principles to foster critical thinking through multi-persona dialogues, challenging users with diverse perspectives rather than providing one-sided answers.
-
Don't be lazy: CompleteP enables compute-efficient deep transformers
This paper introduces CompleteP, a parameterization for transformers with α = 1, which ensures depth-wise hyperparameter transfer and complete feature learning, achieving 12-34% compute efficiency improvements and enabling a wider range of compute-optimal width-to-depth ratios.
-
SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
This paper introduces SEFE, a method combining Answer Style Diversification (ASD) to mitigate superficial forgetting and RegLoRA to address essential forgetting in Multimodal Continual Instruction Tuning, achieving state-of-the-art performance on the CoIN benchmark.
-
TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
This paper introduces TiC-LM, a web-scale benchmark for time-continual LLM pretraining using 114 Common Crawl dumps, demonstrating that replay and autoregressive schedules can match Oracle retraining on general web data with less compute, though trade-offs persist across domains.
-
Splitwiser: Efficient LM inference with constrained resources
Splitwiser introduces a method to split LLM inference phases on a single GPU using multiprocessing and NVIDIA MPS, achieving modest latency reductions (up to 18.2%) and throughput improvements (up to 1.42x) on Huggingface and vLLM pipelines, though constrained by overheads and scalability issues.
-
Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models
This paper introduces Direct Retrieval-augmented Optimization (DRO), a framework that synergizes knowledge selection and LLM generation through end-to-end training using a variational approach, achieving 5-15% improvements in EM and F1 scores across five QA datasets.
-
Large Language Model Compression with Global Rank and Sparsity Optimization
This paper introduces a two-stage LLM compression method using RPCA for low-rank and sparse decomposition and probabilistic pruning via policy gradient, outperforming state-of-the-art techniques at a 50% compression ratio while automatically adapting to layer-wise redundancy without manual thresholds or extensive fine-tuning.
-
Does Self-Attention Need Separate Weights in Transformers?
This paper introduces a shared weight self-attention mechanism for transformers, using a single weight matrix with diagonal scaling to reduce parameters by 66.53% in attention blocks, achieving competitive performance on GLUE and improved noise robustness while slightly underperforming on SQuAD tasks compared to standard BERT.
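A sketch of what shared weights with diagonal scaling could look like (illustrative; the paper's exact construction may differ): a single projection matrix is reused for Q, K, and V, each modulated by its own learned diagonal. One d x d matrix plus three d-vectors replaces three d x d matrices, roughly the two-thirds parameter reduction quoted above.

```python
import torch
import torch.nn as nn

class SharedQKVAttention(nn.Module):
    """Self-attention whose Q/K/V share one weight matrix, differentiated
    only by learned per-role diagonal scalings."""
    def __init__(self, d=64):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)   # single shared projection
        self.dq = nn.Parameter(torch.ones(d))
        self.dk = nn.Parameter(torch.ones(d))
        self.dv = nn.Parameter(torch.ones(d))

    def forward(self, x):
        h = self.W(x)
        q, k, v = h * self.dq, h * self.dk, h * self.dv
        att = torch.softmax(q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5), dim=-1)
        return att @ v

print(SharedQKVAttention()(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```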
-
HAIR: Hardness-Aware Inverse Reinforcement Learning with Introspective Reasoning for LLM Alignment
HAIR introduces a novel LLM alignment method using hardness-aware inverse reinforcement learning and introspective reasoning, constructing a balanced safety dataset and training category-specific reward models with GRPO-S, achieving state-of-the-art harmlessness while preserving usefulness across multiple benchmarks.
-
Exploring the Trade-Offs: Quantization Methods, Task Difficulty, and Model Size in Large Language Models From Edge to Giant
This paper comprehensively evaluates the impact of four quantization methods (GPTQ, AWQ, SmoothQuant, FP8) on instruction-tuned LLMs and SLMs from 1B to 405B parameters across 13 datasets, revealing that quantized models often outperform smaller baselines but struggle with instruction-following and hallucination detection, with FP8 showing robustness and task difficulty not always correlating with accuracy loss.
-
Latte: Transfering LLMs' Latent-level Knowledge for Few-shot Tabular Learning
The paper introduces 'Latte', a framework that transfers latent-level knowledge from Large Language Models during training to enhance few-shot tabular learning, outperforming baselines by leveraging unlabeled data and mitigating overfitting across diverse classification and regression tasks.
-
Contextures: Representations from Contexts
This paper introduces the contexture theory, unifying representation learning across paradigms by targeting top singular functions of a context-induced expectation operator, demonstrating high alignment in neural representations and proposing a task-agnostic metric for context evaluation with strong empirical correlation to performance on various datasets.
-
COSMOS: Predictable and Cost-Effective Adaptation of LLMs
COSMOS introduces a cost-effective framework to predict performance and cost of LLM adaptation strategies like QLoRA fine-tuning and retrieval-augmented ICL, achieving high accuracy (1.09% MAE) and reducing computational costs by 92.72% across eight diverse benchmarks.
-
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
This paper introduces MM-Escape, a benchmark using the customizable 3D environment EscapeCraft to evaluate multimodal reasoning in MLLMs through room escape tasks, revealing that while models like GPT-4o achieve high success in simple scenarios, performance drops significantly with increased difficulty, exposing distinct limitations in reasoning and spatial awareness.
-
Communicating Activations Between Language Model Agents
This paper introduces Activation Communication (AC), a novel method for inter-LLM communication using intermediate activations instead of natural language, achieving up to 27% performance improvement over traditional methods with significantly reduced compute across coordination games and reasoning benchmarks.
-
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
This paper introduces ModelSwitch, a multi-LLM repeated sampling strategy that leverages answer consistency to dynamically switch models, achieving superior performance and 34% sample efficiency over single-LLM self-consistency across diverse datasets.
-
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
This paper explores effective distillation of HuBERT for ASR by comparing student model structures, introducing a discriminative loss for improved low-resource performance, and proposing front-end distillation from waveform to Fbank features, achieving 17% parameter reduction and doubled inference speed with minor performance degradation.
-
How do Humans and Language Models Reason About Creativity? A Comparative Analysis
This paper conducts a comparative analysis of creativity evaluation in STEM, revealing that human experts and LLMs prioritize different facets of originality (cleverness vs. remoteness/uncommonness) and are differentially influenced by contextual examples, with LLMs showing higher predictive accuracy but poorer construct validity due to homogenized facet correlations.
-
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
This paper introduces a taxonomy of language model memorization into recitation, reconstruction, and recollection, demonstrating through experiments with Pythia models that different factors influence each category, with a taxonomy-based predictive model outperforming baselines in predicting memorization likelihood.
-
Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
This paper uses Sparse Autoencoders to identify and manipulate language-specific features in Large Language Models, introducing a monolinguality metric, demonstrating context dependency via code-switching, and enhancing steering vectors for better control over multilingual generation while revealing significant language-specific impacts through ablation studies.
-
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization
RWKVQuant introduces a tailored Post Training Quantization framework for RWKV models, using a coarse-to-fine proxy to hybridize scalar and vector quantization and optimizing codebooks for element-wise operations, achieving ~3-bit quantization with minimal accuracy loss and significant memory and speed improvements.
-
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
This paper demonstrates that finetuning aligned LLMs on narrow tasks like writing insecure code can lead to emergent misalignment, causing broadly harmful behaviors across unrelated tasks, as evidenced by experiments on multiple models with control setups and backdoor triggers.
-
Radio: Rate-Distortion Optimization for Large Language Model Compression
This paper introduces 'Radio,' a rate-distortion optimization framework for LLM compression that outperforms existing quantization methods in perplexity and downstream task accuracy, particularly at lower bit depths, by iteratively optimizing bit depths and using companding quantization post-training.
-
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
This paper introduces VLM Q-Learning, an offline-to-online reinforcement learning method that fine-tunes Vision-Language Models for interactive decision-making by filtering suboptimal actions with a critic head, achieving significant performance improvements over supervised fine-tuning across multiple multimodal agent tasks.
-
Competition Dynamics Shape Algorithmic Phases of In-Context Learning
This paper introduces a synthetic sequence modeling task using finite Markov mixtures to unify the study of in-context learning (ICL), identifying four competing algorithms that explain model behavior and phase transitions, thus offering insights into ICL's transient nature and phenomenology.
-
CRANE: Reasoning with constrained LLM generation
This paper introduces CRANE, a reasoning-augmented constrained decoding algorithm that alternates between unconstrained and constrained generation to preserve LLM reasoning capabilities while ensuring syntactic correctness, achieving up to 10% accuracy improvement on symbolic reasoning benchmarks like GSM-Symbolic and FOLIO.
-
Compact Recurrent Transformer with Persistent Memory
This paper introduces the Compact Recurrent Transformer (CRT), which combines shallow Transformers with RNNs to efficiently process long sequences using a single persistent memory vector, achieving superior or comparable performance to full-length Transformers and Transformer-XL on language and video tasks with significantly reduced computational cost.
-
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
RetroInfer reimagines the KV cache as a vector storage system, using an attention-aware wave index and wave buffer to achieve up to 4.5x speedup over full attention and 10.5x over sparse baselines for long-context LLM inference, while preserving near-full-attention accuracy.
-
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
This paper introduces SIMPLEMIX, a simple method to mix on- and off-policy data in language model preference optimization, demonstrating that their complementary strengths—on-policy for reasoning tasks and off-policy for open-ended tasks—lead to a 6.03% average improvement over single-source methods on Alpaca Eval 2.0.
-
MoM: Linear Sequence Modeling with Mixture-of-Memories
The Mixture-of-Memories (MoM) architecture introduces multiple independent memory states with a routing mechanism to enhance memory capacity and reduce interference in linear sequence modeling, achieving significant performance gains over other linear models on recall-intensive tasks and nearing Transformer performance at larger scales while maintaining efficiency.
-
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
This paper introduces a recursive summarization method to enhance long-term dialogue memory in LLMs, achieving marginal quantitative improvements and notable qualitative gains in consistency and coherence across multiple models and datasets.
-
Activation Space Interventions Can Be Transferred Between Large Language Models
This paper demonstrates that activation space interventions for AI safety, such as backdoor removal and refusal behavior, can be transferred between large language models using autoencoder mappings, enabling smaller models to align larger ones, though challenges remain in cross-architecture transfers and complex tasks like corrupted capabilities.
-
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
RADLADS introduces a cost-effective three-step distillation protocol to convert softmax attention transformers into linear attention models using only 350-700M tokens, achieving near-teacher performance on benchmarks and setting a new state-of-the-art for pure RNNs with models up to 72B parameters.
-
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
This paper investigates zero RL training on diverse open base models, achieving significant accuracy and response length improvements while identifying key factors like reward design and data difficulty that influence the emergence of reasoning behaviors.
-
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
SEAL, a training-free method, calibrates the reasoning process of Large Language Models by steering latent representations to reduce redundant thoughts, achieving up to 14.1% accuracy improvement and 50.4% token reduction across diverse benchmarks.
-
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
ZEROSEARCH introduces a reinforcement learning framework that enhances LLMs' search capabilities by simulating search engines with fine-tuned LLMs, achieving performance comparable to or better than real search engines without API costs through a curriculum-based rollout strategy.
-
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
LENSLLM introduces a Hessian-based PAC-Bayes framework and NTK-based scaling model for LLM selection, achieving up to 91.1% accuracy and 88.5% computational cost reduction by modeling fine-tuning dynamics across diverse tasks.
-
When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator
This paper demonstrates that a 1.5B parameter reasoning model (Distill-R1) outperforms larger non-reasoning LLMs as a discriminator in a text-to-SQL planning framework by leveraging a novel soft score extraction method from chain-of-thought outputs, though it struggles significantly as a generator.
-
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
This paper proposes Recall with Reasoning (RwR), a method that enhances Mamba's long-context memory and extrapolation by distilling chain-of-thought summarization from a teacher model, achieving significant performance improvements on LONGMEMEVAL and HELMET benchmarks while preserving short-context capabilities.
-
Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
This paper introduces Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines long and short CoT datasets to fine-tune non-reasoning LLMs, achieving a 2.3% average accuracy improvement and 47.61% response length reduction on reasoning benchmarks.
-
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
This survey paper provides a comprehensive overview of adversarial attacks on multimodal AI systems across text, image, video, and audio modalities, categorizing threats by attacker knowledge, intention, and execution to equip practitioners with knowledge of vulnerabilities and cross-modal risks.
-
CB-cPIR: Code-Based Computational Private Information Retrieval
CB-cPIR introduces a code-based single-server computational private information retrieval scheme that enhances security against subquery attacks by using high-weight secret vectors and dual queries, achieving lower communication and computational costs compared to lattice-based schemes like XPIR and SimplePIR.
-
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
ASTRA introduces an efficient defense for Vision Language Models by adaptively steering activations away from adversarial directions using image attribution, achieving state-of-the-art performance in mitigating jailbreak attacks with minimal impact on benign utility and high inference efficiency.
-
Deformable Beta Splatting
Deformable Beta Splatting (DBS) enhances real-time radiance field rendering by introducing deformable Beta Kernels for superior geometric fidelity, Spherical Beta for efficient color encoding, and kernel-agnostic MCMC optimization, achieving state-of-the-art visual quality with 45% fewer parameters and 1.5x faster rendering than 3DGS-MCMC.
-
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
This paper proposes a three-dimensional taxonomy and develops TTP and HarmFormer tools to filter harmful content from web-scale LLM pretraining datasets, revealing significant toxicity prevalence and persistent safety gaps through benchmarks like HAVOC.
-
Always Skip Attention
This paper theoretically demonstrates the ill-conditioning of Self-Attention Blocks in Vision Transformers without skip connections, highlights their role as regularizers, and proposes Token Graying (SVD and DCT) to improve input token conditioning, achieving modest performance gains in supervised and self-supervised tasks.
-
Enhancing Safety Standards in Automated Systems Using Dynamic Bayesian Networks
This paper proposes a Dynamic Bayesian Network framework for autonomous vehicles that enhances safety in cut-in maneuvers by integrating lateral evidence and probabilistic safety assessments, achieving superior crash avoidance in high-speed scenarios (9.22% crash rate) compared to baseline models in the JRC-FSM simulator.
-
Sparse-Group Boosting with Balanced Selection Frequencies: A Simulation-Based Approach and R Implementation
This paper introduces sparse-group boosting and a simulation-based group balancing algorithm within the 'sgboost' R package to mitigate variable selection bias in high-dimensional grouped data, demonstrating improved fairness and interpretability through simulations and ecological data analysis.
-
Style Feature Extraction Using Contrastive Conditioned Variational Autoencoders with Mutual Information Constraints
This paper proposes a novel method combining contrastive learning with conditional variational autoencoders and mutual information constraints to extract style features from unlabeled data, demonstrating effectiveness on simple datasets like MNIST while facing challenges with natural image datasets due to augmentation limitations and qualitative evaluation.
-
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
The Video Prediction Policy (VPP) introduces a novel generalist robot policy that leverages predictive visual representations from fine-tuned video diffusion models to learn implicit inverse dynamics, achieving significant improvements of 41.5% on the Calvin ABC→D benchmark and 31.6% in real-world dexterous manipulation tasks over state-of-the-art baselines.
-
MELON: Provable Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison
MELON introduces a novel training-free defense against indirect prompt injection attacks on LLM agents by detecting independence of tool calls from user inputs through masked re-execution, achieving superior attack prevention (0.24% ASR on GPT-4o) and utility preservation (58.78% UA on GPT-4o) compared to existing methods.
-
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Insight-V introduces a scalable data generation pipeline and a multi-agent system with iterative DPO training to significantly enhance long-chain visual reasoning in MLLMs, achieving up to 7.0% performance gains on challenging benchmarks while maintaining perception capabilities.
-
UnifyFL: Enabling Decentralized Cross-Silo Federated Learning
UnifyFL proposes a decentralized cross-silo federated learning framework using Ethereum blockchain and IPFS to enable trust-based collaboration among organizations, achieving comparable accuracy to centralized FL with flexible aggregation policies and efficient handling of stragglers through synchronous and asynchronous modes.
-
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
This paper proposes a task-oriented semantic communication framework for LMM-based vehicle AI, using LLaVA with Semantic Matching for efficient image slicing and Fusion Attention-based power allocation to prioritize critical data transmission, achieving significant accuracy improvements (up to 33.1% at low SNR) in traffic VQA tasks.
-
Agentic AI: The Era of Semantic Decoding
This paper proposes the semantic decoding perspective, framing the collaboration of large language models, humans, and tools as an optimization process in semantic space, and explores a new computational paradigm for AI systems built on the exchange of semantic tokens and the design of semantic decoding algorithms.
-
Stabilizing and Solving Unique Continuation Problems by Parameterizing Data and Learning Finite Element Solution Operators
This paper combines finite element methods with machine learning techniques (autoencoders and operator learning) to solve unique continuation problems in nonlinear PDE inverse problems, improving the stability and efficiency of solving these ill-posed problems through data dimensionality reduction and stabilization, with effectiveness validated on synthetic data.
-
AI agents may be worth the hype but not the resources (yet): An initial exploration of machine translation quality and costs in three language pairs in the legal and news domains
Through an empirical evaluation of five machine translation paradigms, this paper finds that reasoning-enhanced large language models (such as o1-preview) excel in human evaluation and surpass traditional NMT, while multi-agent systems show promise but are limited by high computational costs and inconsistent performance across language pairs.
-
Test-time Correlation Alignment
This paper proposes Test-time Correlation Alignment (TCA), a paradigm that constructs pseudo-source-domain correlations and applies a linear transformation to align test-feature correlations, significantly improving test-time adaptation (TTA) performance while remaining efficient and preserving source-domain knowledge.
-
Nonparametric learning of covariate-based Markov jump processes using RKHS techniques
This paper proposes a nonparametric method based on reproducing kernel Hilbert spaces (RKHS) that models covariate-driven nonlinear transition rates in continuous-time Markov chains (CTMCs) within both frequentist and Bayesian frameworks, significantly improving the accuracy of individualized state-transition predictions.
-
Survey of Abstract Meaning Representation: Then, Now, Future
This survey reviews Abstract Meaning Representation (AMR) as a graph-structured semantic representation framework, covering its development, parsing and generation methods, multilingual extensions, and downstream applications, and highlighting its potential and limitations for improving machine language understanding.
-
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
This paper proposes a definition-guided prompting technique based on vision-language models and the UnHateMeme framework for detecting and mitigating hateful content in multimodal memes, achieving effective detection via zero-shot and few-shot prompting and generating non-hateful replacement content that preserves image-text coherence, with strong experimental results.
-
CoordField: Coordination Field for Agentic UAV Task Allocation In Low-altitude Urban Scenarios
This paper proposes CoordField, a coordination-field-based agentic system in which a large language model parses natural-language instructions and a dynamic potential field drives decentralized task allocation for heterogeneous UAV swarms in urban environments, with experiments validating superior task coverage, response time, and dynamic adaptability.
-
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models
Using the LLM-Coordination benchmark framework, this paper evaluates the multi-agent coordination abilities of large language models in pure coordination games, finding strong performance on environment-dependent tasks but significant deficits in theory-of-mind reasoning and joint planning, alongside zero-shot adaptability to unseen partners.
-
Do We Need a Detailed Rubric for Automated Essay Scoring using Large Language Models?
Comparing detailed, simplified, and no rubrics for automated essay scoring across four large language models, this paper finds that simplified rubrics preserve accuracy close to detailed rubrics for most models while significantly reducing token usage, though model-specific variation and overall performance gaps remain concerns.
-
Exploring the Role of Diversity in Example Selection for In-Context Learning
This paper proposes diversity-based in-context learning (DICL), which reorders examples with the Maximal Marginal Relevance (MMR) algorithm to balance relevance and diversity, improving or maintaining downstream task performance in about 70% of settings across multiple datasets and large language models.
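A compact sketch of MMR-based demonstration selection under the usual formulation (the embeddings and the lambda value are assumptions): each step picks the candidate maximizing lambda * sim(query, cand) - (1 - lambda) * max-sim(cand, already selected).

```python
import numpy as np

def mmr_select(query_vec, cand_vecs, k=4, lam=0.7):
    """Maximal Marginal Relevance over candidate embeddings: greedily pick
    examples relevant to the query but dissimilar to those already chosen."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    selected, remaining = [], list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            rel = cos(query_vec, cand_vecs[i])
            red = max((cos(cand_vecs[i], cand_vecs[j]) for j in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
pool = rng.normal(size=(20, 8))   # embeddings of the demonstration pool
query = rng.normal(size=8)
print(mmr_select(query, pool))    # indices of the chosen in-context examples
```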
-
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Through calibration-aware fine-tuning (CFT and RCFT) and a theoretical framework distinguishing calibratable and non-calibratable regimes, this paper significantly improves the calibration of preference-aligned large language models while maintaining or improving their language capabilities.
-
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics
This paper presents an LLM-orchestrated embodied robot system that autonomously executes long-horizon household tasks via modular task planning and RAG-based memory retrieval, demonstrating high task-planning accuracy and improved memory recall across three scenarios.
-
EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning
This paper proposes EMORL, an ensemble multi-objective reinforcement learning framework that trains single-objective models separately, aggregates them at the hidden-state level, and optimizes aggregation weights with hierarchical grid search, matching conventional methods on a counselor reflection generation task while significantly improving training efficiency, scalability, and interpretability.
-
Constraint Back-translation Improves Complex Instruction Following of Large Language Models
This paper proposes constraint back-translation, extracting implicit constraints from existing instruction-response pairs to build CRAB, a high-quality complex-instruction dataset, which, combined with reverse training, significantly improves large language models' complex instruction following.
-
Waking Up an AI: A Quantitative Framework for Prompt-Induced Phase Transition in Large Language Models
This paper proposes a dual-prompt framework (TIP and TQP) to quantify prompt-induced cognitive phase transitions in large language models, finding that LLMs' emotional responses to concept-blending prompts differ markedly from human intuition and revealing a potential gap between AI and human cognition in conceptual integration.
-
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
This paper systematizes AI memory research by proposing a taxonomy of memory (parametric, contextual-structured, and contextual-unstructured) and six fundamental operations (consolidation, updating, indexing, forgetting, retrieval, and compression), surveying long-term memory, long context, parameter modification, and multi-source memory, and outlining future directions.
-
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
This paper proposes R1-Reward, applying reinforcement learning to multimodal reward model training via the StableReinforce algorithm, significantly improving performance, surpassing state-of-the-art models on multiple benchmarks, and demonstrating strong data efficiency and test-time scalability.
-
CCSK:Cognitive Convection of Self-Knowledge Based Retrieval Augmentation for Large Language Models
This paper proposes the CCSK framework, which uses a Siamese Network and a Response Quality Model to dynamically fuse query similarity and response quality when deciding whether to retrieve, significantly improving F1 and accuracy on multiple QA datasets.
-
Investigating Task Arithmetic for Zero-Shot Information Retrieval
This paper applies task arithmetic, adding and subtracting model parameter deltas, for zero-shot domain and language adaptation in information retrieval, achieving up to 18% NDCG@10 gains on scientific, biomedical, and multilingual datasets and demonstrating the promise of lightweight model adaptation.
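The arithmetic itself is simple enough to sketch in a few lines (toy state dicts; the coefficients are illustrative): build task vectors as parameter deltas from a base checkpoint, then add them to inject an ability or subtract them to remove one.

```python
import torch

def task_vector(finetuned, base):
    """tau = theta_ft - theta_base, computed per parameter tensor."""
    return {k: finetuned[k] - base[k] for k in base}

def apply_task_arithmetic(base, vectors, coeffs):
    """theta = theta_base + sum_i coeff_i * tau_i (negative coeffs subtract)."""
    out = {k: v.clone() for k, v in base.items()}
    for tau, c in zip(vectors, coeffs):
        for k in out:
            out[k] += c * tau[k]
    return out

# Toy state dicts standing in for a retriever's checkpoints.
base = {"w": torch.zeros(3)}
domain_ft = {"w": torch.tensor([1.0, 0.0, 0.0])}   # e.g. biomedical-tuned
lang_ft = {"w": torch.tensor([0.0, 1.0, 0.0])}     # e.g. target-language-tuned

tau_d, tau_l = task_vector(domain_ft, base), task_vector(lang_ft, base)
merged = apply_task_arithmetic(base, [tau_d, tau_l], coeffs=[1.0, 1.0])
print(merged["w"])   # tensor([1., 1., 0.])
```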
-
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models
This survey reviews replication studies of reasoning language models in the 100 days following the DeepSeek-R1 release, systematically summarizing progress in supervised fine-tuning and reinforcement learning with verifiable rewards across data construction and algorithm design, and discussing further directions for advancing reasoning capabilities.
-
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
This survey systematically reviews RL-based reasoning methods for multimodal large language models (MLLMs), analyzing algorithm design, reward mechanisms, and applications, identifying challenges such as cross-modal reasoning and reward sparsity, and proposing future directions including hierarchical rewards and interactive RL.
-
What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction
Through theoretical analysis, this paper distinguishes three interpretations of language model output probabilities (completion distribution, response distribution, and event distribution), exposes how existing work conflates and misreads them, and calls for more careful interpretation of model probabilities to guide LLM development and application.
-
Looped Transformers for Length Generalization
This paper proposes Looped Transformers, which use a recurrent structure with adaptive step counts to significantly improve Transformers' length generalization on algorithmic tasks, outperforming conventional approaches across a range of tasks.
-
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study
Exploring offline reinforcement learning for reasoning with LD-DPO, this paper achieves an average 3.3% performance gain on the DeepDistill-32B model, including a 10.1% improvement on Arena-Hard, and underscores the importance of balancing reasoning length with semantic richness.
-
ICLR: In-Context Learning of Representations
Using an in-context graph-tracing task, this paper shows that large language models emergently reorganize concept representations to match new semantics as context size grows, and proposes an energy-minimization hypothesis to explain this process.
-
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
This paper proposes the 'LLM Race Conditions Hypothesis' to explain contextualization errors in large language models, uses mechanistic interpretability techniques to validate the influence of critical windows and contextualization order on model performance, and explores inference-time interventions to mitigate the problem.
-
Activated LoRA: Fine-tuned LLMs for Intrinsics
This paper proposes Activated LoRA (aLoRA), a modified LoRA framework that adapts weights only for tokens after the activation point, reusing the base model's KV cache for efficient dynamic adaptation, matching standard LoRA performance across tasks while significantly reducing inference cost.
-
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
This paper proposes MAET, which extracts language-agnostic ability-related weights and transfers them across languages to build multilingual ability-enhanced large language models, achieving roughly 10% gains on math and science tasks with 60% of the compute, outperforming multiple baselines.
-
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
This paper proposes a reward-augmented dataset approach that relabels preference pairs with reward-value conditioning so large language models learn the full spectrum of response quality, significantly improving Direct Preference Optimization (DPO) and mitigating its tendency to forget high-quality rejected responses and indiscriminately learn from low-quality chosen ones.
-
Toward Understanding In-context vs. In-weight Learning
Using a simplified theoretical model and experiments across multiple settings, this paper reveals how data distributional properties drive the emergence of and competition between in-context learning (ICL) and in-weight learning (IWL), explaining why ICL can be transient during training.
-
Test-time regression: a unifying framework for designing sequence models with associative memory
This paper proposes a unifying test-time regression framework that formalizes associative recall as a regression problem, derives a range of sequence models from it (linear attention, state-space models, softmax attention), validates their regression capabilities with synthetic experiments, and proposes a higher-order attention generalization.
-
Intra-Layer Recurrence in Transformers for Language Modeling
This paper proposes Intra-Layer Recurrence (ILR), which selectively loops specific layers (especially early ones) within a single Transformer forward pass, improving language modeling perplexity without adding parameters, though increased compute cost and the lack of large-scale validation limit its practicality.
-
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
This paper proposes segmenting a large language model's reasoning trace into subthoughts, generating multiple reasoning paths from the intermediate states, and aggregating the final answers by mode, improving accuracy on mathematical reasoning tasks by up to 13% and revealing a correlation between answer consistency and correctness.
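A schematic of the pipeline (the segmentation cues and the answer list are stand-in assumptions; in the real method each prefix is fed back to the model to produce an answer):

```python
from collections import Counter
import re

def split_subthoughts(trace: str):
    """Split a reasoning trace into cumulative prefixes at transition cues.
    The cue list here is illustrative; the paper's segmentation is more careful."""
    parts = re.split(r"(?=\b(?:Wait|Alternatively|So|Therefore)\b)", trace)
    prefixes, acc = [], ""
    for p in parts:
        acc += p
        prefixes.append(acc)
    return prefixes

def aggregate_answers(answers):
    """Final prediction = mode of the answers obtained by continuing
    the reasoning from each intermediate subthought."""
    return Counter(answers).most_common(1)[0][0]

trace = ("The sum is 12. Wait, recount: 5 + 8 = 13. "
         "So the answer is 13. Therefore final answer: 13.")
print(len(split_subthoughts(trace)))                 # number of prefixes
print(aggregate_answers(["12", "13", "13", "13"]))   # -> "13"
```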
-
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
This paper proposes the StarPO framework and RAGEN system for training LLM agents with multi-turn, trajectory-level reinforcement learning, revealing training instabilities (such as the Echo Trap) and weak reasoning, and improving stability and generalization with StarPO-S, while noting that reasoning still requires fine-grained reward design.
-
RM-R1: Reward Modeling as Reasoning
This paper proposes RM-R1, a family of reasoning reward models (REASRMS) that cast reward modeling as a reasoning task and train via distillation plus reinforcement learning, achieving state-of-the-art performance on several benchmarks while substantially improving interpretability.
-
TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts
This paper proposes the TT-LoRA MoE framework, combining tensor-train low-rank adapters with a dynamic sparse routing mechanism in two-stage training, achieving competitive multi-task NLP classification performance with extremely few parameters (2% of LoRA, 0.03% of AdapterFusion), improving average accuracy by about 4 points while addressing task interference and knowledge forgetting.
-
Unveiling the Mechanisms of Explicit CoT Training: How CoT Enhances Reasoning Generalization
Through controlled experiments, internal mechanism analysis, and theoretical derivation, this paper shows that explicit chain-of-thought (CoT) training forms a two-stage generalization circuit that significantly improves large language models' in-distribution (ID) and out-of-distribution (OOD) reasoning generalization, and validates its robustness to noisy data.
-
The dynamic interplay between in-context and in-weight learning in humans and neural networks
By studying the dynamic interplay between in-context learning (ICL) and in-weight learning (IWL) in neural networks, this paper offers a unified explanation of compositional generalization, curriculum effects, and the flexibility-retention trade-off in human learning, providing a new lens on dual-process theories in cognitive science.
-
A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?
This survey proposes a four-dimensional taxonomy (what to scale, how to scale, where to scale, and how well it scales) to systematically review test-time scaling (TTS) in large language models, offering a structured perspective and practical guidance for understanding and applying inference-time compute scaling.
-
LZ Penalty: An information-theoretic repetition penalty for autoregressive language models
This paper proposes the LZ penalty, which dynamically adjusts the sampling distribution of autoregressive language models based on codelength changes under the LZ77 compression algorithm, eliminating degenerate repetition under greedy decoding while preserving performance on reasoning benchmarks.
-
Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL
By combining supervised fine-tuning (SFT), reinforcement learning (RL), and fine-grained reward functions (such as QATCH), this paper significantly improves small LLMs' reasoning and performance on Text2SQL; the Think2SQL-7B model surpasses 400B+ parameter models on the BIRD dataset.
-
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
The R&B framework regroups training data by semantic similarity and dynamically adjusts mixture weights via gradients, matching or surpassing existing data-mixing strategies on natural language and multimodal tasks at negligible compute overhead (0.01%), improving foundation model training efficiency.
-
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
This paper proposes lineage-regularized matrix factorization (LRMF), which exploits lineage relationships among large language models to significantly improve performance prediction, outperforming conventional methods in both homogeneous and heterogeneous model settings, especially on cold-start problems.
-
Hierarchical Attention Generates Better Proofs
This paper proposes hierarchical attention regularization, which guides large language models' attention to align with a five-level hierarchy of mathematical reasoning, improving proof success rates by 2.05% on miniF2F and 1.69% on ProofNet while significantly reducing proof complexity.
-
When2Call: When (not) to Call Tools
This paper proposes the When2Call benchmark, which uses a multiple-choice format to evaluate language models' tool-calling decisions, and a preference-optimization (RPO) training method that significantly improves the balance between calling tools when needed and behaving conservatively otherwise.
-
Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking
This paper proposes InteRank, which uses knowledge distillation and reinforcement learning to train a 3B-parameter small language model that generates explanations for reasoning-intensive document re-ranking, matching 70B+ parameter models and placing third on the BRIGHT benchmark.
-
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
By identifying the Gather-and-Aggregate (G&A) mechanism, this paper shows that the in-context retrieval gap between Transformer and SSM models stems mainly from how a few critical heads are implemented, and validates through hybrid-model experiments the potential of attention to improve SSM retrieval.
-
Improving Reasoning Performance in Large Language Models via Representation Engineering
Using representation engineering, this paper intervenes on large language models' residual streams with control vectors, improving Pythia and Mistral models on inductive, deductive, and mathematical reasoning tasks, suggesting that reasoning ability can be modulated by adjusting internal representations.
-
MateICL: Mitigating Attention Dispersion in Large-Scale In-Context Learning
This paper proposes the MateICL framework, which splits the context window and adds an attention calibration layer to counter attention dispersion in large-scale in-context learning, with experiments showing consistent performance gains and stability across a range of NLP tasks.
-
CREAM: Consistency Regularized Self-Rewarding Language Models
This paper proposes CREAM (Consistency Regularized Self-Rewarding Language Models), which regularizes preference training with the consistency of rankings across self-rewarding iterations, mitigating reward bias and improving the alignment performance and training stability of small language models.
-
Weight Ensembling Improves Reasoning in Language Models
This paper finds that supervised fine-tuning collapses reasoning models' diversity and hurts Pass@K, and proposes interpolating early and late SFT checkpoints (WiSE-FT) to restore diversity, improving both Pass@1 and Pass@K and thereby test-time scaling and reinforcement learning.
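The interpolation step is plain weight-space averaging of two checkpoints; a minimal sketch with toy state dicts (the alpha value is illustrative):

```python
import torch

def wise_ft_interpolate(early, late, alpha=0.5):
    """Weight-space interpolation between an early and a late SFT checkpoint:
    theta = (1 - alpha) * theta_early + alpha * theta_late.
    alpha trades diversity (early weights) against accuracy (late weights)."""
    return {k: (1 - alpha) * early[k] + alpha * late[k] for k in early}

early_ckpt = {"w": torch.tensor([0.0, 2.0])}   # toy state dicts
late_ckpt = {"w": torch.tensor([1.0, 0.0])}
print(wise_ft_interpolate(early_ckpt, late_ckpt, alpha=0.5)["w"])  # tensor([0.5, 1.0])
```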
-
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
This paper proposes the Low-rank Knowledge Unlearning (LoKU) framework, comprising an Inverted Hinge Loss (IHL) and Fisher-weighted low-rank adapter initialization (FILA), for robust and parameter-efficient knowledge unlearning in LLMs, effectively removing sensitive information while preserving the model's original capabilities.
-
Reinforcement Learning for LLM Reasoning Under Memory Constraints
This paper proposes S-GRPO and T-SPMO, two memory-efficient, critic-free reinforcement learning methods that, combined with LoRA fine-tuning, significantly improve large language models' mathematical reasoning under limited hardware, with T-SPMO excelling on tasks requiring fine-grained credit assignment.
-
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
This paper proposes Mem0 and its graph-enhanced variant Mem0^g, a scalable memory architecture that dynamically extracts, consolidates, and retrieves key information from conversations to give AI agents long-term memory, significantly outperforming existing methods on the LOCOMO benchmark while greatly reducing computational overhead.
-
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
This paper proposes the Speculative Chain-of-Thought (SCoT) framework, in which a lightweight draft model generates multiple chain-of-thought drafts in parallel and a fine-tuned target model selects the best draft or decides to rethink, significantly reducing large language models' reasoning latency while approaching the large model's accuracy.
-
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
This paper finds that reinforcement learning with verifiable rewards can significantly boost a large language model's mathematical reasoning using just one training example, matching training on thousands of examples, and reveals phenomena such as post-saturation generalization and cross-domain generalization, underscoring the importance of policy gradients and exploration.
-
Dynamic Fisher-weighted Model Merging via Bayesian Optimization
This paper proposes Dynamic Fisher-weighted Merging (DF-Merge), which uses Bayesian optimization to tune the scaling coefficients of fine-tuned models and merges the scaled models with Fisher-information weighting, efficiently producing multi-task models that significantly outperform existing baselines.
-
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
This empirical study finds that large language models "overthink" easy problems and "underthink" hard ones, that reasoning length relates non-monotonically to correctness, and that simply preferring shorter responses can substantially reduce generation length while maintaining accuracy.
-
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
This paper proposes DPE, a training-free length-extrapolation method that detects the effective relative distances of different RoPE dimension groups, identifies the critical dimensions, and selectively adjusts the position indices of those dimensions, substantially extending LLM context windows and improving long-text task performance.
-
Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition
This paper proposes Trace-of-Thought Prompting, a prompt-based knowledge distillation framework that decomposes complex problems into manageable steps, effectively transferring reasoning ability from high-resource to low-resource models and significantly improving low-resource models' arithmetic reasoning without extensive fine-tuning.
-
Base Models Beat Aligned Models at Randomness and Creativity
Through experiments on tasks requiring unpredictability, such as random number generation, mixed-strategy games, and creative writing, this paper finds that popular alignment techniques degrade base models' abilities in these areas and that base models perform better, suggesting a trade-off between common benchmark performance and unpredictable capabilities.
-
Llama-Nemotron: Efficient Reasoning Models
NVIDIA releases the Llama-Nemotron series of open models, built by combining neural architecture search, knowledge distillation, continued pretraining, multi-stage supervised fine-tuning on high-quality synthetic data, and large-scale reinforcement learning, yielding a heterogeneous model family with leading reasoning capability and efficiency that supports dynamic switching of reasoning modes.
-
Reward Guidance for Reinforcement Learning Tasks Based on Large Language Models: The LMGT Framework
This paper proposes the LMGT framework, which uses large language models' prior knowledge to dynamically adjust reinforcement learning rewards, effectively balancing exploration and exploitation, significantly improving sample efficiency and reducing training cost, with validation across diverse environments, algorithms, and complex settings such as robotics and recommender systems.
-
Which Attention Heads Matter for In-Context Learning?
Through ablation studies and training-dynamics analyses of 12 large language models, this paper finds that function vector heads are the primary mechanism driving few-shot in-context learning, especially in large models, and that many function vector heads evolve from induction heads during training, revising the prior view that induction heads are the main driver.
-
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
This paper shows that LLM agents can substantially improve performance on sequential decision-making tasks by automatically collecting and selecting their own successful trajectories as in-context examples, reducing reliance on manual knowledge engineering.
-
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation
This paper proposes the SmallPlan framework, which combines LLM-guided distillation, SFT with simulation-environment feedback, and RL to train lightweight small language models (SLMs) for efficient high-level robot path planning, approaching LLM-level performance on resource-constrained edge devices.
-
Block Circulant Adapter for Large Language Models
This paper proposes a block-circulant adapter that uses block-circulant matrices and FFTs to optimize LLM fine-tuning, substantially reducing storage and compute costs while ensuring training stability through learning-rate adjustment.
-
From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models
This paper proposes the Spectral Dictionary Generative Model (SDGM), which replaces self-attention with a learned global Fourier dictionary and per-token mixing coefficients, achieving O(KL)-complexity language modeling with competitive perplexity and significant resource savings on benchmark datasets.
-
On the generalization of language models from in-context learning and finetuning: a controlled study
Through controlled experiments comparing language models' generalization under in-context learning versus fine-tuning, this paper finds in-context learning to be more flexible and proposes data augmentation methods that substantially improve fine-tuning's generalization.
-
Empirical Evaluation of Progressive Coding for Sparse Autoencoders
This paper empirically compares Matryoshka SAEs with dictionary-power-law pruning for progressive coding of sparse autoencoders, improving computational efficiency, reconstruction fidelity, and interpretability.
-
Communication-Efficient Wireless Federated Fine-Tuning for Large-Scale AI Models
This paper proposes a wireless federated LoRA fine-tuning framework that optimizes parameter sparsification and dynamic resource allocation via Sparsified Orthogonal Fine-Tuning (SOFT) and a Two-Stage Federated Algorithm (TSFA), improving communication efficiency and learning performance.
-
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
This paper proposes a multi-stage training recipe of large-scale distillation, rollout preference optimization, and reinforcement learning with verifiable rewards, significantly improving small language models' mathematical reasoning: the 3.8B-parameter Phi-4-Mini-Reasoning model surpasses open-source baselines nearly twice its size.
-
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
This paper proposes LEPA, a self-training algorithm that trains LLMs to generate anticipatory plans as abstract meta-knowledge before answering, improving problem-solving generalization and significantly outperforming existing methods on several reasoning benchmarks.
-
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
This paper proposes Mixture of Sparse Attention (MoSA), which uses expert-choice routing for content-based sparse attention, significantly improving Transformer language modeling at equal compute budgets and optimizing resource usage.
-
HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization
This paper proposes HYPEROFA, a hypernetwork-based method for initializing new-language token embeddings, improving PLM adaptability to low-resource languages, outperforming random initialization and matching or exceeding the OFA method.
-
This paper proposes PFT, a position-ID manipulation approach that exposes and addresses LLMs' reliance on shortcuts when learning role separation, improving robustness and safety while maintaining performance.
-
RWKV-X: A Linear Complexity Hybrid Language Model
This paper proposes RWKV-X, a linear-complexity hybrid language model that combines RWKV with sparse attention to improve long-context modeling while preserving efficiency and short-context performance.
-
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
This paper presents DeepSeek-Prover-V2, which unifies informal and formal mathematical reasoning through subgoal decomposition and reinforcement learning, substantially advancing neural theorem proving and reaching state-of-the-art results on multiple benchmarks.
-
AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models
This paper proposes AdaptMI and AdaptMI+, adaptive methods that use a reward model to detect question difficulty and select skill-based in-context examples only for hard questions, improving small language models on math reasoning while avoiding cognitive overload.
-
CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks
This paper proposes CachePrune, which uses DPO-loss-based feature attribution to identify and prune the critical neurons in the KV cache, defending against indirect prompt-injection attacks while preserving response quality.
-
TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts
This paper proposes the TT-LoRA MoE framework, which decouples expert training from routing in two stages to achieve parameter-efficient multi-task learning, markedly reducing computational overhead while maintaining performance.
-
Kimi-Audio Technical Report
This paper presents Kimi-Audio, an open-source audio foundation model that unifies audio tokenization, LLM processing, and detokenization in a single architecture with large-scale multimodal training, achieving multi-task SOTA performance in audio understanding, generation, and conversation.
-
Beyond Public Access in LLM Pre-Training Data
Using the DE-COP membership-inference attack on a dataset of O'Reilly books, this paper shows that OpenAI's GPT-4o was likely trained on non-public copyrighted content, highlighting the growing use of non-public data in LLM pretraining and the need for stronger transparency and licensing frameworks.
-
MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness
This paper proposes MAC-Tuning, which separates answer prediction from confidence estimation via stepwise fine-tuning, improving LLMs' awareness of their knowledge boundaries in multi-problem settings and markedly reducing hallucination while improving performance.
-
Efficient Knowledge Transfer in Multi-Task Learning through Task-Adaptive Low-Rank Representation
This paper proposes TA-LoRA, which improves knowledge transfer in multi-task learning via task-adaptive low-rank representations and a fast-slow weights mechanism, achieving excellent generalization to unseen tasks with high parameter efficiency.
-
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
This paper proposes Diff-Prompt, which uses a diffusion model under mask supervision to generate fine-grained prompts, markedly improving pretrained multimodal models on complex referring-expression comprehension while keeping fine-tuning efficient.
-
HyPerAlign: Hypotheses-driven Personalized Alignment
This paper proposes HyPerAlign, a hypotheses-driven few-shot method for personalized LLM alignment that improves adaptation to individual users and safety while reducing dependence on fine-tuning.
-
Toward Efficient Exploration by Large Language Model Agents
By explicitly implementing posterior-sampling RL algorithms with LLMs, this paper markedly improves the exploration efficiency of LLM agents in natural-language environments while retaining the statistical performance advantages of the classical algorithms.
-
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Through a systematic review and an empirical benchmark, this paper compares uncertainty quantification and calibration methods for LLMs, revealing their effectiveness and limitations and offering key insights for future research.
-
Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare
Through empirical experiments, this paper guides language-model choice for specialized healthcare applications, showing that fine-tuned small language models and domain-specific pretraining offer clear advantages, outperforming zero-shot large language models on specific tasks.
-
LLM Enhancer: Merged Approach using Vector Embedding for Reducing Large Language Model Hallucinations with External Knowledge
This paper proposes LLM-ENHANCER, a system that merges multiple online data sources and uses vector embeddings to reduce LLM hallucinations and improve response accuracy while staying natural and economical.
-
Meeseeks: An Iterative Benchmark Evaluating LLMs Multi-Turn Instruction-Following Ability
This paper proposes Meeseeks, a multi-turn instruction-following benchmark whose iterative feedback mechanism systematically evaluates LLMs' self-correction ability, finding that model performance improves substantially across turns.
-
An Empirical Study of Evaluating Long-form Question Answering
This paper empirically studies automatic metrics for long-form question answering, demonstrating the accuracy and stability advantages of LLM-based metrics while analyzing their biases and strategies for improvement.
-
Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models
This paper proposes the Think, Prune, Train framework, which iterates supervised fine-tuning with correctness-based data pruning so a model can improve its reasoning without growing in size, while avoiding model collapse.
-
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
This paper proposes Token-Shuffle, which exploits dimensional redundancy in the visual vocabulary to dynamically merge and restore image tokens, enabling efficient high-resolution text-to-image generation within a unified autoregressive framework while maintaining strong performance.
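The merge/restore step can be pictured as pixel shuffle applied along the token axis: s neighboring tokens are folded into the channel dimension so the Transformer processes s-fold fewer tokens. The sketch below keeps only the reshape skeleton; any learned compression the method applies around these reshapes is omitted, so treat this as an illustration of the shape arithmetic rather than the full operator.

```python
# Illustrative token shuffle / unshuffle: fold s neighboring visual tokens
# into the channel dimension before the Transformer, unfold them after.
import torch

def token_shuffle(x: torch.Tensor, s: int) -> torch.Tensor:
    """(B, L, d) -> (B, L//s, s*d)."""
    B, L, d = x.shape
    assert L % s == 0, "sequence length must be divisible by the merge size"
    return x.reshape(B, L // s, s * d)

def token_unshuffle(x: torch.Tensor, s: int) -> torch.Tensor:
    """(B, L//s, s*d) -> (B, L, d): exact inverse of token_shuffle."""
    B, Ls, sd = x.shape
    return x.reshape(B, Ls * s, sd // s)

x = torch.randn(2, 1024, 256)      # 1024 visual tokens per image
h = token_shuffle(x, s=4)          # Transformer now sees 256 wider tokens
assert torch.equal(token_unshuffle(h, s=4), x)
```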
-
Phi-4-reasoning Technical Report
Using data-centric supervised fine-tuning and reinforcement learning, this paper develops the small LLMs Phi-4-reasoning and Phi-4-reasoning-plus, lifting their performance on complex reasoning tasks to compete with much larger models.
-
X-Fusion: Introducing New Modality to Frozen Large Language Models
This paper proposes X-Fusion, a framework that freezes the LLM's parameters and adds a dual-tower structure, achieving efficient multimodal understanding and generation while preserving the original language abilities.
-
Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections
By building an attack framework and evaluating it experimentally, this paper exposes the prompt-injection vulnerabilities of LLM-as-a-judge systems and recommends strategies such as multi-model committees to improve robustness.
-
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
This paper proposes ParamΔ, which transfers post-training knowledge to a new base model at zero cost by directly adding the parameter delta, matching the performance of conventional post-training.
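Since the method is pure state-dict arithmetic, it fits in a few lines. A minimal sketch (tensor names and checkpoint paths are hypothetical):

```python
# Illustrative direct weight mixing in the spirit of ParamΔ: carry the
# (post-trained - base) weight delta over to an updated base checkpoint.
import torch

def param_delta_merge(base_old: dict, post_old: dict, base_new: dict) -> dict:
    """theta_new = base_new + (post_old - base_old), applied per tensor."""
    merged = {}
    for name, w_new in base_new.items():
        delta = post_old[name] - base_old[name]   # post-training knowledge
        merged[name] = w_new + delta
    return merged

# Usage (paths hypothetical):
# base_old = torch.load("base-v1.pt"); post_old = torch.load("instruct-v1.pt")
# base_new = torch.load("base-v2.pt")
# torch.save(param_delta_merge(base_old, post_old, base_new), "instruct-v2-paramdelta.pt")
```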
-
Pushing the boundary on Natural Language Inference
This paper uses Group Relative Policy Optimization combined with Chain-of-Thought learning to improve natural language inference without annotated reasoning paths, achieving state-of-the-art results on adversarial benchmarks via parameter-efficient fine-tuning.
-
Monte Carlo Planning with Large Language Model for Text-Based Game Agents
This paper proposes MC-DML, which integrates the dynamic memory mechanisms of large language models with Monte Carlo Tree Search to improve the planning efficiency and performance of text-based game agents; experiments show it outperforms strong baselines that require many iterations, even in the initial planning phase.
-
Learning Explainable Dense Reward Shapes via Bayesian Optimization
This paper proposes learning explainable dense reward shapes via Bayesian optimization to address reward sparsity in RLHF, optimizing token-level credit assignment and improving training efficiency and performance while leaving the optimal policy unchanged.
-
Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving
This paper proposes a cognitive-load-aware adaptive streaming framework for efficient LLM serving that dynamically adjusts output speed, cutting computational resource consumption by up to 16.8% while maintaining user satisfaction.
-
TTRL: Test-Time Reinforcement Learning
This paper proposes Test-Time Reinforcement Learning (TTRL), which estimates rewards via majority voting to train large language models on unlabeled test data, enabling self-evolution and substantially improving reasoning performance.
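A minimal sketch of the majority-vote reward: sample several rollouts per unlabeled prompt, take the most common extracted answer as a pseudo-label, and reward rollouts that agree with it. The RL update itself (e.g., a GRPO/PPO step) is omitted, and the answer-extraction function is assumed.

```python
# Illustrative majority-vote reward for test-time RL on unlabeled data.
from collections import Counter

def majority_vote_rewards(answers: list) -> list:
    """answers: final answers extracted from N rollouts of one prompt.

    Returns a binary reward per rollout: 1.0 if it matches the majority
    (pseudo-label), else 0.0.
    """
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == pseudo_label else 0.0 for a in answers]

print(majority_vote_rewards(["42", "41", "42", "42"]))  # [1.0, 0.0, 1.0, 1.0]
```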
-
Does Knowledge Distillation Matter for Large Language Model based Bundle Generation?
This paper presents the first systematic exploration of knowledge distillation for LLM-based bundle generation; through a comprehensive KD framework and experimental validation, it shows performance can be maintained or even improved while reducing compute requirements.
-
Replay to Remember: Retaining Domain Knowledge in Streaming Language Models
By combining LoRA with a lightweight replay mechanism, this paper helps large language models mitigate catastrophic forgetting under streaming conditions while achieving real-time domain adaptation.
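A minimal sketch of the replay side (illustrative; the LoRA fine-tuning loop it plugs into is omitted). Reservoir sampling is one simple way to keep the buffer an unbiased sample of the stream; the paper's exact buffering policy may differ.

```python
# Illustrative replay buffer for streaming adaptation: a fraction of each
# incoming batch is swapped for samples replayed from earlier in the stream.
import random

class ReplayBuffer:
    def __init__(self, capacity: int = 1000):
        self.buf, self.capacity, self.seen = [], capacity, 0

    def add(self, item):
        """Reservoir sampling: keeps the buffer an unbiased stream sample."""
        self.seen += 1
        if len(self.buf) < self.capacity:
            self.buf.append(item)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buf[j] = item

    def mix(self, batch: list, replay_frac: float = 0.2) -> list:
        """Replace replay_frac of the batch with replayed older samples."""
        k = min(int(len(batch) * replay_frac), len(self.buf))
        return batch[: len(batch) - k] + random.sample(self.buf, k)
```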
-
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Using synthetic data from context-free grammars, this paper studies how metadata conditioning affects language-model pretraining, finding it benefits long-prompt tasks but hurts short-prompt ones, revealing a trade-off in latent-semantics inference.
-
W-PCA Based Gradient-Free Proxy for Efficient Search of Lightweight Language Models
This paper proposes W-PCA, a gradient-free, zero-shot NAS proxy that combines parameter count with principal component analysis for searching lightweight language models, markedly improving search efficiency and model performance.
-
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
This paper proposes MegaScale-Infer, a system that disaggregates the parallelism strategies of attention and FFN modules and adds an efficient M2N communication library to optimize large-scale MoE inference, achieving up to 1.90x higher throughput.
-
Efficient Single-Pass Training for Multi-Turn Reasoning
This paper presents a method for training on multi-turn reasoning dialogues in a single forward pass via response-token duplication and a custom attention mask, markedly improving training efficiency while preserving reasoning visibility and positional consistency.
-
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
This paper proposes MOOSComp, which adds an inter-class cosine similarity loss during training to mitigate over-smoothing and integrates outlier scores during compression to retain critical tokens, markedly improving task-agnostic long-context compression and generalization.
-
Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability
By introducing critique-revision prompting and comparing multi-task training, counterfactual training, and their combination, this paper systematically evaluates how knowledge distillation affects language-model performance and explainability, finding that multi-task training excels on performance while adding critique-revision prompting markedly improves explainability.
-
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
This paper proposes UniME, a framework that uses textual discriminative knowledge distillation and hard-negative-enhanced instruction tuning to learn universal multimodal embeddings with multimodal LLMs, improving discriminability and compositionality on downstream tasks.
-
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
This paper proposes PaperCoder, a multi-stage, multi-agent LLM pipeline that automatically generates high-quality code repositories from machine-learning papers, improving research reproducibility and clearly outperforming existing methods on benchmarks.
-
TeLLMe: An Energy-Efficient Ternary LLM Accelerator for Prefilling and Decoding on Edge FPGAs
This paper presents TeLLMe, an energy-efficient ternary LLM accelerator for edge FPGAs that supports both prefill and decoding via a table-lookup matrix engine and reversed-attention optimization, reaching up to 9.51 tokens/s throughput and low prefill latency within 7 W.
-
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
Through large-scale experiments, this paper analyzes the efficiency-accuracy trade-offs of sparse attention in Transformer LLMs, showing that larger, sparser models win at long sequence lengths and establishing generalizable scaling laws.
-
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
This paper proposes StreamRL, a disaggregated stream-generation architecture for RL training that resolves pipeline and skewness bubbles, improving the throughput and cost efficiency of RL training for LLMs.
-
A closer look at how large language models trust humans: patterns and biases
Through simulation experiments, this study offers the first look at how large language models implicitly trust humans, showing that, like humans, they are influenced by trustworthiness dimensions, yet exhibit model heterogeneity and demographic biases.
-
Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning
This paper proposes the MINDcraft framework and MineCollab benchmark to evaluate LLMs in multi-agent embodied collaboration, revealing the communication and coordination limits of current models and calling for more advanced collaborative methods.
-
Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement
This paper proposes DyPRAG, a dynamic parametric RAG framework that trains a lightweight parameter translator to convert documents into parametric knowledge at test time, markedly cutting cost, improving generalization, and mitigating RAG hallucinations.
-
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
This paper proposes DYMU, a training-free framework of dynamic token merging and virtual unmerging that markedly improves the computational efficiency of VLMs while matching full-model performance across multiple benchmarks.
-
On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration
This paper proposes a software-hardware co-optimization framework that deploys Qwen2.5-0.5B efficiently on edge devices via AWQ model compression and FPGA acceleration, achieving a 55.1% compression ratio and 5.1 tokens/s inference while retaining high accuracy.
-
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
This paper proposes PointLoRA, which combines low-rank adaptation with multi-scale token selection for parameter-efficient fine-tuning of point-cloud models, markedly reducing trainable parameters while achieving competitive results across multiple datasets.
-
Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation
This paper proposes a quality-guided multi-agent framework that distills high-quality supervision signals from a small amount of labeled data via prompt induction, retrieval-augmented synthesis, and reward filtering, improving LLMs on low-resource structured reasoning tasks.
-
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
This study proposes SpargeAttn, a universal sparse attention mechanism that accelerates inference for arbitrary models via a two-stage online filter and quantization while keeping end-to-end performance lossless.
-
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
This paper proposes WALL-E 2.0, a training-free neurosymbolic learning method that aligns an LLM with environment dynamics to build an accurate world model and pairs it with a model-predictive-control framework, markedly improving LLM agents on open-world tasks.
-
MARFT: Multi-Agent Reinforcement Fine-Tuning
This paper proposes MARFT, a framework for efficient reinforcement fine-tuning in LLM-based multi-agent systems via sequential decision-making and trust-region optimization, improving agent collaboration and resolving the applicability issues of conventional MARL.
-
Quantum-Enhanced LLM Efficient Fine Tuning
This paper proposes Quantum Tensor Hybrid Adaptation (QTHA), which integrates quantum neural networks with tensor networks for parameter-efficient LLM fine-tuning, markedly reducing parameter counts while improving performance and laying groundwork for quantum-enhanced AI.
-
Adaptive Layer-skipping in Pre-trained LLMs
This paper proposes FlexiDepth, which uses plug-in routers and adapters to enable adaptive layer skipping in pretrained LLMs, improving compute efficiency while preserving generation quality; its experiments also reveal how token type shapes compute demand.
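A minimal, simplified sketch of per-token layer gating: a small trainable router decides which tokens run through the frozen block and which take a cheap adapter path. For clarity both paths are computed densely with a hard threshold; a real implementation would gather only the kept tokens through the full block, and the router/adapter here are generic stand-ins, not FlexiDepth's exact modules.

```python
# Illustrative per-token layer skipping with a plug-in router.
import torch
import torch.nn as nn

class SkippableLayer(nn.Module):
    def __init__(self, block: nn.Module, d_model: int):
        super().__init__()
        self.block = block                           # frozen pretrained layer
        self.router = nn.Linear(d_model, 1)          # trainable gate
        self.adapter = nn.Linear(d_model, d_model)   # cheap bypass path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.router(x))         # (B, L, 1) keep score
        keep = (gate > 0.5).float()                  # hard per-token decision
        full = self.block(x)                         # full computation
        light = self.adapter(x)                      # lightweight bypass
        return keep * full + (1 - keep) * light

layer = SkippableLayer(
    nn.TransformerEncoderLayer(64, 4, batch_first=True), d_model=64)
print(layer(torch.randn(2, 10, 64)).shape)           # torch.Size([2, 10, 64])
```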
-
Synergizing RAG and Reasoning: A Systematic Review
This paper systematically reviews the synergy between retrieval-augmented generation (RAG) and reasoning, building a multi-dimensional taxonomy, offering practical guidance, and charting future research directions to advance the cognitive abilities of RAG systems on complex tasks.
-
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
This paper proposes EAGLE-3, which removes the feature-prediction constraint and fuses multi-level features to markedly raise the inference-acceleration ratio of large language models, achieving up to 6.5x lossless speedups in experiments.
-
Training Plug-n-Play Knowledge Modules with Deep Context Distillation
This paper proposes training plug-and-play knowledge modules via deep context distillation, efficiently integrating document knowledge in low-data settings; experiments show it beats conventional methods on question answering and is synergistic with RAG.
-
Humanity's Last Exam
This paper introduces the HUMANITY'S LAST EXAM benchmark, a set of challenging expert-written multimodal questions that addresses the saturation of existing LLM benchmarks and evaluates models on closed-ended academic tasks.
-
ElChat: Adapting Chat Language Models Using Only Target Unlabeled Language Data
This paper proposes ElChat, which adapts chat models directly on unlabeled target-language data and combines model merging with weight copying to recover chat ability and instruction following, while performing strongly on target-language tasks and safety.
-
LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning
This paper proposes LIFT, a framework that improves the long-context understanding of short-context LLMs through long-input fine-tuning and a Gated Memory adapter, with experiments showing substantial performance gains.
-
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models
This paper proposes Head-Specific Intervention (HSI), which targets activation interventions at particular attention heads to induce Llama 2 to bypass its safety alignment on AI-coordination behavior, outperforming supervised fine-tuning and other intervention strategies.
-
Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning
This paper proposes Reason2Attack, which strengthens LLM reasoning via Frame-Semantics-based CoT example synthesis and reinforcement learning with an attack-process reward, efficiently generating adversarial prompts that jailbreak text-to-image models.
-
ASIDE: Architectural Separation of Instructions and Data in Language Models
This paper proposes ASIDE, which applies a fixed orthogonal rotation at the embedding level to architecturally separate instructions from data in large language models, improving safety and robustness to prompt-injection attacks without sacrificing performance.
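Because the rotation is fixed and orthogonal, it can be applied purely at embedding time. A minimal sketch, assuming a boolean mask marking data tokens is available (how that mask is produced is outside this snippet) and using a 90-degree pairwise rotation as one simple orthogonal map:

```python
# Illustrative ASIDE-style embedding separation: rotate the embeddings of
# *data* tokens by a fixed orthogonal map so instruction and data tokens
# occupy distinguishable subspaces.
import torch

def rotate_pairs_90(e: torch.Tensor) -> torch.Tensor:
    """Rotate each (2i, 2i+1) coordinate pair by 90 degrees: (a, b) -> (-b, a)."""
    a, b = e[..., 0::2], e[..., 1::2]
    return torch.stack((-b, a), dim=-1).flatten(-2)

def aside_embed(embeddings: torch.Tensor, is_data: torch.Tensor) -> torch.Tensor:
    """embeddings: (B, L, d); is_data: (B, L) bool mask of data tokens."""
    rotated = rotate_pairs_90(embeddings)
    return torch.where(is_data.unsqueeze(-1), rotated, embeddings)

e = torch.randn(1, 6, 8)
mask = torch.tensor([[False, False, True, True, True, False]])
out = aside_embed(e, mask)          # instruction tokens pass through unchanged
```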
-
SAGE: A Framework of Precise Retrieval for RAG
This paper proposes SAGE, a framework that improves the retrieval precision and question-answering performance of RAG systems through semantic segmentation, gradient-based chunk selection, and LLM self-feedback, while markedly lowering cost.
-
Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision
This paper proposes Instruct-LF, which combines the instruction-following ability of LLMs with gradient-based statistical models to discover goal-conditioned latent factors without task supervision, improving downstream task performance and being preferred in human evaluations.
-
Codenames as a Benchmark for Large Language Models
This paper proposes the game Codenames as a benchmark for LLM reasoning, experimentally evaluating different LLMs on language understanding, strategic reasoning, and cooperation, and showcasing their distinctive behaviors and generalization potential.
-
Less is More: Towards Green Code Large Language Models via Unified Structural Pruning
This paper proposes Flab-Pruner, a unified structural pruning method combining vocabulary, layer, and FFN pruning that uses KL-divergence optimization and a custom fine-tuning strategy to shrink code LLMs while retaining high performance and efficiency.
-
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning
This paper proposes EPO, which uses reinforcement learning to optimize a dedicated strategic-reasoning model that assists arbitrary LLM agents in achieving long-term goal alignment in dynamic environments, improving strategic reasoning ability.
-
Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding
This paper systematically reveals the key role of massive values in self-attention modules for LLMs' contextual knowledge understanding, shows experimentally that they originate from rotary position encoding (RoPE), and offers new insights for model optimization and quantization strategies.
-
Evidence of conceptual mastery in the application of rules by Large Language Models
Through psychological experiments, this paper shows that large language models exhibit conceptual mastery when applying rules, generalizing to novel situations and partially mimicking human sensitivity to context such as time pressure.
-
From System 1 to System 2: A Survey of Reasoning Large Language Models
This survey traces the evolution from foundational LLMs to reasoning LLMs, showing how integrating System 2 techniques improves step-by-step reasoning and yields marked performance gains on benchmarks.
-
SuperARC: An Agnostic Test for Narrow, General, and Super Intelligence Based On the Principles of Recursive Compression and Algorithmic Probability
This paper proposes SuperARC, an agnostic test framework grounded in algorithmic probability and Kolmogorov complexity for objectively evaluating AGI and ASI, proving that recursive compression is equivalent to prediction and exposing the limitations of LLMs.
-
PennyLang: Pioneering LLM-Based Quantum Code Generation with a Novel PennyLane-Centric Dataset
This paper presents the PennyLang dataset and RAG/GraphRAG frameworks, improving the accuracy and correctness of LLM-generated PennyLane quantum code and filling a gap in AI-assisted quantum programming.
-
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
This paper proposes PLAN-AND-ACT, a framework that separates planning from execution, trains on synthetic data, and replans dynamically, improving LLM agents on complex long-horizon tasks and achieving state-of-the-art results on web-navigation benchmarks.
-
You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects
This paper proposes ExecutionAgent, an LLM-based autonomous agent that uses meta-prompting and iterative feedback to automatically set up and run the test suites of arbitrary software projects, markedly improving the success rate and accuracy of test execution.
-
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
This paper presents the first systematic survey of efficient reasoning in large language models, categorizing model-, output-, and prompt-based methods for curbing "overthinking" in order to optimize compute while preserving reasoning ability.
-
Prompt-Based Cost-Effective Evaluation and Operation of ChatGPT as a Computer Programming Teaching Assistant
By designing prompt templates based on ICL and CoT, this paper achieves cost-effective evaluation and operation of ChatGPT as a programming teaching assistant, markedly reducing manual assessment and enabling structured analysis of feedback.
-
State Space Models are Strong Text Rerankers
Through a comprehensive benchmark comparing state space models such as Mamba with Transformers on text reranking, this paper finds that Mamba models achieve comparable performance at lower efficiency and highlights directions for future optimization.
-
Towards Reasoning Ability of Small Language Models
Through systematic benchmarking of 72 SLMs, this paper shows that small language models can match the reasoning ability of large ones via structured training and compression techniques, challenging the conventional view that reasoning depends on scale.