Tag: Representation Learning
All the articles with the tag "Representation Learning".
-
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
This paper proposes Lineage-Regularized Matrix Factorization (LRMF), which significantly improves performance prediction accuracy by exploiting the lineage relationships among large language models; it outperforms conventional methods in both homogeneous and heterogeneous model settings and is especially strong on the cold-start problem.
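A minimal sketch of the core idea: factorize a (model × benchmark) score matrix while penalizing distance between the latent embeddings of lineage-related models. The `lineage` pairs, penalty weight `lam`, and latent rank below are illustrative assumptions, not the paper's exact formulation.

```python
import torch

# Toy score matrix: rows = models, cols = benchmarks; NaN = unobserved.
scores = torch.tensor([[0.71, 0.55, float('nan')],
                       [0.75, float('nan'), 0.62],
                       [float('nan'), 0.58, 0.64]])
mask = ~torch.isnan(scores)
target = torch.nan_to_num(scores)

# Hypothetical lineage edges (parent_idx, child_idx), e.g. a base model
# and its fine-tuned descendants.
lineage = [(0, 1), (0, 2)]

rank, lam = 2, 0.1                             # assumed hyperparameters
U = torch.randn(3, rank, requires_grad=True)   # model embeddings
V = torch.randn(3, rank, requires_grad=True)   # benchmark embeddings
opt = torch.optim.Adam([U, V], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    pred = U @ V.T
    fit = ((pred - target)[mask] ** 2).mean()  # reconstruct observed scores
    # Lineage regularizer: pull related models' embeddings together,
    # which is what lets predictions transfer to cold-start models.
    reg = sum(((U[p] - U[c]) ** 2).sum() for p, c in lineage)
    (fit + lam * reg).backward()
    opt.step()

print((U @ V.T).detach())  # predictions now cover the unobserved cells
```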
-
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
This paper proposes MOOSComp, which mitigates over-smoothing by adding an inter-class cosine similarity loss during training and incorporates outlier scores during compression to retain critical tokens, substantially improving the performance and generalization of task-agnostic long-context compression.
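A sketch of one plausible reading of the anti-over-smoothing loss: penalize the cosine similarity between the mean representations of the two token classes (keep vs. drop) so they stay separable. The function name and class setup are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def inter_class_cosine_loss(hidden, labels):
    """Push apart the mean representations of 'keep' and 'drop' tokens;
    over-smoothing collapses token vectors toward one another, which makes
    the token classifier's job harder."""
    keep_mean = hidden[labels == 1].mean(dim=0)
    drop_mean = hidden[labels == 0].mean(dim=0)
    return F.cosine_similarity(keep_mean, drop_mean, dim=0)

hidden = torch.randn(128, 768)        # token representations from the compressor
labels = torch.randint(0, 2, (128,))  # 1 = keep token, 0 = drop token
loss = inter_class_cosine_loss(hidden, labels)
# At compression time, the paper additionally keeps tokens whose outlier
# scores are high even if the classifier would drop them (not modeled here).
```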
-
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
RetroInfer reimagines the KV cache as a vector storage system, using an attention-aware wave index and wave buffer to achieve up to 4.5x speedup over full attention and 10.5x over sparse baselines for long-context LLM inference, while preserving near-full-attention accuracy.
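A heavily simplified sketch of the vector-storage idea: cluster the cached keys, score cluster centroids against the current query, and run exact attention only over tokens in the top-scoring clusters. The wave index and wave buffer machinery is not reproduced; names and the clustering scheme below are assumptions.

```python
import torch

def approx_attention(q, K, V, centroids, assign, top_c=4):
    """Retrieve from a clustered, vector-store-like KV cache: pick the
    clusters most relevant to the query, then attend over that subset."""
    cluster_scores = centroids @ q                   # (num_clusters,)
    picked = cluster_scores.topk(top_c).indices
    sel = torch.isin(assign, picked)                 # tokens in chosen clusters
    k, v = K[sel], V[sel]
    attn = torch.softmax(k @ q / k.shape[-1] ** 0.5, dim=0)
    return attn @ v

d, n, c = 64, 10_000, 128
K, V = torch.randn(n, d), torch.randn(n, d)
assign = torch.randint(0, c, (n,))                   # cluster id per cached token
centroids = torch.stack([K[assign == i].mean(0) for i in range(c)])
out = approx_attention(torch.randn(d), K, V, centroids, assign)
```

The speedup comes from touching only a fraction of the cache per query; accuracy stays close to full attention as long as the index routes queries to the clusters holding the high-attention keys.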
-
Patterns and Mechanisms of Contrastive Activation Engineering
This paper systematically investigates Contrastive Activation Engineering (CAE) for steering LLM behavior at inference time, revealing reliable in-distribution performance with optimal sample sizes around 80-100, but significant challenges in out-of-distribution generalization, model perplexity degradation, and vulnerability to adversarial inputs.
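The core CAE mechanic is standard contrastive activation steering: compute a direction as the difference of mean activations on positive vs. negative prompts, then add it to the residual stream at inference. The scaling knob `alpha` and the layer/dimension choices below are illustrative assumptions.

```python
import torch

def build_steering_vector(pos_acts, neg_acts):
    """Contrastive steering vector from activations collected at one layer.
    The ~100 pairs here echo the paper's finding that in-distribution
    steering saturates around 80-100 samples."""
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

def steer(hidden, vec, alpha=1.0):
    """Add the steering vector to a hidden state at inference; negative
    alpha steers away from the behavior instead of toward it."""
    return hidden + alpha * vec

pos = torch.randn(100, 4096)  # activations on prompts exhibiting the behavior
neg = torch.randn(100, 4096)  # activations on contrasting prompts
vec = build_steering_vector(pos, neg)
steered = steer(torch.randn(4096), vec, alpha=0.8)
```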
-
Intra-Layer Recurrence in Transformers for Language Modeling
This paper proposes Intra-Layer Recurrence (ILR), which selectively recurs specific layers (especially early ones) within a single Transformer forward pass, improving language-modeling perplexity without adding parameters; however, the added compute cost and limited validation at large model scales constrain its practicality.
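A minimal sketch of the mechanism: apply each layer a configurable number of times within one forward pass, with weight-tied repeats, so depth grows without new parameters. The reuse map favoring early layers follows the paper's finding; the exact values here are illustrative.

```python
import torch
import torch.nn as nn

class ILRBlockStack(nn.Module):
    """Intra-Layer Recurrence sketch: layer i runs reuse[i] times per pass."""
    def __init__(self, layers, reuse):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.reuse = reuse  # e.g. [3, 2, 1, 1]: recur early layers more

    def forward(self, x):
        for layer, n in zip(self.layers, self.reuse):
            for _ in range(n):  # weight-tied repeats of the same layer
                x = layer(x)
        return x

d = 256
layers = [nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
          for _ in range(4)]
model = ILRBlockStack(layers, reuse=[3, 2, 1, 1])
out = model(torch.randn(2, 16, d))  # (batch, seq, dim) shape is unchanged
```

Note the compute trade-off the summary mentions: with this reuse map a 4-layer stack pays for 7 layer applications per forward pass.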