Posts
All the articles I've posted.
-
Communicating Activations Between Language Model Agents
This paper introduces Activation Communication (AC), a method for inter-LLM communication that exchanges intermediate activations instead of natural language, achieving up to a 27% performance improvement over natural-language baselines with substantially less compute across coordination games and reasoning benchmarks.
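The post doesn't spell out the mechanism, so here is a minimal sketch of the general idea, assuming HuggingFace-style causal LMs with Llama-like `model.layers` naming and a user-supplied `projector` that maps between the two models' hidden sizes; the splice layer and additive injection are illustrative choices, not the paper's exact recipe.

```python
import torch

@torch.no_grad()
def communicate_activation(sender, receiver, projector, prompt_ids, layer_k=12):
    # 1. Run the sender and keep its layer-k hidden states.
    out = sender(prompt_ids, output_hidden_states=True)
    h = out.hidden_states[layer_k]              # (batch, seq, d_sender)

    # 2. Map the "message" into the receiver's activation space.
    h_recv = projector(h)                       # (batch, seq, d_receiver)

    # 3. Add it into the receiver's layer-k output via a forward hook,
    #    skipping the decode-to-text / re-encode round trip entirely.
    def inject(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + h_recv[:, : hidden.shape[1]]
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

    handle = receiver.model.layers[layer_k].register_forward_hook(inject)
    try:
        return receiver(prompt_ids)
    finally:
        handle.remove()
```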
-
Merge to Mix: Mixing Datasets via Model Merging
This paper proposes *Merge to Mix*, which uses model merging as a surrogate for efficiently selecting dataset mixtures when fine-tuning large models; across image classification and language tasks it substantially outperforms conventional selection methods and approaches, and in some cases exceeds, oracle performance.
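A sketch of the surrogate idea, not the paper's exact procedure: the uniform averaging, function names, and the `evaluate` held-out-metric callback are all assumptions for illustration.

```python
def merge_state_dicts(state_dicts, weights=None):
    """Simple (weighted) parameter averaging; the merged model stands in
    as a cheap surrogate for a model fine-tuned on the whole mixture."""
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        key: sum(w * sd[key] for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

def score_mixture(member_ids, per_dataset_models, evaluate):
    """Score a candidate mixture without fine-tuning on it: merge the
    models fine-tuned on each member dataset, then evaluate the merge.
    `evaluate` is a user-supplied callback (an assumption here)."""
    merged = merge_state_dicts([per_dataset_models[i] for i in member_ids])
    return evaluate(merged)
```

Scoring every candidate mixture then costs one merge plus one evaluation, instead of one full fine-tuning run per mixture.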
-
MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning
This paper proposes MELoRA, which stacks multiple mini LoRA modules in parallel to achieve a higher equivalent rank, significantly outperforming LoRA on natural language understanding and instruction-following tasks with fewer trainable parameters.
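A minimal PyTorch sketch of the stated mechanism: n small LoRA pairs arranged block-diagonally, so the combined update has rank up to n·r at roughly the parameter cost of a single rank-r LoRA over the full dimension. Hyperparameter names and the init scheme are illustrative.

```python
import torch
import torch.nn as nn

class MiniEnsembleLoRA(nn.Module):
    """Block-diagonal mini-ensemble of n rank-r LoRA pairs."""

    def __init__(self, d_in, d_out, n=4, r=2, alpha=1.0):
        super().__init__()
        assert d_in % n == 0 and d_out % n == 0
        self.n, self.scale = n, alpha / r
        # One (A_i, B_i) pair per input/output slice; B starts at zero
        # so the adapter is a no-op at initialization, as in LoRA.
        self.A = nn.ParameterList(
            nn.Parameter(torch.randn(d_in // n, r) * 0.01) for _ in range(n)
        )
        self.B = nn.ParameterList(
            nn.Parameter(torch.zeros(r, d_out // n)) for _ in range(n)
        )

    def forward(self, x):
        # Split x into n chunks, pass each through its own mini-LoRA,
        # and concatenate — equivalent to a block-diagonal delta-W.
        chunks = x.chunk(self.n, dim=-1)
        delta = [c @ a @ b for c, a, b in zip(chunks, self.A, self.B)]
        return torch.cat(delta, dim=-1) * self.scale
```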
-
Pre-training vs. Fine-tuning: A Reproducibility Study on Dense Retrieval Knowledge Acquisition
Using linear probing and neuron-activation analysis, this paper reproduces and extends work on the roles of pre-training versus fine-tuning in knowledge acquisition for dense retrieval models. It finds that pre-trained knowledge dominates retrieval performance in DPR models and that fine-tuning disperses knowledge across neurons, but these conclusions do not hold across other architectures (e.g., Contriever, RepLlama) and representation strategies.
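For readers unfamiliar with the technique, here is what linear probing amounts to, sketched with scikit-learn; the property being probed and the data splits are placeholders.

```python
from sklearn.linear_model import LogisticRegression

def linear_probe(train_reps, train_labels, test_reps, test_labels):
    """If a linear classifier trained on frozen embeddings can predict a
    property (e.g., whether a passage contains the answer), that
    knowledge is linearly encoded in the representation."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(train_reps, train_labels)
    return probe.score(test_reps, test_labels)

# Comparing probe accuracy on pre-trained vs. fine-tuned encoders
# localizes where the retrieval-relevant knowledge was acquired.
```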
-
Scalable Model Merging with Progressive Layer-wise Distillation
This paper proposes ProDistill, an algorithm that merges large pre-trained models via progressive layer-wise teacher-student distillation. It proves theoretically that domain-specific data is necessary for merging, and it delivers clear gains on vision and language tasks (6.14%–6.61%) with superior memory and compute efficiency.
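A sketch of one layer-wise step under my reading of the summary: the merged layer is fit to match each domain teacher's layer output on that domain's data, with the already-merged earlier layers producing the inputs; the optimizer, MSE loss, and signatures are assumptions.

```python
import torch
import torch.nn.functional as F

def distill_layer(merged_layer, teacher_layers, domain_batches,
                  steps=100, lr=1e-4):
    """Fit one merged layer against its domain teachers.

    `domain_batches[i]` holds the hidden states entering this layer for
    domain i, produced by the already-merged earlier layers — which is
    what makes the procedure progressive and layer-wise, with no
    end-to-end backprop through the whole network."""
    with torch.no_grad():  # fixed targets from the frozen teachers
        targets = [t(x) for t, x in zip(teacher_layers, domain_batches)]
    opt = torch.optim.AdamW(merged_layer.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(
            F.mse_loss(merged_layer(x), tgt)
            for x, tgt in zip(domain_batches, targets)
        )
        loss.backward()
        opt.step()
    return merged_layer
```

Because each step only needs one layer's parameters and activations in memory, this matches the summary's claim of memory and compute efficiency relative to whole-model distillation.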