Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
M+: Extending MemoryLLM with Scalable Long-Term Memory
M+ extends MemoryLLM's knowledge retention beyond 160k tokens by introducing a long-term memory mechanism and a co-trained retriever, outperforming baselines on long-context tasks while keeping GPU memory consumption low.
-
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
This paper investigates inter-layer communication in Transformer language models by identifying low-rank communication channels via SVD, and demonstrates their causal role in prompt sensitivity through interventions that significantly improve performance on context-retrieval tasks such as the Laundry List task.
-
Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking
This paper explores parameter- and memory-efficient methods for LLM pretraining through a survey, benchmarking, and two proposed techniques, weight re-decomposition and momentum reset, which substantially improve the performance of low-rank methods and reduce memory consumption, yet still cannot fully match full-rank training.
-
Learning Composable Chains-of-Thought
This paper proposes Composable Chain-of-Thought, which improves the CoT format of atomic tasks via data augmentation and enables zero-shot compositional reasoning through multi-task learning or model merging; rejection-sampling fine-tuning further boosts performance, outperforming standard CoT baselines on string-manipulation and natural-language tasks.
-
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades
This paper proposes LoRASuite, a modular method for adapting LoRA weights across large language model upgrades via transformation matrices, layer mapping, and attention-head mapping; it significantly outperforms small-scale LoRA fine-tuning on math and commonsense tasks, even surpassing full-scale retraining in some scenarios, while greatly reducing memory and time costs.