Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs
This paper proposes a framework based on multi-head tensorisation and Tucker decomposition that structurally denoises and compresses the multi-head attention weights of large language models by enforcing a shared higher-dimensional subspace, significantly improving reasoning ability while achieving compression ratios of up to 247x.
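A minimal sketch of the idea named in the summary, not the paper's exact tensorisation: stack per-head attention projection weights into a 3-way tensor and compress it with a Tucker decomposition so all heads share low-rank factor matrices. The layer dimensions and ranks below are illustrative assumptions.

```python
# Sketch: Tucker-compress stacked per-head query-projection weights.
# Shapes and ranks are assumptions, not values from the paper.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

d_model, n_heads = 768, 12            # assumed layer dimensions
d_head = d_model // n_heads

# Per-head query projections W_q^(h) in R^{d_model x d_head},
# stacked into a tensor of shape (n_heads, d_model, d_head).
W = np.random.randn(n_heads, d_model, d_head)

# Tucker decomposition: a small core plus one factor matrix per mode.
# Modest ranks on the d_model / d_head modes are what yield compression.
ranks = (n_heads, 64, 16)             # illustrative ranks, not tuned
core, factors = tucker(tl.tensor(W), rank=ranks)

# Reconstruction error and parameter compression of the factored form.
W_hat = tl.tucker_to_tensor((core, factors))
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
n_full = W.size
n_tucker = core.size + sum(f.size for f in factors)
print(f"relative error: {rel_err:.3f}, compression: {n_full / n_tucker:.1f}x")
```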
-
Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning
Through novel task designs and analysis of Pythia training checkpoints, this paper shows that in-context learning (ICL) in large language models is neither pure memorization nor a symbolic algorithm, but a limited form of generalization that relies on statistical properties of the data, and it examines ICL's training dynamics and links to internal mechanisms.
-
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
LENSLLM introduces a Hessian-based PAC-Bayes framework and an NTK-based scaling model for LLM selection, achieving up to 91.1% accuracy and 88.5% computational cost reduction by modeling fine-tuning dynamics across diverse tasks.
-
TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
This paper introduces TiC-LM, a web-scale benchmark for time-continual LLM pretraining using 114 Common Crawl dumps, demonstrating that replay and autoregressive schedules can match Oracle retraining on general web data with less compute, though trade-offs persist across domains.
-
Can Large Reasoning Models Self-Train?
This paper proposes Self-Rewarded Training (SRT), which uses model self-consistency to drive reinforcement learning for unsupervised improvement of mathematical reasoning; it initially matches supervised methods, but prolonged training collapses due to reward hacking, and mitigation strategies such as early stopping and curriculum learning are explored.
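A minimal sketch of the self-consistency pseudo-reward idea behind self-rewarded training, under assumptions of my own: sample several answers per problem, take the majority answer as the pseudo-label, and reward each sample by agreement with it. The helper names and the final-answer parsing are hypothetical, not taken from the paper's code.

```python
# Sketch: majority-vote pseudo-rewards for unsupervised RL on reasoning.
from collections import Counter
from typing import List

def extract_final_answer(completion: str) -> str:
    """Hypothetical parser: take the last whitespace-separated token
    as the model's final answer."""
    return completion.strip().split()[-1] if completion.strip() else ""

def self_consistency_rewards(completions: List[str]) -> List[float]:
    """Return a 0/1 pseudo-reward per completion: 1 if its final answer
    matches the majority-vote answer across all samples, else 0."""
    answers = [extract_final_answer(c) for c in completions]
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]

# Usage: these rewards would replace ground-truth checking in an RL loop
# (e.g. a policy-gradient update). With no external signal, the policy can
# eventually "reward hack" by collapsing to one confidently wrong answer.
samples = ["... so the answer is 42", "thus 42", "I think it is 41"]
print(self_consistency_rewards(samples))   # [1.0, 1.0, 0.0]
```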