Tag: Representation Learning
All the articles with the tag "Representation Learning".
-
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
This paper proposes Lineage-Regularized Matrix Factorization (LRMF), which significantly improves performance prediction accuracy by exploiting the lineage relationships among large language models; it outperforms conventional methods in both homogeneous and heterogeneous model settings and is especially strong on the cold-start problem.
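A minimal sketch of the core idea: factorize a (model × benchmark) score matrix while penalizing distance between the latent embeddings of lineage-related models. The `lineage` pairs, penalty weight `lam`, and latent rank below are illustrative assumptions, not the paper's exact formulation.

```python
import torch

# Toy score matrix: rows = models, cols = benchmarks; NaN = unobserved.
scores = torch.tensor([[0.71, 0.55, float('nan')],
                       [0.75, float('nan'), 0.62],
                       [float('nan'), 0.58, 0.64]])
mask = ~torch.isnan(scores)
target = torch.nan_to_num(scores)

# Hypothetical lineage edges (parent_idx, child_idx), e.g. a base model
# and its fine-tuned descendants.
lineage = [(0, 1), (0, 2)]

rank, lam = 2, 0.1                             # assumed hyperparameters
U = torch.randn(3, rank, requires_grad=True)   # model embeddings
V = torch.randn(3, rank, requires_grad=True)   # benchmark embeddings
opt = torch.optim.Adam([U, V], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    pred = U @ V.T
    fit = ((pred - target)[mask] ** 2).mean()  # reconstruct observed scores
    # Lineage regularizer: pull related models' embeddings together,
    # which is what lets predictions transfer to cold-start models.
    reg = sum(((U[p] - U[c]) ** 2).sum() for p, c in lineage)
    (fit + lam * reg).backward()
    opt.step()

print((U @ V.T).detach())  # predictions now cover the unobserved cells
```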
-
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
This paper proposes MOOSComp, which mitigates over-smoothing by adding an inter-class cosine similarity loss during training and incorporates outlier scores during compression to retain critical tokens, substantially improving the performance and generalization of task-agnostic long-context compression.
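A sketch of one plausible reading of the anti-over-smoothing loss: penalize the cosine similarity between the mean representations of the two token classes (keep vs. drop) so they stay separable. The function name and class setup are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def inter_class_cosine_loss(hidden, labels):
    """Push apart the mean representations of 'keep' and 'drop' tokens;
    over-smoothing collapses token vectors toward one another, which makes
    the token classifier's job harder."""
    keep_mean = hidden[labels == 1].mean(dim=0)
    drop_mean = hidden[labels == 0].mean(dim=0)
    return F.cosine_similarity(keep_mean, drop_mean, dim=0)

hidden = torch.randn(128, 768)        # token representations from the compressor
labels = torch.randint(0, 2, (128,))  # 1 = keep token, 0 = drop token
loss = inter_class_cosine_loss(hidden, labels)
# At compression time, the paper additionally keeps tokens whose outlier
# scores are high even if the classifier would drop them (not modeled here).
```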
-
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
RetroInfer reimagines the KV cache as a vector storage system, using an attention-aware wave index and wave buffer to achieve up to 4.5x speedup over full attention and 10.5x over sparse baselines for long-context LLM inference, while preserving near-full-attention accuracy.
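A heavily simplified sketch of the vector-storage idea: cluster the cached keys, score cluster centroids against the current query, and run exact attention only over tokens in the top-scoring clusters. The wave index and wave buffer machinery is not reproduced; names and the clustering scheme below are assumptions.

```python
import torch

def approx_attention(q, K, V, centroids, assign, top_c=4):
    """Retrieve from a clustered, vector-store-like KV cache: pick the
    clusters most relevant to the query, then attend over that subset."""
    cluster_scores = centroids @ q                   # (num_clusters,)
    picked = cluster_scores.topk(top_c).indices
    sel = torch.isin(assign, picked)                 # tokens in chosen clusters
    k, v = K[sel], V[sel]
    attn = torch.softmax(k @ q / k.shape[-1] ** 0.5, dim=0)
    return attn @ v

d, n, c = 64, 10_000, 128
K, V = torch.randn(n, d), torch.randn(n, d)
assign = torch.randint(0, c, (n,))                   # cluster id per cached token
centroids = torch.stack([K[assign == i].mean(0) for i in range(c)])
out = approx_attention(torch.randn(d), K, V, centroids, assign)
```

The speedup comes from touching only a fraction of the cache per query; accuracy stays close to full attention as long as the index routes queries to the clusters holding the high-attention keys.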
-
Patterns and Mechanisms of Contrastive Activation Engineering
This paper systematically investigates Contrastive Activation Engineering (CAE) for steering LLM behavior at inference time, revealing reliable in-distribution performance with optimal sample sizes around 80-100, but significant challenges in out-of-distribution generalization, model perplexity degradation, and vulnerability to adversarial inputs.
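The core CAE mechanic is standard contrastive activation steering: compute a direction as the difference of mean activations on positive vs. negative prompts, then add it to the residual stream at inference. The scaling knob `alpha` and the layer/dimension choices below are illustrative assumptions.

```python
import torch

def build_steering_vector(pos_acts, neg_acts):
    """Contrastive steering vector from activations collected at one layer.
    The ~100 pairs here echo the paper's finding that in-distribution
    steering saturates around 80-100 samples."""
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

def steer(hidden, vec, alpha=1.0):
    """Add the steering vector to a hidden state at inference; negative
    alpha steers away from the behavior instead of toward it."""
    return hidden + alpha * vec

pos = torch.randn(100, 4096)  # activations on prompts exhibiting the behavior
neg = torch.randn(100, 4096)  # activations on contrasting prompts
vec = build_steering_vector(pos, neg)
steered = steer(torch.randn(4096), vec, alpha=0.8)
```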
-
Intra-Layer Recurrence in Transformers for Language Modeling
This paper proposes Intra-Layer Recurrence (ILR), which selectively recurs specific layers (especially early ones) within a single Transformer forward pass, improving language-modeling perplexity without adding parameters; however, the added compute cost and limited validation at large model scales constrain its practicality.
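A minimal sketch of the mechanism: apply each layer a configurable number of times within one forward pass, with weight-tied repeats, so depth grows without new parameters. The reuse map favoring early layers follows the paper's finding; the exact values here are illustrative.

```python
import torch
import torch.nn as nn

class ILRBlockStack(nn.Module):
    """Intra-Layer Recurrence sketch: layer i runs reuse[i] times per pass."""
    def __init__(self, layers, reuse):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.reuse = reuse  # e.g. [3, 2, 1, 1]: recur early layers more

    def forward(self, x):
        for layer, n in zip(self.layers, self.reuse):
            for _ in range(n):  # weight-tied repeats of the same layer
                x = layer(x)
        return x

d = 256
layers = [nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
          for _ in range(4)]
model = ILRBlockStack(layers, reuse=[3, 2, 1, 1])
out = model(torch.randn(2, 16, d))  # (batch, seq, dim) shape is unchanged
```

Note the compute trade-off the summary mentions: with this reuse map a 4-layer stack pays for 7 layer applications per forward pass.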