Tag: Representation Learning
All the articles with the tag "Representation Learning".
-
Explaining Context Length Scaling and Bounds for Language Models
This paper proposes a theoretical framework, from an intrinsic-space perspective, to explain how context length affects language model loss. It derives an optimal context length that depends on dataset size and validates its hypotheses through experiments on natural language and synthetic data.
-
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
This paper introduces a taxonomy of language model memorization with three categories: recitation, reconstruction, and recollection. Experiments with Pythia models show that different factors drive each category, and a taxonomy-based predictive model outperforms baselines at predicting memorization likelihood.
-
Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching
This paper proposes ALM, a cross-tokenizer distillation method that transfers knowledge between models with different tokenizers via approximate likelihood matching. It is the first to achieve strong results in settings such as subword-to-byte-level transfer, and it outperforms existing methods across several application cases.
-
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
This paper systematically surveys AI memory systems by proposing a taxonomy (parametric, contextual structured, and contextual unstructured memory) and six fundamental operations (consolidation, updating, indexing, forgetting, retrieval, and compression). It organizes research topics including long-term memory, long context, parametric modification, and multi-source memory, and outlines future directions.
-
How does Transformer Learn Implicit Reasoning?
By training Transformer models from scratch in a controlled symbolic environment, this paper reveals a three-stage developmental trajectory of implicit multi-hop reasoning. Using cross-query semantic patching and a cosine-based representation lens, it shows how reasoning ability relates to clustering in the hidden space, offering new insights for model interpretability.