Tag: Efficiency
All the articles with the tag "Efficiency".
-   Graph Attention is Not Always Beneficial: A Theoretical Analysis of Graph Attention Mechanisms via Contextual Stochastic Block Models
    Using Contextual Stochastic Block Models, this paper shows theoretically that graph attention mechanisms benefit node classification only when structure noise exceeds feature noise. It also proposes a multi-layer GAT that achieves perfect classification at lower SNR thresholds, and validates both findings on synthetic and real-world data.
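    As a rough illustration of the data model behind the analysis, here is a minimal Python sketch of two-community CSBM generation; the sizes, edge probabilities, and noise level are illustrative choices, not the paper's settings.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 16        # nodes, feature dimension
    p, q = 0.10, 0.02     # intra- / inter-community edge probabilities
    sigma = 1.0           # feature noise level

    # Two balanced communities labeled +1 / -1
    y = np.repeat([1, -1], n // 2)

    # Adjacency: edge probability depends on whether endpoints share a community
    same = np.equal.outer(y, y)
    A = rng.random((n, n)) < np.where(same, p, q)
    A = np.triu(A, 1)
    A = A | A.T           # symmetric, no self-loops

    # Features: community-dependent mean plus Gaussian noise; the relative
    # scale of mu and sigma controls the feature noise
    mu = rng.standard_normal(d)
    X = np.outer(y, mu) + sigma * rng.standard_normal((n, d))
    ```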
-   Model Merging in Pre-training of Large Language Models
    This paper proposes Pre-trained Model Averaging (PMA), a strategy that merges checkpoints from the pre-training stage to significantly improve large language model performance, predict annealing behavior, and enhance training stability, offering a new approach and practical guidance for efficient model development.
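    Below is a minimal sketch of the core checkpoint-averaging idea, assuming plain PyTorch state dicts on disk; the uniform weighting and file paths are assumptions for illustration, not the paper's exact PMA recipe.

    ```python
    import torch

    def average_checkpoints(paths, weights=None):
        """Weighted average of parameter tensors across pre-training checkpoints."""
        weights = weights or [1.0 / len(paths)] * len(paths)
        avg = None
        for w, path in zip(weights, paths):
            state = torch.load(path, map_location="cpu")
            if avg is None:
                avg = {k: w * v.float() for k, v in state.items()}
            else:
                for k, v in state.items():
                    avg[k] += w * v.float()
        return avg

    # e.g. merged = average_checkpoints(["step_10000.pt", "step_20000.pt"])
    ```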
-   SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models
    SLearnLLM is a self-learning framework in which a large language model evaluates its own answers and fine-tunes only on the QA pairs it answered incorrectly. In the agriculture and medical domains it matches the performance gains of full-dataset fine-tuning while substantially reducing training time.
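    The filtering loop at the heart of such a framework might look like the following sketch; `model.generate` and `grade` are placeholder interfaces standing in for the LLM call and the self-evaluation step, not SLearnLLM's actual API.

    ```python
    def build_finetune_set(model, qa_pairs, grade):
        """Keep only the QA pairs the model currently answers incorrectly."""
        wrong = []
        for question, reference in qa_pairs:
            answer = model.generate(question)
            # The model judges its own answer against the reference; failures
            # become the (much smaller) fine-tuning set.
            if not grade(model, question, answer, reference):
                wrong.append((question, reference))
        return wrong
    ```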
-   Self-Data Distillation for Recovering Quality in Pruned Large Language Models
    This paper proposes self-data distilled fine-tuning, which uses the unpruned model to generate a distilled dataset that recovers the quality of pruned large language models. The approach significantly outperforms standard supervised fine-tuning on the HuggingFace OpenLLM Leaderboard v1, with further gains in performance and efficiency from model merging and speculative decoding.
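    A minimal sketch of the data-generation step, assuming a placeholder `teacher.generate` interface for the unpruned model; the exact prompting used in the paper is not reproduced here.

    ```python
    def self_distill_dataset(teacher, dataset):
        """Rewrite each supervised target with the unpruned model's own output,
        keeping the fine-tuning data on the teacher's distribution."""
        distilled = []
        for example in dataset:
            prompt = example["prompt"]
            # The unpruned model regenerates the response for the original prompt;
            # the pruned model is then fine-tuned on these regenerated targets.
            response = teacher.generate(prompt)
            distilled.append({"prompt": prompt, "response": response})
        return distilled
    ```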
-   Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
    This paper proposes the Log-Augmented Generation (LAG) framework, which directly reuses past reasoning computation through KV caches, markedly improving the accuracy and efficiency of large language models on knowledge- and reasoning-intensive tasks and outperforming standard agentic systems as well as existing reflection and KV-cache methods.
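    To make the reuse idea concrete, here is a toy sketch of a log that maps past reasoning traces to their KV caches and retrieves them by embedding similarity; the class and its interfaces are invented for illustration and are not LAG's actual implementation.

    ```python
    import numpy as np

    class ReasoningLog:
        """Toy store mapping past reasoning traces to their cached KV states."""

        def __init__(self):
            self.entries = []  # (trace_embedding, kv_cache) pairs

        def add(self, trace_embedding, kv_cache):
            self.entries.append((np.asarray(trace_embedding, dtype=float), kv_cache))

        def retrieve(self, query_embedding, top_k=1):
            # Rank stored traces by cosine similarity to the new query and hand
            # back their KV caches so the model can skip recomputing them.
            q = np.asarray(query_embedding, dtype=float)
            q = q / np.linalg.norm(q)
            scores = [float(e @ q) / np.linalg.norm(e) for e, _ in self.entries]
            order = np.argsort(scores)[::-1][:top_k]
            return [self.entries[i][1] for i in order]
    ```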