Tag: Pre-training
All the articles with the tag "Pre-training".
-
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
This paper proposes Mixture of Sparse Attention (MoSA), which realizes content-based sparse attention via expert-choice routing, significantly improving the language-modeling performance of Transformer models under the same compute budget while using resources more efficiently.
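Since the summary names expert-choice routing as the selection mechanism, here is a minimal sketch of that idea for a single attention head: a learned router scores every token and the head attends only over its top-k picks. All names, shapes, and the non-causal simplification are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def expert_choice_sparse_head(x, w_router, w_q, w_k, w_v, k):
    """x: (seq, d_model); w_router: (d_model, 1); w_q/w_k/w_v: (d_model, d_head)."""
    scores = (x @ w_router).squeeze(-1)          # router score for every token
    chosen = torch.topk(scores, k=k).indices     # the head "chooses" its k tokens
    xs = x[chosen]                               # gather only the selected tokens
    q, keys, v = xs @ w_q, xs @ w_k, xs @ w_v    # project the selected tokens only
    attn = F.softmax(q @ keys.T / q.shape[-1] ** 0.5, dim=-1)
    out = torch.zeros(x.shape[0], w_v.shape[1])  # unselected tokens get zero output
    out[chosen] = attn @ v
    return out                                   # causal masking and output projection omitted

# toy usage: 128 tokens, d_model=64, d_head=32, the head attends over 16 tokens
x = torch.randn(128, 64)
out = expert_choice_sparse_head(
    x, torch.randn(64, 1), torch.randn(64, 32),
    torch.randn(64, 32), torch.randn(64, 32), k=16)
```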
-
Contextures: Representations from Contexts
This paper introduces contexture theory, which unifies representation learning across paradigms by showing that learned representations target the top singular functions of a context-induced expectation operator. It demonstrates high alignment in trained neural representations and proposes a task-agnostic metric for evaluating contexts whose scores correlate strongly with downstream performance across a range of datasets.
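Because the summary is abstract, here is a hedged LaTeX sketch of the objects it names: a context given by a joint distribution over an input x and a context variable a induces an expectation operator, and the learned representation is said to target its top singular functions. The notation is illustrative and may differ from the paper's exact formulation.

```latex
% Illustrative notation only; the paper's definitions may differ.
\[
  (T f)(x) \;=\; \mathbb{E}_{a \sim P(a \mid x)}\big[ f(a) \big],
  \qquad T : L^2(P_A) \to L^2(P_X).
\]
\[
  T \;=\; \sum_{i \ge 1} \sigma_i \,(\mu_i \otimes \nu_i),
  \qquad \text{learned representation} \;\approx\; \operatorname{span}\{\mu_1, \dots, \mu_d\}.
\]
```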
-
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
This paper proposes MAET, which builds multilingual ability-enhanced large language models by extracting language-agnostic, ability-related weights and transferring them across languages; on mathematical and scientific tasks it achieves roughly a 10% performance gain using 60% of the compute, outperforming a variety of baseline methods.
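The summary describes the recipe only at the level of "extract ability-related weights, then transfer them"; the sketch below shows a generic weight-difference (task-vector-style) version of that idea under stated assumptions. Function names are hypothetical, and this is not MAET's actual extraction or selection procedure.

```python
import torch

def extract_ability_delta(base_state, ability_state):
    """Per-tensor difference between an ability-tuned model and its base."""
    return {k: ability_state[k] - base_state[k] for k in base_state}

def transfer_ability(target_state, delta, scale=1.0, keep=lambda name: True):
    """Add the (optionally filtered) ability delta into another language's model."""
    return {k: v + scale * delta[k] if keep(k) else v for k, v in target_state.items()}

# toy usage with randomly initialized "models" that share one architecture
base = {"layer.weight": torch.randn(4, 4)}
ability = {k: v + 0.1 * torch.randn_like(v) for k, v in base.items()}   # ability-tuned
target = {k: v + 0.05 * torch.randn_like(v) for k, v in base.items()}   # other language
merged = transfer_ability(target, extract_ability_delta(base, ability), scale=0.5)
```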
-
LZ Penalty: An information-theoretic repetition penalty for autoregressive language models
This paper proposes the LZ penalty, which uses changes in code length under the LZ77 compression algorithm to dynamically adjust the sampling distribution of an autoregressive language model, effectively eliminating degenerate repetition under greedy decoding while preserving performance on reasoning benchmarks.
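As a rough illustration of the mechanism the summary describes, the sketch below down-weights candidate tokens in proportion to how long a repeated span they would extend, a crude stand-in for the LZ77 code-length change. The top-k rescoring, match-length cap, and penalty scale are all assumptions, not the paper's formulation.

```python
import torch

def lz_style_penalty(logits, context, alpha=2.0, max_back=128, max_match=32, rescore_k=20):
    """logits: 1-D tensor over the vocabulary; context: list of already generated token ids."""
    penalized = logits.clone()
    window = context[-max_back:]
    for tok in torch.topk(logits, k=min(rescore_k, logits.numel())).indices.tolist():
        candidate = window + [tok]
        # length of the longest suffix of `candidate` that already occurs in the window
        match = 0
        for length in range(1, min(len(window), max_match) + 1):
            suffix = candidate[-length:]
            if any(window[i:i + length] == suffix for i in range(len(window) - length)):
                match = length
            else:
                break
        # a long match means the token merely extends a repeat (near-zero marginal
        # code length under an LZ77-style coder), so push its logit down
        penalized[tok] -= alpha * match
    return penalized

# toy usage: the token that would extend the repeat ("7" after "... 5, 6") is penalized most
logits = torch.zeros(100)
penalized = lz_style_penalty(logits, context=[5, 6, 7, 5, 6])
print(penalized[7].item(), penalized[8].item())   # -6.0 vs 0.0
```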
-
Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare
Through empirical experiments, this paper offers guidance on choosing language models for specialized healthcare applications, highlighting the substantial advantages of fine-tuning small language models and of domain-specific pre-training, which allow them to outperform zero-shot large language models on targeted tasks.