Tag: Large Language Model

All the articles with the tag "Large Language Model".

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing

Published: 4 May, 2025 at 04:33 PM

69.21 🤔

本文提出Mixture of Sparse Attention (MoSA)方法，通过专家选择路由实现基于内容的稀疏注意力，显著提高了Transformer模型在相同计算预算下的语言建模性能，并优化了资源使用。
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

Published: 12 May, 2025 at 11:18 AM

69.15 🤔

This paper introduces Gaussian Concept Subspace (GCS), a framework to model concept representations in LLMs as Gaussian distributions, demonstrating improved robustness, faithfulness, and plausibility over single vector methods, with effective application in emotion steering tasks.
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game

Published: 10 May, 2025 at 10:59 AM

68.75 🤔

This paper introduces MM-Escape, a benchmark using the customizable 3D environment EscapeCraft to evaluate multimodal reasoning in MLLMs through room escape tasks, revealing that while models like GPT-4o achieve high success in simple scenarios, performance drops significantly with increased difficulty, exposing distinct limitations in reasoning and spatial awareness.
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning

Published: 9 May, 2025 at 11:06 AM

68.33 🤔

This paper introduces SIMPLEMIX, a simple method to mix on- and off-policy data in language model preference optimization, demonstrating that their complementary strengths—on-policy for reasoning tasks and off-policy for open-ended tasks—lead to a 6.03% average improvement over single-source methods on Alpaca Eval 2.0.
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models

Published: 7 May, 2025 at 12:17 AM

68.20 🤔

本文提出MAET方法，通过提取语言无关的能力相关权重并跨语言转移，构建多语言能力增强的大型语言模型，在数学和科学任务上以60%的计算资源实现约10%的性能提升，优于多种基线方法。

Tag: Large Language Model

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing

Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution

How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game

SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning

Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models