Posts
All the articles I've posted.
-
Contextures: Representations from Contexts
This paper introduces the contexture theory, which unifies representation learning across paradigms as targeting the top singular functions of a context-induced expectation operator. It demonstrates high alignment in neural representations and proposes a task-agnostic metric for evaluating contexts that correlates strongly with empirical performance on various datasets.
-
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
This paper introduces MM-Escape, a benchmark that uses the customizable 3D environment EscapeCraft to evaluate multimodal reasoning in MLLMs through room escape tasks. Models like GPT-4o achieve high success in simple scenarios, but performance drops significantly as difficulty increases, exposing distinct limitations in reasoning and spatial awareness.
-
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
This paper introduces SIMPLEMIX, a simple method for mixing on- and off-policy data in language model preference optimization. It shows that the two sources have complementary strengths (on-policy data for reasoning tasks, off-policy data for open-ended tasks) and that combining them yields a 6.03% average improvement over single-source methods on AlpacaEval 2.0.
-
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
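This paper proposes MAET, a method that extracts language-agnostic, ability-related weights and transfers them across languages to build multilingual ability-enhanced large language models. On mathematical and scientific tasks, it achieves roughly a 10% performance improvement using 60% of the computational resources, outperforming several baseline methods.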
-
HAIR: Hardness-Aware Inverse Reinforcement Learning with Introspective Reasoning for LLM Alignment
HAIR is an LLM alignment method that combines hardness-aware inverse reinforcement learning with introspective reasoning. It constructs a balanced safety dataset and trains category-specific reward models with GRPO-S, achieving state-of-the-art harmlessness while preserving usefulness across multiple benchmarks.