Posts
All the articles I've posted.
-
Hierarchical Attention Generates Better Proofs
This paper proposes a hierarchical attention regularization method that guides the attention mechanism of large language models to align with a five-level hierarchy of mathematical reasoning, improving proof success rates by 2.05% on miniF2F and 1.69% on ProofNet while significantly reducing proof complexity.
-
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization
RWKVQuant introduces a Post-Training Quantization framework tailored to RWKV models: a coarse-to-fine proxy hybridizes scalar and vector quantization, and codebooks are optimized for element-wise operations, achieving ~3-bit quantization with minimal accuracy loss and significant memory and speed improvements.
-
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
This paper proposes Mixture of Sparse Attention (MoSA), which achieves content-based sparse attention via expert-choice routing, significantly improving the language-modeling performance of Transformer models under the same compute budget while making more efficient use of resources.
-
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
This paper introduces Gaussian Concept Subspace (GCS), a framework that models concept representations in LLMs as Gaussian distributions, demonstrating improved robustness, faithfulness, and plausibility over single-vector methods, along with effective application to emotion-steering tasks.
-
Training Plug-n-Play Knowledge Modules with Deep Context Distillation
This paper proposes training plug-and-play knowledge modules with deep context distillation, enabling efficient integration of document knowledge in low-data settings; experiments show the approach outperforms traditional methods on question-answering tasks and is synergistic with RAG.