Posts
All the articles I've posted.
-
Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models
This paper analyzes the activation patterns underlying long chain-of-thought ability in large language models and proposes a training-free activation control method (EELo-CoT) together with a parameter-efficient fine-tuning strategy, dynamically adjusting activation values at inference time to substantially improve self-reflection rate and accuracy.
-
Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs
Using layer-wise context masking and cross-task patching, this paper provides evidence that large language models possess an "internal chain-of-thought": the subtasks of a composite task are learned at different network depths and executed in sequence, improving model transparency and opening a new path toward instruction-level behavior control.
-
The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation
Through systematic experiments, this paper shows that increasing language diversity when fine-tuning large language models for translation significantly improves performance across all categories of language pairs, and an analysis of mid-layer representations reveals the underlying cross-lingual transfer mechanism, though the gains from diversity plateau beyond a threshold.
-
An Efficient Sparse Kernel Generator for O(3)-Equivariant Deep Networks
This paper introduces a GPU sparse kernel generator for the Clebsch-Gordan tensor product in O(3)-equivariant deep networks. By leveraging JIT compilation, static analysis, and kernel fusion, it achieves significant speedups (up to 10x over e3nn and 1.3x-2.0x over cuEquivariance), particularly benefiting computational chemistry models such as NequIP and MACE.
-
Constraint-based causal discovery with tiered background knowledge and latent variables in single or overlapping datasets
This paper introduces the tFCI and tIOD algorithms, which leverage tiered background knowledge to improve the efficiency and informativeness of constraint-based causal discovery in settings with latent variables and single or overlapping datasets, demonstrating theoretical gains under oracle conditions.