Posts
All the articles I've posted.
-
Thought calibration: Efficient and confident test-time scaling
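This paper proposes "thought calibration", a method that uses a reasoning-tree abstraction and lightweight probes to decide dynamically when a language model should stop reasoning. It cuts thinking tokens by up to 60% on in-distribution data while preserving performance, and by 20% on out-of-distribution data.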
-
Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection
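This paper proposes the Route-To-Reason (RTR) framework, which uses a dynamic routing mechanism to jointly select the best model and reasoning strategy. Across multiple reasoning tasks it achieves higher accuracy while reducing token usage by more than 60%, substantially improving the performance-cost trade-off.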
-
Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs
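This paper proposes SASR, a dynamically adaptive hybrid training framework that combines SFT and RL through an adjustment mechanism based on gradient norm and KL divergence. It substantially improves large language model performance on mathematical and logical reasoning tasks, outperforming conventional SFT, RL, and static hybrid methods.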
-
Sparse-Group Boosting with Balanced Selection Frequencies: A Simulation-Based Approach and R Implementation
This paper introduces sparse-group boosting and a simulation-based group balancing algorithm within the 'sgboost' R package to mitigate variable selection bias in high-dimensional grouped data, demonstrating improved fairness and interpretability through simulations and ecological data analysis.
-
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
ASTRA introduces an efficient defense for Vision Language Models by adaptively steering activations away from adversarial directions using image attribution, achieving state-of-the-art performance in mitigating jailbreak attacks with minimal impact on benign utility and high inference efficiency.