Posts

All the articles I've posted.

Do LLMs Memorize Recommendation Datasets? A Preliminary Study on MovieLens-1M

Published: 19 May, 2025 at 11:16 AM

83.59 🤔

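This paper presents a preliminary, prompt-based study of how much large language models (LLMs) have memorized the MovieLens-1M recommendation dataset. All tested models show some degree of memorization, the degree of memorization correlates positively with both recommendation performance and model size, and the study also reveals a popularity bias.

DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs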

Published: 18 May, 2025 at 11:17 AM

83.58 🤔

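This paper proposes DialogueReason, a dialogue-based reasoning pattern that trains large language models with PPO and a rule-based reward function to improve reasoning diversity and coherence on complex compound question-answering tasks, and shows greater robustness than monologue-style reasoning on the MATH, AIME, and GPQA datasets.

Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking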

Published: 6 May, 2025 at 11:15 PM

83.31 🤔

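This paper proposes InteRank, which uses knowledge distillation and reinforcement learning to train a small 3B-parameter language model that generates explanations for reasoning-intensive document re-ranking and achieves performance comparable to models with 70B+ parameters, ranking third on the BRIGHT benchmark.

Recursive Inference Scaling: A Winning Path to Scalable Inference in Language and Multimodal Systems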

Published: 12 May, 2025 at 11:20 AM

82.89 🤔

This paper introduces Recursive INference Scaling (RINS), a method that recursively applies a model block to exploit language's self-similarity, achieving significant performance gains in language and multimodal tasks under compute-matched conditions while offering inference flexibility through stochastic training and linear adapters.

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Published: 7 May, 2025 at 08:43 AM

82.56 🤔

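This paper proposes R1-Reward, which applies reinforcement learning to multimodal reward model training through the StableReinforce algorithm, significantly improving performance and surpassing existing state-of-the-art models on several benchmarks, while also demonstrating strong data efficiency and test-time scaling.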
