Tag: Human-AI Interaction

All the articles with the tag "Human-AI Interaction".

Reverse Preference Optimization for Complex Instruction Following

Published: 1 Jun, 2025 at 11:44 AM

85.20 🤔

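This paper proposes Reverse Preference Optimization (RPO), which removes noise from preference pairs by dynamically reversing the constraints in an instruction that a response fails to satisfy. It significantly outperforms the DPO baseline on multi-turn complex instruction-following tasks and surpasses GPT-4o with a 70B model.

Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models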

Published: 8 May, 2025 at 06:19 PM

83.98 🤔

This paper introduces a recursive summarization method to enhance long-term dialogue memory in LLMs, achieving marginal quantitative improvements and notable qualitative gains in consistency and coherence across multiple models and datasets.

DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs

Published: 18 May, 2025 at 11:17 AM

83.58 🤔

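This paper proposes DialogueReason, a dialogue-based reasoning pattern that trains large language models with PPO and a rule-based reward function to improve reasoning diversity and coherence on complex compound question-answering tasks. On the MATH, AIME, and GPQA datasets it shows greater robustness than monologue-style reasoning.

Efficient Reasoning for LLMs through Speculative Chain-of-Thought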

Published: 6 May, 2025 at 01:19 AM

79.97 🤔

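This paper proposes the Speculative Chain-of-Thought (SCoT) framework, in which a lightweight draft model generates multiple chain-of-thought drafts in parallel and a fine-tuned target model either selects the best draft or decides to rethink. This significantly reduces the reasoning latency of large language models while keeping accuracy close to that of the large model.

Toward Efficient Exploration by Large Language Model Agents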

Published: 4 May, 2025 at 04:31 PM

79.45 🤔

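By using LLMs to explicitly implement a posterior-sampling RL algorithm, this paper significantly improves the exploration efficiency of LLM agents in natural-language environments while retaining the statistical performance advantages of the classical algorithm.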
