Tag: Reasoning

All the articles with the tag "Reasoning".

Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories

Published: 16 May, 2025 at 11:10 AM

92.78 🤔

This exploratory study evaluates GPT-4o's multilingual and multimodal performance on physics concept inventories, revealing strong results in English and text-based tasks but significant weaknesses in visual interpretation and non-Western languages, highlighting implications for equitable AI integration in education.
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning

Published: 28 May, 2025 at 11:25 AM

92.52 🤔

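Through theoretical analysis and a Re-distillation technique, this paper reveals the efficiency bottleneck of small-scale SFT in R1-style RL and, using very few samples (<1K), approaches RL performance on the K&K and MATH datasets, markedly improving data efficiency.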
1bit-Merging: Dynamic Quantized Merging for Large Language Models

Published: 1 Jun, 2025 at 11:52 AM

92.20 🤔

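1bit-Merging proposes a dynamic model-merging framework that combines 1-bit quantized task vectors with task-specific routing, retaining 94.53% of performance while cutting storage requirements to 55.02%, and outperforming traditional and dynamic merging methods on general knowledge, mathematical reasoning, and code generation tasks.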
Reward Reasoning Model

Published: 24 May, 2025 at 11:08 AM

92.11 🤔

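This paper proposes Reward Reasoning Models (RRMs), which adaptively use test-time compute through a chain-of-thought reasoning process before producing a reward, significantly improving performance across multiple reward modeling benchmarks and practical applications, with especially strong results on complex reasoning tasks.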
TL;DR: Too Long, Do Re-weighting for Effcient LLM Reasoning Compression

Published: 5 Jun, 2025 at 11:22 AM

92.10 🤔

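This paper proposes the TLDR method, which dynamically re-weights System-1 and System-2 reasoning data to substantially compress the number of output tokens in LLM reasoning (by about 40%) while largely preserving accuracy on math tasks of varying difficulty.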
