Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
R1-Compress effectively compresses long chain-of-thought (Long-CoT) reasoning through chunk-level compression and an inter-chunk search mechanism, reducing token usage by roughly 20% while maintaining reasoning accuracy close to the baseline (92.4% vs. 93.0%).
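A minimal sketch of the chunk-then-search idea from the summary above, assuming the caller supplies a rewrite function and a coherence scorer; the helper names, the blank-line chunking, and the greedy search are illustrative assumptions, not the paper's actual method.

```python
"""Illustrative sketch of chunk-level Long-CoT compression with inter-chunk search.
Names and the greedy selection are assumptions for illustration, not R1-Compress itself."""

from typing import Callable, List


def compress_long_cot(
    cot_text: str,
    rewrite: Callable[[str], List[str]],      # returns shorter candidate rewrites of one chunk
    coherence: Callable[[str, str], float],   # scores how well a candidate follows the kept prefix
) -> str:
    # Chunking: here simply on blank lines; the paper's chunking strategy may differ.
    chunks = [c.strip() for c in cot_text.split("\n\n") if c.strip()]
    kept: List[str] = []
    for chunk in chunks:
        candidates = rewrite(chunk) or [chunk]
        # Inter-chunk search (greedy): keep the candidate most coherent with what came before.
        prefix = "\n".join(kept)
        best = max(candidates, key=lambda cand: coherence(prefix, cand))
        kept.append(best)
    return "\n".join(kept)
```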
-
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
By applying the GRPO algorithm to the Qwen2-Audio-7B-Instruct model, this paper achieves a best accuracy of 64.5% on audio question answering, showing that reinforcement learning outperforms supervised fine-tuning on small datasets; however, an explicit reasoning process does not significantly improve performance, and a gap to human-level performance remains.
-
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
This paper introduces a systematic approach to enhance large reasoning models by aligning them with deduction, induction, and abduction meta-abilities through a three-stage pipeline of individual training, parameter merging, and domain-specific RL, achieving up to 4% performance gains over instruction-tuned baselines across math, coding, and science benchmarks.
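The parameter-merging stage of that pipeline can be pictured as a weighted average over the separately trained specialists' weights. The sketch below is a generic PyTorch illustration of that step; the equal weights and the simple averaging scheme are assumptions, not necessarily the paper's exact recipe.

```python
"""Generic parameter-merging sketch: weighted average of PyTorch state dicts.
The weights and averaging scheme are illustrative assumptions only."""

import torch


def merge_state_dicts(state_dicts, weights):
    # Weighted average of matching parameter tensors across models.
    assert len(state_dicts) == len(weights)
    total = sum(weights)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(
            (w / total) * sd[key].float() for sd, w in zip(state_dicts, weights)
        )
    return merged


# Example: merge three specialists (deduction, induction, abduction) with equal weight.
# specialists = [torch.load(p, map_location="cpu") for p in checkpoint_paths]
# model.load_state_dict(merge_state_dicts(specialists, weights=[1.0, 1.0, 1.0]))
```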
-
AI agents may be worth the hype but not the resources (yet): An initial exploration of machine translation quality and costs in three language pairs in the legal and news domains
Through an empirical evaluation of five machine translation paradigms, this paper finds that reasoning-enhanced large language models (e.g., o1-preview) perform strongly in human evaluation and surpass traditional NMT, while multi-agent systems, though promising, are held back by high computational costs and inconsistent performance across language pairs.
-
Activated LoRA: Fine-tuned LLMs for Intrinsics
This paper proposes Activated LoRA (aLoRA), an improved LoRA framework that adapts weights only for tokens after the adapter is activated, allowing the base model's KV cache to be reused for efficient dynamic adaptation; it matches standard LoRA performance across multiple tasks while significantly reducing inference cost.
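A toy sketch of the idea the summary describes: the low-rank update is applied only to positions at or after an activation index, so hidden states (and hence KV cache entries) for the earlier prefix are identical to the base model's and can be reused. The module below illustrates that masking idea in PyTorch; it is an assumption-laden sketch, not the actual aLoRA implementation.

```python
"""Toy illustration of "activated" LoRA: the low-rank delta is applied only from an
activation index onward, leaving the prefix exactly as the base layer computes it.
This is a sketch of the concept, not the paper's code."""

import torch
import torch.nn as nn


class ActivatedLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base                      # frozen pretrained projection
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # start as a no-op, as in standard LoRA
        self.scale = alpha / rank
        for p in self.base.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor, activation_start: int) -> torch.Tensor:
        # x: (batch, seq_len, in_features)
        out = self.base(x)
        delta = self.lora_b(self.lora_a(x)) * self.scale
        # Apply the adapter only from `activation_start` onward; earlier positions
        # match the base model exactly, so their KV cache entries stay valid.
        mask = torch.zeros(x.shape[1], 1, device=x.device, dtype=out.dtype)
        mask[activation_start:] = 1.0
        return out + delta * mask
```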