Tag: Large Language Model

All the articles with the tag "Large Language Model".

DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?

Published: 2 Jun, 2025 at 11:31 AM

86.10 🤔

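This paper presents the first systematic comparison of reasoning and non-reasoning large language models on natural language generation evaluation. It finds that the benefit of reasoning depends heavily on model architecture: OpenAI o3-mini significantly outperforms non-reasoning models on machine translation evaluation, whereas DeepSeek-R1 stands out only on consistency evaluation for text summarization; distilled models remain effective at the 32B parameter scale.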
ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Published: 8 May, 2025 at 06:16 PM

86.09 🤔

ZeroSearch is a reinforcement learning framework that strengthens LLMs' search capabilities by using fine-tuned LLMs to simulate search engines. With a curriculum-based rollout strategy, it achieves performance comparable to or better than real search engines, without API costs.
Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models

Published: 21 May, 2025 at 11:29 AM

86.09 🤔

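This paper proposes two test-time compute scaling algorithms (knockout-style and league-style) that generate multiple candidate solutions and compare them pairwise. It proves theoretically that their failure probability decays exponentially or by a power law as compute increases, and validates the performance gains on multiple datasets and models.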
SELF: Self-Extend the Context Length With Logistic Growth Function

Published: 1 Jun, 2025 at 11:52 AM

86.07 🤔

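This paper proposes SELF, a method that dynamically adjusts token group sizes with a logistic growth function to extend the context length of large language models. It improves on Self-Extend on some long-context tasks, but its generality and stability still need to be verified.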
Investigating Task Arithmetic for Zero-Shot Information Retrieval

Published: 7 May, 2025 at 08:43 AM

86.02 🤔

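This paper proposes a task arithmetic approach that adapts models to new domains and languages for zero-shot information retrieval by adding and subtracting parameters. It achieves NDCG@10 gains of up to 18% on scientific, biomedical, and multilingual datasets, demonstrating the potential of lightweight model adaptation.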
