Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
This paper presents a systematic survey of reinforcement-learning-based reasoning methods in multimodal large language models (MLLMs), analyzing algorithm design, reward mechanisms, and applications, identifying challenges such as cross-modal reasoning and reward sparsity, and outlining future directions including hierarchical rewards and interactive RL.
-
Layered Unlearning for Adversarial Relearning
This paper proposes Layered Unlearning (LU), a method that unlearns data subsets progressively across multiple stages and induces distinct inhibition mechanisms, strengthening the robustness of large language models against adversarial relearning, although it remains vulnerable to corpus-based attacks.
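A minimal sketch of the staged-unlearning schedule described above, under loose assumptions: the toy linear model, random data, fold partitioning, and the `unlearn_stage` helper are illustrative, not the paper's implementation.

```python
# Illustrative sketch of a layered (staged) unlearning schedule; losses,
# model, and data are toy stand-ins, not the paper's actual setup.
import torch
import torch.nn as nn

def unlearn_stage(model, forget_batches, retain_batches, lr=1e-2, steps=50):
    """One stage: push loss up on the folds being forgotten while
    keeping loss low on the retained data."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for (xf, yf), (xr, yr) in zip(forget_batches, retain_batches):
            opt.zero_grad()
            forget_loss = loss_fn(model(xf), yf)   # should increase
            retain_loss = loss_fn(model(xr), yr)   # should stay low
            (-forget_loss + retain_loss).backward()
            opt.step()

# Toy setup: four "forget" folds plus a retain set.
torch.manual_seed(0)
model = nn.Linear(16, 4)
folds = [[(torch.randn(32, 16), torch.randint(0, 4, (32,)))] for _ in range(4)]
retain = [(torch.randn(32, 16), torch.randint(0, 4, (32,)))]

# Layered schedule: stage i unlearns folds 0..i while retaining the rest,
# so each fold ends up suppressed by a different mix of stages.
for i in range(len(folds)):
    forget_so_far = [b for fold in folds[: i + 1] for b in fold]
    unlearn_stage(model, forget_so_far, retain * len(forget_so_far))
```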
-
Latte: Transferring LLMs' Latent-level Knowledge for Few-shot Tabular Learning
The paper introduces 'Latte', a framework that transfers latent-level knowledge from Large Language Models during training to enhance few-shot tabular learning, outperforming baselines by leveraging unlabeled data and mitigating overfitting across diverse classification and regression tasks.
-
Large Language Model Compression with Global Rank and Sparsity Optimization
This paper introduces a two-stage LLM compression method that uses Robust PCA (RPCA) to decompose weights into low-rank and sparse components, followed by probabilistic pruning via policy gradient; it outperforms state-of-the-art techniques at a 50% compression ratio while automatically adapting to layer-wise redundancy without manual thresholds or extensive fine-tuning.
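As a rough illustration of the low-rank-plus-sparse idea behind the first stage: the alternating SVD/thresholding routine below is a simplified, GoDec-style heuristic, not the paper's RPCA solver, and the policy-gradient pruning stage is not shown.

```python
# Simplified sketch: decompose a weight matrix W ~ L + S with L low-rank and
# S sparse, via naive alternating projections (heuristic stand-in for RPCA).
import numpy as np

def lowrank_plus_sparse(W, rank=8, sparsity=0.05, iters=20):
    L = np.zeros_like(W)
    S = np.zeros_like(W)
    for _ in range(iters):
        # Low-rank step: best rank-r approximation of the residual W - S.
        U, sv, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * sv[:rank]) @ Vt[:rank]
        # Sparse step: keep only the largest-magnitude entries of W - L.
        R = W - L
        k = int(sparsity * R.size)
        thresh = np.partition(np.abs(R).ravel(), -k)[-k]
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512)).astype(np.float32)
L, S = lowrank_plus_sparse(W)
# A compressed layer would store L's factors plus S's nonzeros instead of W.
print(np.linalg.norm(W - (L + S)) / np.linalg.norm(W))
```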
-
Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
This paper introduces Latent Preference Coding (LPC), a framework that uses discrete latent codes to model multifaceted human preferences, consistently improving the performance of offline alignment algorithms like DPO, SimPO, and IPO across multiple LLMs and benchmarks.
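A hedged sketch of the "discrete latent codes for preferences" idea: a learned codebook plus a Gumbel-softmax selection over codes, whose embedding could then condition a DPO-style objective. The class name, dimensions, and overall wiring are illustrative assumptions, not the paper's architecture.

```python
# Sketch of a discrete preference codebook (assumed wiring, not LPC's exact design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentPreferenceCodebook(nn.Module):
    def __init__(self, num_codes=16, prompt_dim=64, code_dim=32):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)   # discrete preference codes
        self.prior = nn.Linear(prompt_dim, num_codes)       # prompt -> code logits

    def forward(self, prompt_repr, tau=1.0):
        logits = self.prior(prompt_repr)
        # Differentiable (soft) selection over the discrete codes.
        weights = F.gumbel_softmax(logits, tau=tau, hard=False)
        z = weights @ self.codebook.weight                   # mixture of code embeddings
        return z, weights

# Toy usage: z would condition the model that scores chosen vs. rejected
# responses inside an offline alignment loss such as DPO.
model = LatentPreferenceCodebook()
prompt_repr = torch.randn(4, 64)
z, weights = model(prompt_repr)
print(z.shape, weights.shape)  # torch.Size([4, 32]) torch.Size([4, 16])
```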