Tag: Multimodal Data
All the articles with the tag "Multimodal Data".
-
Exploring the Trade-Offs: Quantization Methods, Task Difficulty, and Model Size in Large Language Models From Edge to Giant
This paper comprehensively evaluates four quantization methods (GPTQ, AWQ, SmoothQuant, FP8) on instruction-tuned LLMs and SLMs ranging from 1B to 405B parameters across 13 datasets. Quantized models often outperform smaller full-precision baselines yet struggle with instruction following and hallucination detection; FP8 proves the most robust, and task difficulty does not always correlate with accuracy loss.
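For readers unfamiliar with weight quantization, the sketch below shows plain round-to-nearest per-channel int8 quantization in PyTorch. It is only the common baseline idea, not the actual GPTQ, AWQ, SmoothQuant, or FP8 algorithms, each of which adds its own calibration, error-compensation, or scaling step on top.

```python
import torch

def quantize_per_channel(w: torch.Tensor, n_bits: int = 8):
    """Symmetric round-to-nearest per-channel weight quantization.

    A toy baseline only; GPTQ/AWQ/SmoothQuant layer further machinery
    (Hessian-based updates, activation-aware scaling, etc.) on this idea.
    """
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 127 for int8
    scale = w.abs().amax(dim=1, keepdim=True) / qmax   # one scale per output channel
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4, 16)                                 # toy weight matrix
q, s = quantize_per_channel(w)
print((w - dequantize(q, s)).abs().max())              # max quantization error
```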
-
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
This survey systematizes research on AI memory by proposing a taxonomy of memory systems (parametric, contextual-structured, and contextual-unstructured) and six fundamental operations (consolidation, updating, indexing, forgetting, retrieval, and compression), mapping them onto research topics such as long-term memory, long context, parameter modification, and multi-source memory, and outlining future directions.
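As a concrete reading of the six operations, here is a hypothetical Python interface that a memory system could implement; the method names and signatures are our own illustration, not the survey's.

```python
from abc import ABC, abstractmethod
from typing import Any

class MemorySystem(ABC):
    """Hypothetical interface mirroring the survey's six memory operations."""

    @abstractmethod
    def consolidate(self, experience: Any) -> None: ...   # write a new memory

    @abstractmethod
    def update(self, key: str, value: Any) -> None: ...   # revise an existing memory

    @abstractmethod
    def index(self) -> None: ...                          # (re)build lookup structures

    @abstractmethod
    def forget(self, key: str) -> None: ...               # prune stale or irrelevant entries

    @abstractmethod
    def retrieve(self, query: str, k: int = 5) -> list[Any]: ...  # read relevant memories

    @abstractmethod
    def compress(self) -> None: ...                       # shrink stored representations
```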
-
How does Transformer Learn Implicit Reasoning?
By training Transformer models from scratch in a controlled symbolic environment, this paper reveals a three-stage developmental trajectory of implicit multi-hop reasoning and, using cross-query semantic patching and a cosine-based representation lens, links the emergence of reasoning ability to clustering in the hidden space, offering new insights for model interpretability.
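The "cosine representation lens" can be pictured as measuring cosine similarity between intermediate hidden states and a set of anchor vectors; the sketch below is our hypothetical reconstruction of that idea, not the paper's exact tooling.

```python
import torch
import torch.nn.functional as F

def cosine_lens(hidden: torch.Tensor, anchors: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between per-layer hidden states and anchor vectors.

    hidden:  (n_layers, d_model)   one residual-stream state per layer
    anchors: (n_concepts, d_model) e.g. embeddings of candidate bridge entities
    Returns a (n_layers, n_concepts) similarity map, read layer by layer.
    """
    return F.normalize(hidden, dim=-1) @ F.normalize(anchors, dim=-1).T

# toy usage: watch where each concept "emerges" across 12 layers
sims = cosine_lens(torch.randn(12, 64), torch.randn(3, 64))
print(sims.argmax(dim=0))  # layer at which each concept's similarity peaks
```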
-
Language Models are Universal Embedders
Building on multilingual decoder-only models such as BLOOM, this paper proposes a method for constructing universal embedders: contrastive learning combined with parameter-efficient fine-tuning yields high-quality embeddings across languages and tasks, and experiments demonstrate strong potential and generalization in multilingual, multi-task settings.
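The contrastive-learning component of such embedders typically reduces to an in-batch-negative InfoNCE loss over query/positive pairs. A minimal sketch, assuming L2-normalized embeddings and a temperature tau (hyperparameter value is illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce(q: torch.Tensor, p: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """In-batch-negative contrastive loss over embedding pairs.

    q, p: (batch, dim) L2-normalized embeddings; p[i] is the positive for q[i],
    and all other rows of p in the batch serve as negatives.
    """
    logits = (q @ p.T) / tau              # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0))      # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

q = F.normalize(torch.randn(8, 32), dim=-1)
p = F.normalize(torch.randn(8, 32), dim=-1)
print(info_nce(q, p))
```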
-
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
This paper proposes MOOSComp, which mitigates the over-smoothing problem by adding an inter-class cosine similarity loss during training and integrates outlier scores during compression to retain critical tokens, substantially improving task-agnostic long-context compression performance and generalization.
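A rough sketch of the two ingredients, with the loss and score shapes guessed from their names rather than taken from the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def inter_class_cosine_loss(h: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Penalize cosine similarity between the mean representations of the two
    token classes (keep vs. drop), pushing the classes apart to counter
    over-smoothing. A guess at the loss's shape, not the paper's exact form."""
    mean_keep = h[labels == 1].mean(dim=0)
    mean_drop = h[labels == 0].mean(dim=0)
    return F.cosine_similarity(mean_keep, mean_drop, dim=0)

def outlier_score(h: torch.Tensor) -> torch.Tensor:
    """Score tokens by distance from the mean representation; high-scoring
    outlier tokens would be retained during compression."""
    centered = h - h.mean(dim=0)
    return centered.norm(dim=-1)

h = torch.randn(100, 64)                        # token hidden states
labels = (torch.rand(100) > 0.5).long()         # toy keep/drop labels
print(inter_class_cosine_loss(h, labels), outlier_score(h).topk(5).indices)
```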