Tag: Model Compression

All the articles with the tag "Model Compression".

Knowledge Grafting of Large Language Models

Published: 28 May, 2025 at 11:21 AM

88.92 🤔

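GraftLLM proposes generating SkillPacks through module-aware compression, enabling efficient cross-capability transfer, knowledge fusion, and forget-free continual learning across large language models, and significantly outperforms existing methods on multiple benchmarks.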
EfficientLLM: Efficiency in Large Language Models

Published: 24 May, 2025 at 11:12 AM

85.05 🤔

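EfficientLLM uses large-scale empirical benchmarking to systematically evaluate efficiency optimization techniques for large language models across architecture pretraining, fine-tuning, and inference, revealing resource trade-offs and task dependence and giving practitioners data-driven guidance for choosing models and techniques.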
On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration

Published: 4 May, 2025 at 04:29 PM

53.38 🤔

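This paper proposes a software-hardware co-optimization framework that uses AWQ model compression and FPGA acceleration to deploy the Qwen2.5-0.5B model efficiently on edge devices, achieving a 55.1% compression rate and an inference speed of 5.1 tokens/s while maintaining relatively high accuracy.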