Tag: Knowledge Distillation
All the articles with the tag "Knowledge Distillation".
-
Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking
This paper proposes InteRank, which uses knowledge distillation and reinforcement learning to train a 3B-parameter small language model that generates explanations for reasoning-intensive document re-ranking, matching the performance of 70B+-parameter models and ranking third on the BRIGHT benchmark.
-
Does Knowledge Distillation Matter for Large Language Model based Bundle Generation?
This paper presents the first systematic study of knowledge distillation for LLM-based bundle generation, proposing a comprehensive KD framework and showing experimentally that it reduces computational cost while preserving, and in some cases improving, performance.
-
Llama-Nemotron: Efficient Reasoning Models
NVIDIA releases the Llama-Nemotron family of open models, combining neural architecture search, knowledge distillation, continued pretraining, multi-stage supervised fine-tuning on high-quality synthetic data, and large-scale reinforcement learning to build a heterogeneous model family that achieves leading reasoning capability and efficiency and supports dynamic switching between reasoning modes.
-
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
This paper explores effective distillation of HuBERT for ASR by comparing student architectures, introducing a discriminative loss that improves low-resource performance, and proposing front-end distillation from waveform to Fbank features, achieving a 17% parameter reduction and doubled inference speed with only minor performance degradation.
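As a rough illustration of how feature-level distillation of a self-supervised speech encoder is commonly set up, here is a minimal sketch combining an L1 term with a cosine-similarity term on hidden states. The shapes, loss weights, and layer choice are hypothetical and not taken from the paper; the paper's actual losses and front-end changes differ.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(student_hidden, teacher_hidden,
                              l1_weight=1.0, cos_weight=1.0):
    """Distill a teacher's hidden states into a smaller student's.

    student_hidden, teacher_hidden: tensors of shape (batch, time, dim).
    Combines an L1 term with a cosine-similarity term, a common choice
    when distilling self-supervised speech encoders such as HuBERT.
    """
    l1 = F.l1_loss(student_hidden, teacher_hidden)
    # Encourage high frame-wise cosine similarity via a log-sigmoid penalty.
    cos = torch.cosine_similarity(student_hidden, teacher_hidden, dim=-1)
    cos_loss = -F.logsigmoid(cos).mean()
    return l1_weight * l1 + cos_weight * cos_loss

# Example usage with random tensors standing in for real model outputs.
student_out = torch.randn(4, 200, 256, requires_grad=True)  # hypothetical student states
teacher_out = torch.randn(4, 200, 256)                       # teacher states (same dim assumed)
loss = feature_distillation_loss(student_out, teacher_out)
loss.backward()
```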
-
Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability
This paper systematically evaluates how knowledge distillation affects language model performance and explainability by introducing critique-revision prompting and comparing multitask training, counterfactual training, and their combination, finding that multitask training performs best on task accuracy, while methods that incorporate critique-revision prompting markedly improve explainability.
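For readers new to the tag's core technique, the sketch below shows classic response-based knowledge distillation (Hinton et al., 2015): a KL term between temperature-softened teacher and student distributions blended with the usual cross-entropy on ground-truth labels. The temperature and weighting are illustrative defaults, not values from any of the papers above.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Classic knowledge distillation objective.

    Blends a soft-target KL term (temperature-scaled logits) with the
    standard cross-entropy against ground-truth labels.
    """
    # Soft targets: KL between softened teacher and student distributions,
    # rescaled by T^2 so gradients stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy on the unscaled logits.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In the LLM setting the same idea is typically applied token by token over the vocabulary, often alongside sequence-level distillation on teacher-generated text.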