Tag: Representation Learning
All the articles with the tag "Representation Learning".
-
Radio: Rate-Distortion Optimization for Large Language Model Compression
This paper introduces 'Radio,' a rate-distortion optimization framework for LLM compression that outperforms existing quantization methods in perplexity and downstream task accuracy, particularly at lower bit depths, by iteratively optimizing bit depths and using companding quantization post-training.
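As a rough illustration of the companding-quantization idea mentioned in the summary (not the paper's actual algorithm; the mu-law curve, bit width, and function names here are assumptions), the sketch below compresses weights through a nonlinear curve, quantizes them uniformly, and expands them back:

```python
import numpy as np

def mu_law_compand_quantize(w, bits=4, mu=255.0):
    """Illustrative companding quantizer (hypothetical, not Radio's exact scheme):
    compress weights with a mu-law curve, quantize on a uniform grid, expand back."""
    scale = np.max(np.abs(w)) + 1e-12
    x = w / scale                                                  # normalize to [-1, 1]
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    levels = 2 ** bits - 1
    q = np.round((compressed + 1) / 2 * levels) / levels * 2 - 1   # uniform quantization
    expanded = np.sign(q) * ((1 + mu) ** np.abs(q) - 1) / mu       # inverse companding
    return expanded * scale

w = np.random.randn(4096).astype(np.float32)
w_hat = mu_law_compand_quantize(w, bits=3)
print(np.mean((w - w_hat) ** 2))   # distortion at 3 bits
```

Companding spends more quantization levels on small-magnitude weights, which is why it can help at the low bit depths the paper targets.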
-
Does Self-Attention Need Separate Weights in Transformers?
This paper introduces a shared weight self-attention mechanism for transformers, using a single weight matrix with diagonal scaling to reduce parameters by 66.53% in attention blocks, achieving competitive performance on GLUE and improved noise robustness while slightly underperforming on SQuAD tasks compared to standard BERT.
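A minimal sketch of the shared-weight idea as described above, assuming one projection matrix reused for Q, K, and V with per-dimension (diagonal) scaling vectors; the exact formulation and names are assumptions, not the paper's code:

```python
import torch

def shared_weight_attention(x, W, d_q, d_k, d_v):
    """Single shared projection W; Q, K, V differ only by diagonal scaling vectors."""
    h = x @ W                                  # (batch, seq, dim) shared projection
    q, k, v = h * d_q, h * d_k, h * d_v        # diagonal scaling = elementwise vectors
    attn = torch.softmax(q @ k.transpose(-2, -1) / (h.shape[-1] ** 0.5), dim=-1)
    return attn @ v

batch, seq, dim = 2, 8, 64
x = torch.randn(batch, seq, dim)
W = torch.randn(dim, dim) / dim ** 0.5
d_q, d_k, d_v = (torch.ones(dim) for _ in range(3))
out = shared_weight_attention(x, W, d_q, d_k, d_v)
print(out.shape)   # torch.Size([2, 8, 64])
```

Replacing three full projection matrices with one matrix plus three vectors is what yields the roughly two-thirds parameter reduction inside the attention block.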
-
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
This paper proposes MOOSComp, which mitigates the over-smoothing problem by adding an inter-class cosine similarity loss during training and integrates outlier scores during compression to retain critical tokens, significantly improving task-agnostic long-context compression performance and generalization.
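A small sketch of what an inter-class cosine similarity penalty could look like (an assumed form for illustration, not MOOSComp's implementation): push the mean representations of "keep" and "discard" token classes apart so deep-layer token embeddings do not collapse into one cluster.

```python
import torch
import torch.nn.functional as F

def inter_class_cosine_loss(hidden, labels):
    """Penalize similarity between class-mean token representations (assumed form)."""
    keep_mean = hidden[labels == 1].mean(dim=0)
    drop_mean = hidden[labels == 0].mean(dim=0)
    return F.cosine_similarity(keep_mean, drop_mean, dim=0)   # minimize this

hidden = torch.randn(128, 768)             # token representations from the compressor
labels = torch.randint(0, 2, (128,))       # 1 = keep token, 0 = discard token
print(inter_class_cosine_loss(hidden, labels).item())
```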
-
Latte: Transfering LLMs' Latent-level Knowledge for Few-shot Tabular Learning
The paper introduces 'Latte', a framework that transfers latent-level knowledge from Large Language Models during training to enhance few-shot tabular learning, outperforming baselines by leveraging unlabeled data and mitigating overfitting across diverse classification and regression tasks.
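One way such latent-level transfer could be set up (a hypothetical sketch; the projection head, loss, and shapes are assumptions rather than Latte's actual design) is to pull the tabular encoder's representation toward the frozen LLM's latent embedding of the same example:

```python
import torch
import torch.nn.functional as F

def latent_transfer_loss(tab_repr, llm_repr, proj):
    """Align projected tabular representations with LLM latent states (assumed form)."""
    return 1.0 - F.cosine_similarity(proj(tab_repr), llm_repr, dim=-1).mean()

tab_repr = torch.randn(16, 128)        # few-shot tabular encoder outputs
llm_repr = torch.randn(16, 4096)       # frozen LLM latent states for the same rows
proj = torch.nn.Linear(128, 4096)      # learned alignment head
loss = latent_transfer_loss(tab_repr, llm_repr, proj)
loss.backward()
```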
-
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
This paper proposes Lineage-Regularized Matrix Factorization (LRMF), which leverages the lineage relationships among large language models to significantly improve performance-prediction accuracy, outperforming conventional methods in both homogeneous and heterogeneous model settings and performing especially well on the cold-start problem.
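The sketch below shows the general shape of a lineage-regularized factorization (an illustrative objective with assumed names and a toy parent map, not the paper's exact formulation): factor a model-by-benchmark score matrix and add a penalty pulling each child model's latent vector toward its parent's.

```python
import torch

n_models, n_tasks, rank = 10, 20, 4
S = torch.rand(n_models, n_tasks)          # observed performance scores
parent = {1: 0, 2: 0, 3: 1}                # hypothetical child -> parent lineage map

U = torch.randn(n_models, rank, requires_grad=True)   # model latent factors
V = torch.randn(n_tasks, rank, requires_grad=True)    # benchmark latent factors
opt = torch.optim.Adam([U, V], lr=0.05)

for _ in range(200):
    recon = (S - U @ V.T).pow(2).mean()                              # reconstruction
    lineage = sum((U[c] - U[p]).pow(2).sum() for c, p in parent.items())
    loss = recon + 0.1 * lineage                                     # lineage penalty
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Tying a new model's factors to its ancestors is what lets such a scheme predict scores for a model with few or no observed evaluations (the cold-start case).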