Tag: Representation Learning
All the articles with the tag "Representation Learning".
-
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
This paper proposes DPE, a training-free long-context extrapolation method that detects the effective relative distance of each RoPE dimension group, identifies the key dimensions, and selectively adjusts the position indices of those dimensions, substantially extending the LLM context window and improving performance on long-text tasks.
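As a rough illustration of the idea, the sketch below rescales RoPE position indices only on a selected subset of frequency dimensions; which dimensions count as "key" and the scale factor used here are hypothetical placeholders, not the values DPE actually detects.

```python
import torch

def rope_angles_dpe(positions, dim, base=10000.0, key_dims=None, scale=4.0):
    """Sketch of dimension-wise position-index manipulation for RoPE.

    positions: (seq_len,) absolute positions
    key_dims:  indices of frequency pairs treated as "key" dimensions
               whose effective position index is rescaled (the choice of
               dims and the scale are illustrative assumptions).
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    pos = positions.float().unsqueeze(-1).repeat(1, inv_freq.numel())
    if key_dims is not None:
        # adjust position indices only on the selected dimension group
        pos[:, key_dims] = pos[:, key_dims] / scale
    angles = pos * inv_freq            # (seq_len, dim/2)
    return torch.cos(angles), torch.sin(angles)

cos, sin = rope_angles_dpe(torch.arange(8192), dim=128,
                           key_dims=torch.arange(32, 64))
```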
-
Radio: Rate-Distortion Optimization for Large Language Model Compression
This paper introduces Radio, a rate-distortion optimization framework for LLM compression that iteratively optimizes bit depths and applies companding quantization post-training, outperforming existing quantization methods in perplexity and downstream task accuracy, particularly at lower bit depths.
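For intuition, here is a minimal sketch of companding quantization: compress the weight distribution with a nonlinearity, quantize uniformly, then expand back. The mu-law compander and mu=8.0 are illustrative assumptions, not necessarily the transform Radio uses.

```python
import numpy as np

def compand_quantize(w, bits, mu=8.0):
    """Compress -> uniform quantize -> expand (mu-law-style sketch)."""
    scale = np.abs(w).max() + 1e-12
    x = w / scale                                          # map to [-1, 1]
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    levels = 2 ** bits - 1
    q = np.round((compressed + 1) / 2 * levels) / levels * 2 - 1
    expanded = np.sign(q) * ((1 + mu) ** np.abs(q) - 1) / mu
    return expanded * scale

w = np.random.randn(4096).astype(np.float32)
w_hat = compand_quantize(w, bits=4)
print("MSE:", float(np.mean((w - w_hat) ** 2)))
```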
-
Does Self-Attention Need Separate Weights in Transformers?
This paper introduces a shared-weight self-attention mechanism for transformers that uses a single weight matrix with diagonal scaling, reducing attention-block parameters by 66.53%; it achieves competitive performance on GLUE and improved noise robustness, while slightly underperforming standard BERT on SQuAD.
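A minimal single-head sketch of the shared-weight idea follows: one projection matrix is reused for queries, keys, and values, differentiated only by learned diagonal scaling vectors. The exact parameterization in the paper may differ.

```python
import math
import torch
import torch.nn as nn

class SharedWeightSelfAttention(nn.Module):
    """One shared projection for Q/K/V plus per-role diagonal scaling."""
    def __init__(self, d_model):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model, bias=False)
        self.diag_q = nn.Parameter(torch.ones(d_model))
        self.diag_k = nn.Parameter(torch.ones(d_model))
        self.diag_v = nn.Parameter(torch.ones(d_model))
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):                       # x: (batch, seq, d_model)
        h = self.shared(x)                      # single projection for all three roles
        q, k, v = h * self.diag_q, h * self.diag_k, h * self.diag_v
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return self.out(scores.softmax(dim=-1) @ v)

layer = SharedWeightSelfAttention(d_model=768)
y = layer(torch.randn(2, 16, 768))
```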
-
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
This paper proposes MOOSComp, which adds an inter-class cosine similarity loss during training to mitigate the over-smoothing problem and integrates outlier scores during compression to retain critical tokens, significantly improving task-agnostic long-context compression performance and generalization.
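As a hedged sketch of the inter-class cosine similarity penalty: push the mean embeddings of the "keep" and "discard" token classes apart so token representations do not collapse. The exact formulation in MOOSComp may differ.

```python
import torch
import torch.nn.functional as F

def inter_class_cosine_loss(token_embs, labels):
    """Penalize similarity between the mean embeddings of the two token
    classes (1 = keep, 0 = discard); minimizing this counteracts
    over-smoothing. Illustrative, not the paper's exact loss."""
    keep = token_embs[labels == 1].mean(dim=0)
    drop = token_embs[labels == 0].mean(dim=0)
    return F.cosine_similarity(keep, drop, dim=0)

embs = torch.randn(128, 768, requires_grad=True)
labels = torch.randint(0, 2, (128,))
loss = inter_class_cosine_loss(embs, labels)
loss.backward()
```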
-
Latte: Transfering LLMs' Latent-level Knowledge for Few-shot Tabular Learning
The paper introduces 'Latte', a framework that transfers latent-level knowledge from Large Language Models during training to enhance few-shot tabular learning, outperforming baselines by leveraging unlabeled data and mitigating overfitting across diverse classification and regression tasks.
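One way to picture latent-level transfer, sketched under heavy assumptions: align a small tabular encoder's latent with a cached LLM embedding of the serialized row, which can be computed on unlabeled data. The encoder, projection head, and alignment loss below are hypothetical stand-ins, not Latte's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TabularEncoder(nn.Module):
    """Tiny MLP encoder for tabular rows (illustrative only)."""
    def __init__(self, n_features, d_latent):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(), nn.Linear(256, d_latent))
    def forward(self, x):
        return self.net(x)

def latent_alignment_loss(tab_latent, llm_latent, proj):
    """Align tabular latents with precomputed LLM embeddings of the
    serialized rows; usable on unlabeled data (hypothetical loss)."""
    return 1 - F.cosine_similarity(proj(tab_latent), llm_latent, dim=-1).mean()

enc = TabularEncoder(n_features=12, d_latent=128)
proj = nn.Linear(128, 768)            # map to the LLM embedding size
x = torch.randn(32, 12)               # unlabeled batch of rows
llm_emb = torch.randn(32, 768)        # stand-in for cached LLM embeddings
loss = latent_alignment_loss(enc(x), llm_emb, proj)
loss.backward()
```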