Tag: Efficiency
All the articles with the tag "Efficiency".
-
Does Self-Attention Need Separate Weights in Transformers?
This paper introduces a shared-weight self-attention mechanism for transformers that uses a single weight matrix with diagonal scaling to reduce attention-block parameters by 66.53%, achieving competitive performance on GLUE and improved noise robustness while slightly underperforming standard BERT on SQuAD.
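A minimal PyTorch sketch of the shared-weight idea described above: a single projection matrix serves Q, K, and V, and learned diagonal scaling vectors tell the three roles apart. The module structure, head splitting, and names are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedWeightSelfAttention(nn.Module):
    """One shared projection for Q, K, V, differentiated by diagonal scaling."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.shared_proj = nn.Linear(d_model, d_model, bias=False)  # single weight matrix
        # Diagonal scaling: one learnable vector per role instead of three full matrices
        self.scale_q = nn.Parameter(torch.ones(d_model))
        self.scale_k = nn.Parameter(torch.ones(d_model))
        self.scale_v = nn.Parameter(torch.ones(d_model))
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):                       # x: (batch, seq, d_model)
        b, s, d = x.shape
        shared = self.shared_proj(x)            # one matmul replaces three projections
        q = shared * self.scale_q
        k = shared * self.scale_k
        v = shared * self.scale_v
        # Split into heads and run standard scaled dot-product attention
        q, k, v = (t.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)
        return self.out_proj(out.transpose(1, 2).reshape(b, s, d))
```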
-
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
This paper proposes DPE, a training-free long-context extrapolation method that detects the effective relative distance of each group of RoPE dimensions, identifies the key dimensions, and selectively adjusts the position indices of those dimensions, substantially extending the LLM's context window and improving performance on long-context tasks.
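The dimension-wise manipulation can be pictured as giving each group of RoPE frequency pairs its own, possibly rescaled, position indices. The sketch below assumes a `group_scales` mapping from pair-index ranges to scale factors; how DPE actually detects effective distances and selects key dimensions is not reproduced here.

```python
import torch

def rope_angles_dimensionwise(seq_len: int, head_dim: int,
                              group_scales: dict, base: float = 10000.0):
    """Return cos/sin RoPE tables where selected frequency-pair groups
    use rescaled position indices instead of the shared default."""
    n_pairs = head_dim // 2
    inv_freq = 1.0 / (base ** (torch.arange(n_pairs, dtype=torch.float32) * 2 / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)            # (seq,)
    pos_per_pair = positions.unsqueeze(1).repeat(1, n_pairs)          # (seq, pairs), default ids
    for (start, end), scale in group_scales.items():
        # Compress indices on "key" dimension groups so relative distances
        # stay within the range those dimensions handled during pretraining
        pos_per_pair[:, start:end] = positions.unsqueeze(1) * scale
    angles = pos_per_pair * inv_freq                                   # (seq, pairs)
    return torch.cos(angles), torch.sin(angles)

# Example: rescale only the lowest-frequency quarter of a 128-dim head
cos, sin = rope_angles_dimensionwise(8192, 128, {(48, 64): 0.25})
```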
-
LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment
LSAQ introduces a Layer-Specific Adaptive Quantization system for LLMs that uses Jaccard similarity to assess layer importance and dynamically adjusts quantization precision to the resources of the target edge device, achieving higher zero-shot accuracy and lower perplexity than baseline methods while enabling efficient deployment.
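A small Python sketch of the layer-importance idea: Jaccard similarity between top-token sets serves as a cheap importance signal, and only the highest-ranked layers keep the higher bit-width within a device-dependent budget. The `1 - similarity` importance rule, the budget value, and the token-set construction are illustrative assumptions rather than LSAQ's exact procedure.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def plan_layer_bits(top_tokens_in, top_tokens_out,
                    high_bits=8, low_bits=4, high_budget=8):
    """Rank layers by 1 - Jaccard(top tokens in, top tokens out): a layer whose
    output token set differs strongly from its input set is treated as more
    important, and only the most important layers keep the higher bit-width."""
    importance = [(i, 1.0 - jaccard(s_in, s_out))
                  for i, (s_in, s_out) in enumerate(zip(top_tokens_in, top_tokens_out))]
    importance.sort(key=lambda item: item[1], reverse=True)
    plan = {i: low_bits for i, _ in importance}
    for i, _ in importance[:high_budget]:       # budget set by available device memory
        plan[i] = high_bits
    return plan
```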
-
Accelerating Large Language Model Reasoning via Speculative Search
Speculative Search (SpecSearch) accelerates LLM reasoning by up to 2.12× with a bi-level speculative thought generator in which a small model and a large model collaborate, preserving comparable reasoning quality through a quality-preserving rejection mechanism.
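One way to picture the collaboration is the step below: a small model drafts candidate thoughts, an evaluator scores them, and the large model only steps in when no draft clears the quality threshold. The callables and the threshold are stand-ins for illustration, not SpecSearch's actual interfaces.

```python
def speculative_thought_step(small_model, large_model, evaluate, state,
                             n_drafts=4, threshold=0.7):
    """Draft thoughts with the small model; accept the best one only if it
    clears the quality threshold, otherwise let the large model generate."""
    drafts = [small_model(state) for _ in range(n_drafts)]      # cheap proposals
    scored = [(evaluate(state, t), t) for t in drafts]
    best_score, best_thought = max(scored, key=lambda st: st[0])
    if best_score >= threshold:
        return best_thought                     # quality-preserving acceptance
    return large_model(state)                   # rejection: fall back to the large model

# Toy usage with stand-in callables
next_thought = speculative_thought_step(
    small_model=lambda s: s + " -> quick step",
    large_model=lambda s: s + " -> careful step",
    evaluate=lambda s, t: 0.8 if "quick" in t else 0.5,
    state="premise",
)
```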
-
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
This paper proposes the Speculative Chain-of-Thought (SCoT) framework, in which a lightweight draft model generates multiple chain-of-thought drafts in parallel and a fine-tuned target model selects the best draft or decides to re-think, substantially reducing inference latency while keeping accuracy close to that of the large model.
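A minimal sketch of the draft-then-select flow, under assumed `draft_model` / `target_model` interfaces that are not the paper's implementation: several chain-of-thought drafts are produced cheaply, and the target model either picks one or reasons from scratch.

```python
def speculative_cot(draft_model, target_model, question, n_drafts=4):
    """Generate several chain-of-thought drafts cheaply, then let the target
    model either pick one or decide to reason from scratch."""
    drafts = [draft_model.generate_cot(question) for _ in range(n_drafts)]  # parallelizable
    choice = target_model.select(question, drafts)    # index of best draft, or None to re-think
    if choice is None:
        return target_model.generate_cot(question)    # target model re-thinks itself
    return drafts[choice]
```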