Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
CRANE: Reasoning with constrained LLM generation
This paper introduces CRANE, a reasoning-augmented constrained decoding algorithm that alternates between unconstrained and constrained generation to preserve LLM reasoning capabilities while ensuring syntactic correctness, achieving up to 10% accuracy improvement on symbolic reasoning benchmarks like GSM-Symbolic and FOLIO.
-
Splitwiser: Efficient LM inference with constrained resources
Splitwiser introduces a method to split LLM inference phases on a single GPU using multiprocessing and NVIDIA MPS, achieving modest latency reductions (up to 18.2%) and throughput improvements (up to 1.42x) on Hugging Face and vLLM pipelines, though constrained by overheads and scalability issues.
-
How do Humans and Language Models Reason About Creativity? A Comparative Analysis
This paper conducts a comparative analysis of creativity evaluation in STEM, revealing that human experts and LLMs prioritize different facets of originality (cleverness vs. remoteness/uncommonness) and are differentially influenced by contextual examples, with LLMs showing higher predictive accuracy but poorer construct validity due to homogenized facet correlations.
-
Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving
This paper proposes a cognitive load-aware adaptive streaming framework for efficient LLM serving, dynamically adjusting output speed to cut compute resource consumption by up to 16.8% while maintaining user satisfaction.
-
Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision
This paper proposes Instruct-LF, a method that combines the instruction-following ability of LLMs with gradient-based statistical models to discover goal-conditioned latent factors without task supervision, improving downstream task performance and being preferred in human evaluations.