Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
CRANE: Reasoning with constrained LLM generation
This paper introduces CRANE, a reasoning-augmented constrained decoding algorithm that alternates between unconstrained and constrained generation to preserve LLM reasoning capabilities while ensuring syntactic correctness, achieving up to 10% accuracy improvement on symbolic reasoning benchmarks like GSM-Symbolic and FOLIO.
-
Splitwiser: Efficient LM inference with constrained resources
Splitwiser introduces a method to split LLM inference phases on a single GPU using multiprocessing and NVIDIA MPS, achieving modest latency reductions (up to 18.2%) and throughput improvements (up to 1.42x) on Hugging Face and vLLM pipelines, though constrained by overheads and scalability issues.
-
How do Humans and Language Models Reason About Creativity? A Comparative Analysis
This paper conducts a comparative analysis of creativity evaluation in STEM, revealing that human experts and LLMs prioritize different facets of originality (cleverness vs. remoteness/uncommonness) and are differentially influenced by contextual examples, with LLMs showing higher predictive accuracy but poorer construct validity due to homogenized facet correlations.
-
Streaming, Fast and Slow: Cognitive Load-Aware Streaming for Efficient LLM Serving
This paper proposes a cognitive load-aware adaptive streaming framework for efficient LLM serving, dynamically adjusting output speed to cut compute resource consumption by up to 16.8% while maintaining user satisfaction.
-
Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision
This paper proposes Instruct-LF, a method that combines the instruction-following ability of LLMs with gradient-based statistical models to discover goal-conditioned latent factors without task supervision, improving downstream task performance and being preferred in human evaluations.