Tag: Multimodality
All the articles with the tag "Multimodality".
-
Splitwiser: Efficient LM inference with constrained resources
Splitwiser introduces a method to split LLM inference phases on a single GPU using multiprocessing and NVIDIA MPS, achieving modest latency reductions (up to 18.2%) and throughput improvements (up to 1.42x) on Hugging Face and vLLM pipelines, though the gains are limited by overheads and scalability issues.
-
RWKV-X: A Linear Complexity Hybrid Language Model
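This paper proposes RWKV-X, a linear-complexity hybrid language model that combines RWKV with a sparse attention mechanism to improve long-context modeling while maintaining efficiency and short-context performance.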