This paper introduces Activation Communication (AC), a method for inter-LLM communication that exchanges intermediate activations instead of natural language, achieving up to a 27% performance improvement over natural-language baselines with less than a quarter of the compute across coordination games and reasoning benchmarks.
Large Language Model, Multi-Agent, Reasoning, Efficiency, Representation Learning
Vignav Ramesh, Kenneth Li
Kempner Institute for AI, Harvard University
Generated by grok-3
Background Problem
The research addresses the high inference costs and information loss inherent in natural language communication between large language models (LLMs) used as autonomous agents. As LLMs are increasingly deployed for complex reasoning and decision-making tasks, multi-agent communication has emerged as a method to enhance their capabilities through collaboration. However, natural language as a medium scales poorly with the number of agents and messages, and it abstracts away rich internal representations that could be more informative. This work aims to solve these issues by proposing a more efficient and information-rich communication protocol using intermediate activations instead of natural language, potentially unlocking better performance with lower computational overhead.
Method
The proposed method, termed Activation Communication (AC), has language models (LMs) communicate directly via their intermediate activations. For two models A and B, the procedure is: (1) pause model B’s forward pass at an intermediate layer j; (2) combine B’s activation at layer j with model A’s activation at layer k using a function f (sum, mean, or replace); (3) feed the combined activation to B’s layer j+1 and continue the forward pass through decoding. The method requires no additional task-specific parameters or data, operating on frozen pre-trained models. When the two activation spaces differ substantially, a task-agnostic linear mapping matrix W is learned once per model pair to project activations into a compatible space, trained on general text to minimize MSE loss. The approach aims to preserve richer internal representations than natural-language tokens and to reduce compute by avoiding repeated full forward passes.
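A minimal sketch of how steps (1)–(3) could be wired up with PyTorch forward hooks on HuggingFace-style decoder-only models appears below. The layer indices (k = j = 26 as in the experiments), the LLaMA-style `model.model.layers` attribute path, and the choice to merge only at the last prompt token during prefill are illustrative assumptions, not the authors’ exact implementation.

```python
# Hedged sketch of Activation Communication between two frozen decoder-only LMs.
# Assumes LLaMA-style architectures exposing model.model.layers; merge position
# (last prompt token, prefill only) is a simplification for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MERGE_FN = {
    "sum":     lambda b, a: b + a,
    "mean":    lambda b, a: (b + a) / 2,
    "replace": lambda b, a: a,
}

def activation_communication(model_a, tok_a, model_b, tok_b,
                             prompt_a, prompt_b,
                             k=26, j=26, fn="replace", max_new_tokens=64):
    dev_a = next(model_a.parameters()).device
    dev_b = next(model_b.parameters()).device

    # (1)-(2a) Run model A once and grab its hidden state at layer k
    # (hidden_states[k] is the output of the k-th decoder layer).
    with torch.no_grad():
        out_a = model_a(**tok_a(prompt_a, return_tensors="pt").to(dev_a),
                        output_hidden_states=True)
    act_a = out_a.hidden_states[k][:, -1, :]          # last-token activation of A
    merge = MERGE_FN[fn]

    # (2b)-(3) Hook layer j of model B: splice A's activation into B's residual
    # stream, then let B's forward pass continue from layer j+1 as usual.
    def hook(module, inputs, output):
        hidden = output[0]
        if hidden.size(1) > 1:                        # only on the prefill pass
            hidden = hidden.clone()
            hidden[:, -1, :] = merge(hidden[:, -1, :], act_a.to(hidden))
            return (hidden,) + output[1:]
        return output

    handle = model_b.model.layers[j].register_forward_hook(hook)
    try:
        inputs_b = tok_b(prompt_b, return_tensors="pt").to(dev_b)
        with torch.no_grad():
            gen = model_b.generate(**inputs_b, max_new_tokens=max_new_tokens)
    finally:
        handle.remove()
    return tok_b.decode(gen[0], skip_special_tokens=True)
```

Merging only at the final prompt position during prefill keeps the sequence lengths of the two prompts from having to match; the paper’s actual merge granularity may differ.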
Experiment
The experiments are conducted in two setups: multi-player coordination games (Countries and Tip Sheets) and seven reasoning benchmarks (Biographies, GSM8k, and five MMLU subsets). Datasets span diverse domains, with the coordination games testing information transfer and the reasoning benchmarks evaluating complex task performance. Models include several sizes from the LLaMA family (3B to 8B) as well as other families (Qwen, Gemma) to assess generalizability. Baselines are single-model performance and Natural Language Debate (NLD), with AC tested using different combination functions (sum, mean, replace) at fixed layers (k=j=26, determined empirically). AC, particularly with the ‘replace’ function, achieves up to a 27% improvement over NLD across datasets with less than a quarter of the compute, as quantified by a FLOPs analysis. The setup is reasonable for controlled testing, but the subset sampling (e.g., 100 samples per dataset) and the fixed layer choice may limit robustness claims. The learned mapping matrix W shows inconsistent gains, likely because its training data is out of distribution, though in-distribution training on GSM8k significantly boosts performance (78% vs. 64%); a sketch of fitting such a mapping appears below. While the results match the expectation of compute efficiency and performance gains, the lack of testing on larger models (>70B) and on real-world dynamic tasks suggests potential gaps in practical applicability.
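The following is a hedged sketch of how the task-agnostic mapping W might be fit: a bias-free linear layer trained with MSE to project model A’s layer-k activations into model B’s layer-j activation space, using activations collected from the same text passages. The corpus choice, position alignment, and hyperparameters are assumptions for illustration, not the paper’s exact training setup.

```python
# Sketch of fitting the linear map W (A's layer-k space -> B's layer-j space).
# Assumes acts_a and acts_b were extracted at aligned token positions over the
# same general-text passages; epochs and learning rate are illustrative.
import torch
from torch import nn

def fit_mapping(acts_a, acts_b, epochs=5, lr=1e-3):
    """acts_a: (N, d_a) activations from model A at layer k.
    acts_b: (N, d_b) activations from model B at layer j, same passages."""
    W = nn.Linear(acts_a.size(-1), acts_b.size(-1), bias=False)
    opt = torch.optim.Adam(W.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(W(acts_a), acts_b)
        loss.backward()
        opt.step()
    return W  # at inference: project act_a with W before merging into B
```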
Further Thoughts
The concept of activation communication opens up fascinating avenues for rethinking how LLMs collaborate, particularly in resource-constrained environments where compute efficiency is paramount. However, the reliance on aligned activation spaces or a learned mapping matrix W raises questions about scalability across highly divergent model architectures or training regimes: could the method be extended to black-box models accessed via API by approximating activations from output embeddings? The interpretability trade-off is also significant. While activations carry high-entropy information, their opacity could hinder trust in safety-critical applications like medical or legal reasoning, suggesting a need for hybrid approaches that combine natural language for transparency with activations for efficiency. This work also connects to broader discussions in representation learning, such as the platonic representation hypothesis mentioned in the paper, which posits universal latent structures across models; AC could be a stepping stone to uncovering such universals by studying cross-model activation mappings. Finally, integrating AC with techniques like Retrieval-Augmented Generation (RAG) could be a powerful direction, where activations from a knowledge-retrieval model enhance a reasoning model’s context without verbose text exchanges, potentially revolutionizing real-time multi-agent systems.