arXiv: 2505.01557

Contextures: Representations from Contexts

Published at 11:05 AM

This paper introduces contexture theory, which unifies representation learning across paradigms by showing that common objectives target the top singular functions of a context-induced expectation operator. The authors demonstrate high alignment between neural representations and these singular functions, and propose a task-agnostic metric for context evaluation that correlates strongly with downstream performance across a range of datasets.

Representation Learning, Supervised Learning, Self-Supervised Learning, Foundation Model, Pre-training, Scaling Laws

Runtian Zhai, Kai Yang, Che-Ping Tsai, Burak Varici, Zico Kolter, Pradeep Ravikumar

Carnegie Mellon University, Peking University

Generated by grok-3

Background Problem

The paper addresses a fundamental gap in our understanding of representation learning, particularly in foundation models: what representations do these models learn, and why are they useful across such diverse tasks? Despite the empirical success of deep learning, there is no systematic characterization of learned representations, so their effectiveness on tasks far from the pretraining objective remains mysterious. The authors propose a unified theoretical framework, termed 'contexture theory', that explains representation learning across the supervised, self-supervised, and manifold learning paradigms by identifying the target of representation learning as the top singular functions of a context-induced expectation operator.

Method

The core idea of contexture theory is that representation learning can be characterized as learning from the association between an input variable X and a context variable A, with the goal of approximating the top singular functions of the expectation operator induced by this context. A joint distribution P^+ over X and A induces kernels and an expectation operator T^{P^+}; recovering the span of the top-d singular functions of T^{P^+} yields the optimal representation, a process termed 'learning the contexture'. The main steps are: (1) defining various contexts (labels, transformations, graphs, stochastic features); (2) proving that common learning objectives (e.g., supervised learning with MSE, contrastive and non-contrastive self-supervised losses, and graph-based node representation learning) all target these singular functions; and (3) proposing a task-agnostic metric that evaluates a context's usefulness from the spectrum of its singular values, which reflects the strength of the association between X and A.
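To make the target concrete, here is a minimal discrete sketch (not the paper's code): when X and A take finitely many values, the expectation operator (T^{P^+} g)(x) = E[g(A) | X = x] is just a matrix, and the top singular functions come from an SVD of the marginal-normalized joint matrix. The joint distribution below is random, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint distribution P^+ over (X, A): 8 x-values, 5 a-values.
joint = rng.random((8, 5))
joint /= joint.sum()

p_x = joint.sum(axis=1)  # marginal of X
p_a = joint.sum(axis=0)  # marginal of A

# Normalized matrix whose SVD gives the singular functions of T^{P+}
# in the L2(p_x) / L2(p_a) geometry.
M = joint / (np.sqrt(p_x)[:, None] * np.sqrt(p_a)[None, :])

U, s, Vt = np.linalg.svd(M)

d = 3  # representation dimension
# Top-d singular functions on X, rescaled back from the weighted space;
# their span is the theory's optimal d-dimensional representation.
phi = U[:, :d] / np.sqrt(p_x)[:, None]

print(s[:4])  # s[0] == 1 always (the constant function pair)
```

The top singular value is always 1, attained by the constant functions; the informative part of the spectrum starts at the second singular value, which is why the spectrum's decay can serve as an association-strength signal.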

Experiment

The experiments are conducted on multiple datasets, including abalone and MNIST, alongside 28 classification and regression datasets from OpenML. The setup tests the alignment of learned representations with top singular functions using neural networks of varying depth and width, and evaluates context usefulness with a proposed metric (τ) against actual downstream prediction errors. Results show high alignment (CCA up to 0.9, mutual KNN over 0.8) between neural network representations and top eigenfunctions on abalone, but diminishing returns with excessive scaling due to optimization difficulties. The context evaluation metric correlates strongly with performance (median Pearson correlation 0.587, distance correlation 0.659) on most datasets, though it fails in cases of extreme association or cross-context type comparisons. The setup is reasonably comprehensive for initial validation, but the limited variety of contexts and datasets, along with reliance on kernel PCA for exact solutions, raises questions about scalability and generalizability. The results partially match expectations for moderate association contexts but highlight task-specific variations that the metric cannot universally capture.
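The alignment measurement can be sketched as follows. This is a hypothetical toy, not the paper's experimental code: it computes the mean canonical correlation (CCA) between a learned representation and a set of target eigenfunctions evaluated on the same samples, via SVD-based whitening.

```python
import numpy as np

def mean_cca(Z1, Z2, eps=1e-8):
    """Mean canonical correlation between two representations of shape (n, d)."""
    Z1 = Z1 - Z1.mean(axis=0)
    Z2 = Z2 - Z2.mean(axis=0)
    # Whiten each representation with its thin SVD.
    U1, s1, _ = np.linalg.svd(Z1, full_matrices=False)
    U2, s2, _ = np.linalg.svd(Z2, full_matrices=False)
    U1 = U1[:, s1 > eps]
    U2 = U2[:, s2 > eps]
    # Singular values of U1^T U2 are the canonical correlations.
    rho = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return rho.mean()

rng = np.random.default_rng(1)
F = rng.standard_normal((500, 4))                 # stand-in "top eigenfunctions"
W = rng.standard_normal((4, 4))
Z = F @ W + 0.1 * rng.standard_normal((500, 4))   # noisy linear image of F

print(mean_cca(Z, F))  # near 1: same span up to small noise
```

Because CCA is invariant to invertible linear maps of either representation, a value near 1 indicates that the network's representation spans (approximately) the same subspace as the top eigenfunctions, which is exactly the sense of alignment the theory predicts.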

Further Thoughts

The contexture theory offers a compelling framework to unify representation learning, but its practical impact hinges on addressing the acknowledged limitations, particularly the integration of optimization dynamics and architectural biases. An insightful connection could be drawn to recent works on neural architecture search and meta-learning, where inductive biases are explicitly optimized—could contexture theory guide the design of architectures that inherently align with optimal singular functions? Additionally, the idea of ‘context scaling’ resonates with ongoing research in curriculum learning and data selection for pretraining; exploring how to systematically design or evolve contexts using reinforcement learning or evolutionary algorithms could be a fruitful direction. The theory’s focus on spectral properties also invites a comparison with graph neural networks, where spectral methods are prevalent—could contexture theory provide new insights into designing GNNs for dynamic graphs? Lastly, the failure cases of the proposed metric suggest a need for hybrid evaluation strategies that incorporate task-specific signals, perhaps drawing from active learning paradigms to adaptively refine contexts during pretraining.


