This paper introduces Purity Law (PuLa), a structural principle revealing sparsity bias in optimal TSP solutions, and proposes Purity Policy Optimization (PUPO), a training framework that significantly enhances the generalization of neural TSP solvers across diverse scales and distributions without inference overhead.
Reinforcement Learning, Transformer, Prediction, Efficiency, Robustness
Wenzhao Liu, Haoran Li, Congying Han, Zicheng Zhang, Anqi Li, Tiande Guo
University of Chinese Academy of Sciences
Generated by grok-3
Background Problem
The Traveling Salesman Problem (TSP), a fundamental NP-hard combinatorial optimization problem, poses significant challenges for neural approaches due to poor generalization across different instance scales and distributions. Traditional neural solvers often overfit to specific training patterns, failing to capture universal structural properties of optimal solutions. This paper addresses the issue by identifying a novel structural principle, termed the Purity Law (PuLa), which highlights a bias towards local sparsity in optimal TSP solutions, and by leveraging it to improve the generalization of neural solvers through a new training framework.
Method
The paper makes two key contributions: the Purity Law (PuLa) and Purity Policy Optimization (PUPO). PuLa is a structural principle stating that the frequency of edges in optimal TSP solutions decays as a negative exponential of their 'purity order,' a measure of local vertex sparsity around an edge defined as the number of vertices lying inside the circle whose diameter is that edge. PUPO is a training framework that integrates PuLa into policy optimization for neural TSP solvers: it modifies the REINFORCE policy gradient with 'purity weightings' built from 'purity availability' (the average minimum purity order over unvisited vertices) and 'purity cost' (which combines the purity order of newly added edges with their impact on future purity potential). This encourages the model to favor actions aligned with PuLa during solution construction, so that it learns patterns consistent across instances without altering the network architecture. The method is designed to be compatible with existing neural constructive solvers, guiding training only and adding no inference overhead.
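To make these definitions concrete, below is a minimal sketch (not the authors' code) of how the purity order of an edge, the PuLa distribution over a tour, and a purity-weighted REINFORCE surrogate could look. The names `purity_order`, `pula_histogram`, and `pupo_surrogate_loss` are illustrative, and the paper's exact weighting formula is not reproduced here; `purity_weights == 1` recovers vanilla REINFORCE.

```python
import numpy as np

def purity_order(i, j, coords):
    """Purity order of edge (i, j): number of other vertices lying strictly
    inside the circle whose diameter is the segment coords[i]-coords[j].
    A point p is inside that circle iff the angle it subtends over the edge
    is obtuse, i.e. (a - p) . (b - p) < 0."""
    a, b = coords[i], coords[j]
    mask = np.ones(len(coords), dtype=bool)
    mask[[i, j]] = False
    p = coords[mask]
    return int(np.sum(np.einsum('nd,nd->n', a - p, b - p) < 0))

def pula_histogram(tour, coords):
    """Empirical distribution of purity orders over the edges of a tour.
    PuLa states that for optimal tours this distribution concentrates at
    order 0 and decays roughly as a negative exponential."""
    nxt = np.roll(tour, -1)
    orders = [purity_order(i, j, coords) for i, j in zip(tour, nxt)]
    return np.bincount(orders) / len(orders)

def pupo_surrogate_loss(log_probs, tour_length, baseline, purity_weights):
    """Hedged sketch of a purity-weighted REINFORCE surrogate: each per-step
    log probability is scaled by a weighting derived from purity availability
    and purity cost (the paper's exact form differs). In practice log_probs
    would be autograd tensors so that differentiating this scalar yields the
    purity-weighted policy gradient."""
    advantage = tour_length - baseline
    return advantage * (purity_weights * log_probs).sum()
```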
Experiment
The experiments evaluate PUPO’s impact on generalization using six configurations of three state-of-the-art neural solvers (AM, PF, INViT) trained on scales of 50 and 100, tested on randomly generated datasets with four distributions (uniform, clustered, explosion, implosion) and scales from 100 to 10,000, as well as the real-world TSPLIB dataset (up to 10,000 nodes). The setup compares vanilla REINFORCE training against PUPO, using metrics like average gap to optimal solutions, tour length, solving time, and purity metrics (e.g., proportion of 0-order pure edges). Results show PUPO consistently improves generalization, especially on larger scales (e.g., INViT-100 achieves a 15.01% relative improvement on scale 10,000) and across distributions, with negligible inference time overhead (deviations within 5%). However, smaller-scale instances sometimes show performance trade-offs, suggesting implicit regularization effects. The experimental design is comprehensive, covering diverse scenarios, but the focus on positive results might underplay limitations, such as hyperparameter sensitivity or negative impacts on smaller scales. Purity metrics confirm PUPO enhances alignment with PuLa, though the causal link to generalization remains empirically driven rather than theoretically grounded.
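For reference, the two headline evaluation quantities used above could be computed along the following lines. This is an illustrative sketch with hypothetical helper names (`average_gap`, `zero_order_purity_rate`), not the authors' evaluation code.

```python
import numpy as np

def tour_length(tour, coords):
    """Euclidean length of a closed tour given as a sequence of vertex indices."""
    pts = coords[np.asarray(tour)]
    return float(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1).sum())

def average_gap(model_lengths, reference_lengths):
    """Average relative gap (in %) of model tour lengths to (near-)optimal
    reference lengths, the main generalization metric in the experiments."""
    m, r = np.asarray(model_lengths), np.asarray(reference_lengths)
    return float(100.0 * np.mean((m - r) / r))

def zero_order_purity_rate(tour, coords):
    """Proportion of tour edges that are 0-order pure, i.e. whose diameter
    circle contains no other vertex; a purity metric for PuLa alignment."""
    tour = np.asarray(tour)
    nxt = np.roll(tour, -1)
    pure = 0
    for i, j in zip(tour, nxt):
        a, b = coords[i], coords[j]
        others = np.delete(np.arange(len(coords)), [i, j])
        p = coords[others]
        # the edge is 0-order pure iff no other point sees it under an obtuse angle
        pure += int(np.all(np.einsum('nd,nd->n', a - p, b - p) >= 0))
    return pure / len(tour)
```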
Further Thoughts
The introduction of Purity Law (PuLa) opens up intriguing avenues for understanding structural biases in combinatorial optimization problems beyond TSP, such as vehicle routing or graph-based problems in logistics and network design. Its statistical validation across diverse instances suggests a potential universal principle, but the lack of a rigorous theoretical foundation raises questions about whether this is a fundamental property or a dataset-specific observation. Could PuLa be linked to known geometric properties of Euclidean spaces, or is it an emergent behavior of optimization algorithms like LKH3 used to generate optimal solutions? Additionally, the implicit regularization effect of PUPO, as noted in reduced Frobenius norms, hints at connections to broader machine learning challenges like overfitting in deep reinforcement learning. Future work could explore integrating PuLa directly into network architectures, perhaps via attention mechanisms in transformers that prioritize sparsity, or test its applicability in other NP-hard problems. Comparing PUPO’s regularization effects with explicit techniques like weight decay could further clarify its unique contributions. Lastly, the trade-off observed in smaller-scale performance suggests a need for adaptive strategies that balance generalization and precision based on instance size, potentially drawing inspiration from meta-learning approaches.