arXiv: 2505.09952

Task-Core Memory Management and Consolidation for Long-term Continual Learning


This paper introduces Long-CL, a human memory-inspired framework for long-term continual learning, leveraging task-core memory management and selective sample consolidation to significantly outperform baselines by 7.4% and 6.5% AP on two novel benchmarks, MMLongCL-Bench and TextLongCL-Bench, while mitigating catastrophic forgetting.

Continual Learning, Large Language Model, Vision Foundation Model, Multimodal Data, Efficiency, Robustness

Tianyu Huai, Jie Zhou, Yuxuan Cai, Qin Chen, Wen Wu, Xingjiao Wu, Xipeng Qiu, Liang He

East China Normal University, Nanyang Technological University, Fudan University

Generated by grok-3

Background Problem

The paper addresses the challenge of long-term continual learning (CL), where models must learn from a vast sequence of tasks over extended periods, mimicking real-world scenarios. Unlike traditional CL, which handles a limited number of tasks, long-term CL exacerbates catastrophic forgetting, where performance on earlier tasks degrades as new ones are learned. The authors aim to answer how existing CL methods perform under long-term settings and how to mitigate forgetting over prolonged sequential updates, introducing a framework inspired by human memory mechanisms to manage and consolidate knowledge effectively.

Method

The proposed Long-CL framework for long-term continual learning comprises two main components inspired by human memory mechanisms: a task-core memory management module (MemMan), which identifies and indexes task-critical memory units (e.g., via Top-K selection) and manages how they are updated as new tasks arrive, and a memory consolidation module (MemCon), which selectively replays differentially chosen samples to consolidate knowledge across the long task sequence and counter forgetting. A rough sketch of how these two pieces might interact is given below.
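The sketch pairs a Top-K memory-unit selector (standing in for MemMan) with a differential-sample replay selector (standing in for MemCon). All names (TaskCoreMemory, consolidate_samples) and the prototype-distance scoring heuristic are hypothetical stand-ins chosen for illustration, not the authors' implementation.

```python
import numpy as np

class TaskCoreMemory:
    """Hypothetical MemMan-style store: keeps Top-K task-core memory units per task."""

    def __init__(self, k: int = 8):
        self.k = k
        self.units = {}  # task_id -> (K, d) array of stored memory units

    def update(self, task_id: int, features: np.ndarray) -> None:
        """Select the K feature vectors farthest from the task prototype
        (a stand-in importance score) and store them as task-core units."""
        prototype = features.mean(axis=0)
        scores = np.linalg.norm(features - prototype, axis=1)
        top_k = np.argsort(scores)[-self.k:]
        self.units[task_id] = features[top_k]

def consolidate_samples(features: np.ndarray, labels: np.ndarray,
                        budget: int) -> np.ndarray:
    """Hypothetical MemCon-style differential selection: keep the `budget`
    samples least similar to their class prototype for later replay."""
    keep_scores = np.empty(len(features))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        proto = features[idx].mean(axis=0)
        keep_scores[idx] = np.linalg.norm(features[idx] - proto, axis=1)
    return np.argsort(keep_scores)[-budget:]  # indices of samples to replay

# Toy usage: one "task" of 128 samples with 64-d features.
rng = np.random.default_rng(0)
feats, labs = rng.normal(size=(128, 64)), rng.integers(0, 4, size=128)
memory = TaskCoreMemory(k=8)
memory.update(task_id=0, features=feats)
replay_idx = consolidate_samples(feats, labs, budget=16)
print(memory.units[0].shape, replay_idx.shape)  # (8, 64) (16,)
```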

Experiment

The experiments were conducted on two newly proposed benchmarks, MMLongCL-Bench (21 datasets spanning 4 multimodal task types) and TextLongCL-Bench (30 datasets spanning 3 textual task types), using LLaVA-7B and Qwen2.5-7B as the respective backbones. Evaluation metrics were Final Average Performance (AP) and Average Forgetting (AF), with Long-CL compared against baselines such as ER, EWC, O-LoRA, and CL-MoE. Long-CL outperformed the state of the art by 7.4% AP (51.93% vs. 44.53%) on MMLongCL-Bench and 6.5% AP (60.12% vs. 53.58%) on TextLongCL-Bench, with substantial reductions in forgetting (AF of -9.93% and -0.89%, the negative values indicating backward transfer). The setup is comprehensive in task diversity, though how well it represents real-world long-term CL scenarios is unclear. Ablation studies confirmed the contributions of both MemMan and MemCon, with MemCon having the stronger impact through sample replay. However, the computational overhead of memory management and sample selection was not detailed, and hyperparameter sensitivity (e.g., buffer size, the K value) suggested trade-offs between performance and efficiency. The negative AF points to intriguing backward transfer, but its mechanisms and sustainability over longer sequences remain underexplored. Overall, while the results align with expectations of reduced forgetting, practical scalability concerns persist.
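For readers unfamiliar with the metrics, the minimal sketch below shows how AP and AF are typically computed from a task-accuracy matrix. It assumes acc[i][j] is the performance on task j after training on task i; the paper's exact formulation may differ in details.

```python
import numpy as np

def final_average_performance(acc: np.ndarray) -> float:
    """AP: mean performance over all T tasks after training on the last one."""
    return float(acc[-1].mean())

def average_forgetting(acc: np.ndarray) -> float:
    """AF: for each earlier task, drop from its best-ever score to its final
    score, averaged over the first T-1 tasks. Negative values indicate
    backward transfer (the task improved after later training)."""
    best_so_far = acc[:-1, :-1].max(axis=0)  # best score per task before the final step
    return float((best_so_far - acc[-1, :-1]).mean())

# Toy 3-task accuracy matrix: rows = after training task i, cols = task j.
acc = np.array([[0.80, 0.00, 0.00],
                [0.75, 0.82, 0.00],
                [0.78, 0.79, 0.85]])
print(final_average_performance(acc))  # 0.8066...
print(average_forgetting(acc))         # mean of (0.80-0.78, 0.82-0.79) = 0.025
```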

Further Thoughts

The Long-CL framework presents a compelling approach to long-term continual learning, particularly with its human memory-inspired mechanisms, which could inspire broader applications in adaptive AI systems beyond CL, such as in lifelong learning for robotics or personalized AI assistants. The negative AF values indicating backward transfer are particularly fascinating—could this be linked to shared representations in differential sample selection, and might it connect to emergent abilities in large models as seen in scaling law studies? However, the scalability of MemMan’s Top-K selection and the computational cost of prototype calculations over hundreds of tasks warrant further investigation; perhaps integrating efficient clustering or dimensionality reduction techniques could mitigate this. Additionally, the benchmarks, while diverse, might benefit from incorporating non-stationary data drifts or evolving task distributions, aligning more closely with real-world dynamics as explored in online learning literature. Finally, exploring connections with federated learning could address privacy concerns in long-term CL, especially for multimodal data across distributed systems, opening new research avenues for privacy-preserving continual learning.
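As one concrete reading of the efficiency suggestion above (clustering or dimensionality reduction for cheaper prototype maintenance), the sketch below compresses a task's features into a handful of low-dimensional centroids with scikit-learn's PCA and MiniBatchKMeans. This is purely illustrative and not part of Long-CL; the function name and budget sizes are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import MiniBatchKMeans

def compress_task_prototypes(features: np.ndarray,
                             n_prototypes: int = 16,
                             dim: int = 32) -> np.ndarray:
    """Reduce feature dimensionality, then summarize a task with a small set
    of cluster centroids instead of keeping per-sample statistics."""
    reduced = PCA(n_components=dim).fit_transform(features)
    km = MiniBatchKMeans(n_clusters=n_prototypes, n_init=3,
                         random_state=0).fit(reduced)
    return km.cluster_centers_  # (n_prototypes, dim) compact task summary

# Toy usage: summarize a 10k-sample task with 16 low-dimensional prototypes.
feats = np.random.default_rng(0).normal(size=(10_000, 256))
protos = compress_task_prototypes(feats)
print(protos.shape)  # (16, 32)
```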


