arXiv: 2308.12563

Contaminated Multivariate Time-Series Anomaly Detection with Spatio-Temporal Graph Conditional Diffusion Models

Published:  at  11:29 AM

TSAD-C is a pioneering unsupervised framework for multivariate time-series anomaly detection on contaminated training data. It combines a Decontaminator built on an S4-based conditional diffusion model, long-range dependency modeling via a time-then-graph approach, and an anomaly scoring module, achieving state-of-the-art performance across diverse datasets.

Time Series Data, Unsupervised Learning, Generative Modeling, Diffusion Model, Detection, Graph Data

Thi Kieu Khanh Ho, Narges Armanfard

McGill University, Mila - Quebec AI Institute

Generated by grok-3

Background Problem

The paper addresses the critical challenge of unsupervised time-series anomaly detection (TSAD) in multivariate time-series (MTS) data when the training set is contaminated with anomalies, a common situation in real-world applications due to data shifts or human error. Traditional unsupervised methods assume clean training data; when anomalies are present during training, the model fits them as normal and consequently fails to detect similar anomalies at test time. In addition, existing methods struggle to capture long-range intra-variable (temporal) and inter-variable (spatial) dependencies, often relying on short-term patterns or predefined graphs that break down in dynamic environments. The authors address these problems with TSAD-C, a novel framework that handles contaminated training data and models both long-range temporal and spatial dependencies without requiring labeled data or a predefined graph structure.

Method

TSAD-C is a fully unsupervised framework for TSAD on contaminated multivariate time-series data, comprising three core modules:

  1. Decontaminator: This module masks portions of the input data with one of three strategies (random, random block, and blackout masking), assuming that normal data predominates, and employs an S4-based conditional diffusion model to rectify likely anomalies. The forward diffusion process adds noise over multiple steps, while the reverse process, guided by an S4 noise estimator, reconstructs decontaminated data in a single step during training (and over the full steps during testing). The S4 backbone captures long-range intra-variable dependencies, and the noise-estimation loss is computed only on the masked portions for efficiency (see the masking sketch after this list).
  2. Long-range Variable Dependency Modeling: Following a time-then-graph approach, this module first captures long-range intra-variable dependencies by projecting the data into an embedding space with stacked S4 layers. Inter-variable dependencies are then modeled dynamically via graphs constructed over short time windows, using self-attention to learn edge weights and graph isomorphism networks (GIN) for message passing, with regularization that encourages smoothness, sparsity, and connectivity (see the second sketch below).
  3. Anomaly Scoring: During testing, the anomaly score of a window is a weighted combination of root mean square errors (RMSE) from the Decontaminator (reconstruction of the masked data) and from the dependency-modeling module (direct reconstruction of the input), with a quantile-based threshold determined on an unlabeled validation set (see the third sketch below). The framework is end-to-end, requires no anomaly labels or prior graph knowledge, and focuses on reducing the impact of contamination while modeling complex dependencies.
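
As a concrete illustration of the masking step in item 1, below is a minimal NumPy sketch of the three strategies, assuming a single input window of shape (T, D) (time steps by variables). The function names, mask ratio, and block length are illustrative and not the paper's exact settings.

```python
import numpy as np

def random_mask(x, ratio=0.5, rng=None):
    """Mask individual (time, variable) entries uniformly at random."""
    rng = rng or np.random.default_rng()
    return rng.random(x.shape) < ratio            # True = masked entry

def random_block_mask(x, ratio=0.5, block_len=16, rng=None):
    """Mask contiguous time blocks, drawn independently per variable."""
    rng = rng or np.random.default_rng()
    T, D = x.shape
    mask = np.zeros((T, D), dtype=bool)
    n_blocks = max(1, int(ratio * T / block_len))
    for d in range(D):
        for _ in range(n_blocks):
            start = rng.integers(0, max(1, T - block_len))
            mask[start:start + block_len, d] = True
    return mask

def blackout_mask(x, ratio=0.5, block_len=16, rng=None):
    """Mask the same contiguous time blocks across all variables."""
    rng = rng or np.random.default_rng()
    T, D = x.shape
    mask = np.zeros((T, D), dtype=bool)
    n_blocks = max(1, int(ratio * T / block_len))
    for _ in range(n_blocks):
        start = rng.integers(0, max(1, T - block_len))
        mask[start:start + block_len, :] = True
    return mask

# The diffusion noise-estimation loss is then restricted to masked entries,
# e.g. loss = ((eps_hat - eps)[mask] ** 2).mean()   (names illustrative)
```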
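
The dynamic graph construction in item 2 can be sketched as follows, assuming per-window node embeddings of shape (D, h) produced by the S4 layers. The attention-style edge scoring and the smoothness, connectivity, and sparsity penalties shown here are standard graph-structure-learning choices and may differ in detail from the paper's exact formulation.

```python
import torch

def learn_adjacency(z, temperature=1.0):
    """Dynamic graph for one short window from node embeddings z of shape
    (D, h): dot-product attention-style scores squashed into (0, 1)."""
    scores = z @ z.t() / (z.shape[-1] ** 0.5 * temperature)
    A = torch.sigmoid(scores)
    return A - torch.diag(torch.diag(A))             # drop self-loops

def graph_regularizers(A, z, alpha=1.0, beta=1.0, gamma=1.0):
    """Smoothness, connectivity, and sparsity penalties commonly used when
    learning a graph structure jointly with the model."""
    D = A.shape[0]
    deg = A.sum(dim=-1)
    L = torch.diag(deg) - A                          # graph Laplacian
    smooth = torch.trace(z.t() @ L @ z) / D ** 2     # connected nodes -> similar embeddings
    connect = -torch.log(deg + 1e-8).mean()          # discourage isolated nodes
    sparse = (A ** 2).mean()                         # discourage overly dense graphs
    return alpha * smooth + beta * connect + gamma * sparse

# The learned adjacency is then used for message passing over the variables
# (a GIN in the paper) to produce the reconstruction used for scoring.
```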
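
Finally, the scoring rule in item 3 reduces to a weighted sum of two RMSEs plus a quantile threshold. The sketch below assumes per-window reconstructions from both modules; the weights and the quantile value are illustrative.

```python
import numpy as np

def anomaly_scores(x, x_dec, x_rec, w_dec=0.5, w_rec=0.5):
    """Combine per-window RMSEs from the Decontaminator (x_dec) and the
    dependency-modeling reconstruction (x_rec) into a single score.
    x, x_dec, x_rec: arrays of shape (n_windows, T, D)."""
    rmse_dec = np.sqrt(((x - x_dec) ** 2).mean(axis=(1, 2)))
    rmse_rec = np.sqrt(((x - x_rec) ** 2).mean(axis=(1, 2)))
    return w_dec * rmse_dec + w_rec * rmse_rec

def quantile_threshold(val_scores, q=0.95):
    """Set the detection threshold as a quantile of the scores on the
    unlabeled validation set; windows scoring above it are flagged."""
    return np.quantile(val_scores, q)

# Usage (shapes and quantile are illustrative):
# thr    = quantile_threshold(anomaly_scores(x_val, dec_val, rec_val))
# y_pred = anomaly_scores(x_test, dec_test, rec_test) > thr
```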

Experiment

The experiments were conducted on four diverse and challenging datasets (SMD, ICBEB, DODH, TUSZ) from industrial and biomedical domains, chosen for their reliability over criticized benchmarks such as NASA or SWaT. The setup used contaminated training data (10-20% anomalies) with separate validation and test sets, and performance was evaluated using F1-score, Recall, and Area Under the Precision-Recall Curve (APR). TSAD-C was compared against 12 state-of-the-art unsupervised TSAD methods, categorized by whether they model intra-variable, inter-variable, or both kinds of dependencies. TSAD-C outperformed all baselines, with an average 6.3% F1 improvement over the second-best method and notable gains in Recall, which is especially important when the training data is contaminated. Additional experiments demonstrated resilience to varying anomaly types and ratios, with consistent performance across masking strategies (Random Block Masking being the best). Ablation studies confirmed that the modules are complementary, with intra-variable modeling playing the largest role. The Decontaminator's efficiency was validated by comparing noise-minimization strategies: computing the loss only on the masked portions trained faster and performed better. However, while the experimental setup is comprehensive, the reliance on specific datasets and fixed hyperparameters raises questions about generalizability to other contamination levels or real-time constraints, and the computational cost (e.g., training time per epoch) points to scalability concerns that are not fully addressed.

Further Thoughts

While TSAD-C presents a significant advancement in handling contaminated training data for TSAD, several deeper considerations arise. First, the Decontaminator’s effectiveness hinges on the assumption that normal data predominates and that masking can sufficiently reduce anomaly impact. In scenarios with higher contamination levels or subtle anomalies mimicking normal patterns, this approach might falter, suggesting a need for adaptive masking or anomaly estimation techniques—perhaps integrating ideas from robust statistics or outlier detection as a pre-processing step. Second, the computational overhead of S4 layers and dynamic graph learning, though innovative, could limit applicability in resource-constrained environments like edge devices in IoT systems, prompting exploration of lightweight alternatives or model compression techniques inspired by recent works in efficient deep learning. Additionally, connecting TSAD-C to broader AI safety and robustness research, the method’s ability to handle noisy data could inform strategies for training robust models in other domains like natural language processing, where datasets often contain label noise or adversarial examples. Finally, the time-then-graph framework’s success in TSAD might inspire hybrid architectures in other multimodal time-series tasks, such as video anomaly detection, where temporal and spatial dependencies are similarly critical. These directions warrant further investigation to enhance TSAD-C’s impact and adaptability.


