This paper introduces a thermal-detector-based traffic light system that uses YOLO-Thermal, a modified YOLOv8 framework, to dynamically adjust signal timings for individuals with mobility restrictions. YOLO-Thermal achieves the highest detection accuracy among the compared models (89.1% APval), and the system improves intersection accessibility while addressing privacy concerns and adverse-condition challenges.
Detection, CNN, Multimodal Data, Robustness, Human-AI Interaction
Xiao Ni, Carsten Kuehnel, Xiaoyi Jiang
University of Munster, University of Applied Sciences Erfurt
Generated by grok-3
Background Problem
Urban intersections often fail to accommodate individuals with mobility restrictions due to fixed traffic signal timings and inadequate adaptive features, posing safety risks (e.g., 36% higher mortality rate for wheelchair users in car-related accidents). Existing RGB camera-based systems struggle in adverse weather or low-light conditions and raise privacy concerns. This paper aims to address these ICT-related barriers by developing a thermal detector-based traffic light system that dynamically adjusts signal durations and auditory cues for people with mobility and visual impairments, enhancing safety and accessibility while preserving privacy.
Method
The proposed method involves a thermal imaging-based system for barrier-free intersections with two core components:
- YOLO-Thermal Framework: Built on YOLOv8, it integrates advanced modules to tackle thermal imaging challenges like low resolution and lack of texture. Key enhancements include Triplet-Attention for cross-dimensional feature interaction, SPD-Conv for preserving fine-grained details in low-resolution images, SPPFCSPC for multi-scale feature extraction, and Quality Focal Loss (QFL) to address class imbalance in detection.
- Adaptive Traffic Light Control: Thermal cameras detect pedestrians with mobility restrictions and assign them to one of three groups: walking impairments, visual impairments, and mobility burden. The system dynamically extends green light durations (up to 6 s, 8 s, and 3 s, respectively) and enhances auditory signals for visually impaired individuals. A multi-frame validation step with parameter N treats a pedestrian as absent only after N consecutive frames without a detection, which balances safety against traffic flow.
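The multi-frame validation rule and the per-group extension limits can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `PresenceValidator`, the function `max_extension`, and the group keys are hypothetical, and taking the largest limit when several groups are detected at once is an assumption that the summary does not confirm.

```python
from dataclasses import dataclass

# Upper bounds on the green extension in seconds, taken from the summary.
# The dictionary keys are hypothetical labels for the three groups.
MAX_EXTENSION_S = {
    "walking_impairment": 6.0,
    "visual_impairment": 8.0,
    "mobility_burden": 3.0,
}


@dataclass
class PresenceValidator:
    """Treats a pedestrian as absent only after n consecutive missed frames."""

    n: int = 2            # N in the summary; N=2 is reported as optimal
    missed: int = 0       # consecutive frames without a detection
    present: bool = False

    def update(self, detected: bool) -> bool:
        """Feed one frame's detection result and return the validated state."""
        if detected:
            self.present = True
            self.missed = 0
        elif self.present:
            self.missed += 1
            if self.missed >= self.n:
                # Absence is confirmed only now, after n misses in a row.
                self.present = False
                self.missed = 0
        return self.present


def max_extension(groups):
    """Largest extension limit among the detected groups (assumed policy)."""
    return max((MAX_EXTENSION_S[g] for g in groups), default=0.0)


if __name__ == "__main__":
    validator = PresenceValidator(n=2)
    frames = [True, False, True, False, False, True]
    print([validator.update(d) for d in frames])
    print(max_extension(["walking_impairment", "mobility_burden"]))
```

A single missed frame leaves the validated state unchanged, so a brief detector dropout does not end the extension early. A larger N makes the state more robust to dropouts but delays the moment at which absence is confirmed.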
Experiment
The experiments were conducted using the newly created TD4PWMR dataset (11,196 thermal images) focusing on pedestrians with mobility restrictions under diverse conditions (seasons, times of day). The setup compares YOLO-Thermal against state-of-the-art (SOTA) models like YOLOv11, YOLOv10, YOLOv9, YOLOv8, YOLOv7, YOLOv6, and RT-DETR on metrics like APval (average precision across IoU thresholds). Results show YOLO-Thermal achieves the highest APval (89.1%), outperforming competitors, with strong performance across various IoU thresholds and object sizes, while maintaining efficiency (90.1 FPS on RTX 2080 Ti). Ablation studies validate each module’s contribution, though SPD-Conv’s impact on small object detection was inconsistent with prior claims. The multi-frame validation study (N=2 optimal) balances success rate (77.2%) and latency (1.2s), meeting the 95% design criterion for accessibility. However, the dataset’s class imbalance remains a concern, and testing is limited to a specific camera setup, raising questions about generalizability. The experimental design is reasonable but not exhaustive, as it lacks cross-dataset validation or real-world deployment scenarios beyond controlled intersections.
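Because the reported metric averages precision across IoU thresholds, a short Python sketch of the underlying overlap measure may help. The `(x1, y1, x2, y2)` box format, the function names, and the COCO-style threshold range 0.50 to 0.95 in steps of 0.05 are assumptions; the summary does not state which thresholds the paper uses.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Thresholds at which the prediction would count as a true positive."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


if __name__ == "__main__":
    print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
    print(thresholds_passed((0, 0, 10, 10), (1, 0, 11, 10)))
```

A prediction that overlaps its ground-truth box only loosely counts as correct at the lower thresholds and as a miss at the stricter ones, which is why an average over thresholds rewards precise localization.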
Further Thoughts
The use of thermal imaging for privacy-preserving detection in urban settings is a promising direction, especially for sensitive applications like assisting vulnerable populations. However, the reliance on specific hardware (FLIR ThermiBot2 cameras) and the lack of cross-dataset validation raise concerns about the system’s adaptability to different thermal imaging setups or urban environments with varying intersection layouts. An interesting connection could be drawn to federated learning approaches in smart cities, where thermal detection models could be trained across multiple cities’ datasets without sharing raw data, enhancing generalizability while maintaining privacy. Additionally, the multi-frame validation latency issue might benefit from integrating lightweight tracking algorithms or predictive models that estimate pedestrian exit times based on historical crossing patterns, potentially reducing unnecessary green light extensions. Finally, exploring the ethical implications of prioritizing certain impairment groups over others in traffic control could align this work with broader discussions on fairness in AI for urban planning, ensuring that accessibility solutions do not inadvertently marginalize less prioritized groups.