Abstract:
Cloud detection is a critical research direction in remote sensing image processing, with broad applications in meteorological monitoring, environmental assessment, agricultural management, and military reconnaissance. Accurate detection and segmentation of cloud regions are of significant importance for improving the utilization efficiency of remote sensing data. However, clouds exhibit diverse and complex morphologies, including cirrus, cumulus, and stratus, with significant variations in thickness, transparency, and altitude. To address these characteristics, this study designs a U-striped cloud detection model based on a hybrid encoder combining convolution and a cross-stripe Transformer (UCT-Net). Built on a U-shaped network architecture, the proposed UCT-Net incorporates both convolutional and Transformer encoders to jointly extract features from satellite cloud imagery. Specifically, to enhance adaptability to diverse cloud morphologies, a cross-stripe Transformer module is designed to effectively capture variations in cloud morphology. In addition, a dual-weighted attention mechanism integrating texture and channel information, named the cross-stripe encoder and conv encoder merge module (CCM), is proposed to facilitate the deep fusion of the convolutional and cross-stripe Transformer encoders. UCT-Net was evaluated and validated on two datasets: the GF12MS WHU dataset, derived from GF-1 and GF-2 satellite data, and the HRC WHU dataset, sourced from Google Earth. The results show that UCT-Net achieved a precision of 92.70% on the GF12MS WHU dataset and 94.20% on the HRC WHU dataset, outperforming classical semantic segmentation algorithms. This demonstrates the superior performance of UCT-Net in cloud detection tasks.