Abstract:
Remote sensing image segmentation currently faces several challenges, including insufficient multi-scale feature fusion, limited interaction between global and local contextual information, and loss of spatial detail caused by conventional upsampling. To address these issues, this study proposed a remote sensing image segmentation method based on multi-scale gated feature fusion (MSGFF) and dynamic sliding upsampling (DSU) under the Deeplab architecture: the MSGFF-DSU-Deeplab method. First, with DeeplabV3+ as the baseline model, a multi-scale global-local feature aggregation module (MGLFAM) was developed. By combining global and local mixing operations with a gated attention mechanism, the MGLFAM enabled dynamic weighted fusion of multi-layer features, significantly improving the representation of key regions. Second, a gated convolutional attention fusion module (GCAFM) was designed and implemented. By integrating a convolutional attention mechanism with gated linear units, the GCAFM enabled the network to focus adaptively on key semantic regions, effectively enhancing the representation of edge features and small-scale objects. Third, a dynamic sliding effective upsampling module (SEUM) was proposed. By jointly optimizing depthwise separable convolution and a directional channel shifting strategy, the SEUM significantly improved the quality of resolution reconstruction, overcoming the bottleneck of conventional linear interpolation methods in recovering spatial details. Finally, the MSGFF-DSU-Deeplab method was experimentally validated on two high-resolution remote sensing image datasets: WHU-Building and Vaihingen. The results indicate that the proposed method significantly outperformed mainstream segmentation methods across the assessment metrics used.
The proposed method yielded mean intersection over union (mIoU) values of 90.59% on the WHU-Building dataset and 81.42% on the Vaihingen dataset, representing an increase of approximately 1% compared to the baseline model DeeplabV3+. Notably, the proposed method significantly improved the identification accuracy of edge details while maintaining favorable computational efficiency. It performs particularly well in the edge segmentation of small-scale buildings and complex features. Overall, the proposed method enhances both the segmentation performance for remote sensing images and the model's adaptability to diverse scenarios, offering novel technical insights for the high-precision analysis of remote sensing images.
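
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed from a pixel-level confusion matrix. The function names (`confusion_matrix`, `mean_iou`) and the toy labels are hypothetical and used only for illustration. The abstract does not state the exact evaluation protocol (for example, whether any class is ignored or whether scores are aggregated per image or per dataset), so this sketch assumes a single flattened label map and an unweighted average over the classes that occur.

```python
from typing import List, Sequence


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows index the true class, columns the predicted class."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must contain the same number of pixels")
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(truth: Sequence[int], pred: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    matrix = confusion_matrix(truth, pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                  # true class c, predicted otherwise
        fp = sum(row[c] for row in matrix) - tp   # predicted c, true class differs
        union = tp + fp + fn
        if union > 0:  # assumption: skip classes absent from both maps
            ious.append(tp / union)
    if not ious:
        raise ValueError("no class occurs in truth or pred")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy flattened label maps (0 = background, 1 = building); illustrative only.
    truth = [0, 0, 1, 1, 1, 0]
    pred = [0, 1, 1, 1, 0, 0]
    print(f"mIoU = {mean_iou(truth, pred, num_classes=2):.2%}")
```

In the toy example, each class has two correctly labeled pixels, one false positive, and one false negative, so both per-class IoU values are 2/4 and the mIoU is 50%.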