To address the reduced accuracy in the semantic segmentation of remote sensing images caused by insufficient extraction of contextual dependencies and the loss of spatial details, this study proposes a semantic segmentation method based on context- and class-aware feature fusion. Using ResNet-50 as the backbone network for feature extraction, the method incorporates an attention module during downsampling to enhance feature representation and the extraction of contextual dependencies. It builds a large receptive field block on the skip connections to extract rich multiscale contextual information, thereby mitigating the impact of scale variation among targets. A scene feature association and fusion module is connected in parallel after this block to guide the fusion of local features using global features. Finally, a class prediction module and a class-aware feature fusion module are constructed in the decoder to accurately fuse high-level semantic information with low-level detail information. The proposed method was validated on the Potsdam and Vaihingen datasets and compared with six commonly used methods, including DeepLabv3+ and BuildFormer, to verify its effectiveness. Experimental results show that the proposed method outperforms the other methods in recall, F1-score, and accuracy. In particular, it achieves intersection over union (IoU) values of 90.44% and 86.74% for building segmentation on the two datasets, improvements of 1.55% and 2.41% over the second-best networks, DeepLabv3+ and A2FPN, respectively.
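The evaluation metrics named above (recall, F1-score, accuracy, IoU) can be made concrete with a small sketch. It assumes the usual per-class definitions built from pixel-level confusion counts (TP, FP, FN) and reads "accuracy" as overall pixel accuracy. The paper's exact conventions (per-class averaging, ignored boundary pixels, and so on) are not stated here. The function names and the 0 = background / 1 = building labels are hypothetical, chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, cls):
    """Count TP, FP, FN for one class over flattened per-pixel labels."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == cls and t == cls:
            tp += 1          # predicted cls, truly cls
        elif p == cls:
            fp += 1          # predicted cls, truly another class
        elif t == cls:
            fn += 1          # truly cls, predicted another class
    return tp, fp, fn


def class_metrics(y_true, y_pred, cls):
    """Per-class precision, recall, F1 and IoU (0.0 when undefined)."""
    tp, fp, fn = confusion_counts(y_true, y_pred, cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


def overall_accuracy(y_true, y_pred):
    """Fraction of pixels whose predicted label matches the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Hypothetical 2x4 label map, flattened row by row (1 = building).
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    pred = [1, 1, 0, 0, 1, 0, 1, 0]
    print(class_metrics(truth, pred, cls=1))
    # {'precision': 0.75, 'recall': 0.75, 'f1': 0.75, 'iou': 0.6}
    print(overall_accuracy(truth, pred))  # 0.75
```

For a single class, F1 (the Dice coefficient) and IoU are linked by F1 = 2·IoU / (1 + IoU), so the toy IoU of 0.6 corresponds to an F1 of 0.75. A gain in one of these two metrics therefore always implies a gain in the other.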
HE Xiaojun, LUO Jie. Semantic segmentation of high-resolution remote sensing images based on context- and class-aware feature fusion[J]. Remote Sensing for Natural Resources, 2025, 37(2): 1-10.