Accurately extracting buildings from high-resolution remote sensing images is difficult because backgrounds are complex and variable and building shapes are diverse. This study developed the building mining net (BMNet), a semantic segmentation network for buildings in high-resolution imagery that integrates a hybrid attention mechanism with multi-scale feature enhancement. First, the encoder used VGG-16 as the backbone network to extract features at four levels. Then, a decoder was designed to address the loss of detail in high-level features within the multi-scale information. Specifically, a series attention module (SAM), which combines channel attention and spatial attention, was introduced to strengthen the representation capability of the high-level features. Additionally, a building mining module (BMM) with progressive feature enhancement was designed to further improve the accuracy of building segmentation. Taking the upsampled feature map, the feature map processed by the SAM, and the initial prediction as input, the BMM produced background noise information and then filtered out the background information using the context information exploration module designed in this study. The best predictions were obtained after several successive passes through the BMM. Comparative experiments indicate that BMNet outperformed U-Net, with accuracy and intersection over union (IoU) increasing by 4.6% and 4.8%, respectively, on the WHU Building dataset; by 7.9% and 8.9%, respectively, on the Massachusetts Buildings dataset; and by 6.7% and 11.0%, respectively, on the Inria Aerial Image Labeling Dataset. These results validate the effectiveness and practicality of the proposed model.
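The two reported metrics can be illustrated with a minimal sketch. It assumes that "accuracy" means overall pixel accuracy and that IoU is computed for the building class on binary masks; the abstract does not spell out either definition, and the function names and toy masks below are hypothetical, not taken from the study.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally sized binary masks (1 = building)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # building predicted, building present
            elif p:
                fp += 1      # building predicted, background present
            elif t:
                fn += 1      # background predicted, building present
            else:
                tn += 1      # background predicted, background present
    return tp, fp, fn, tn


def pixel_accuracy(pred, truth):
    """Share of pixels whose predicted label matches the ground truth."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return (tp + tn) / (tp + fp + fn + tn)


def building_iou(pred, truth):
    """Intersection over union of the predicted and true building pixels."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    union = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return tp / union if union else 1.0


# Toy 3x4 masks, purely illustrative.
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0]]

print(round(pixel_accuracy(pred, truth), 4))  # 0.8333
print(round(building_iou(pred, truth), 4))    # 0.6
```

In this toy case one false positive and one false negative lower the accuracy only to about 0.83 but the IoU to 0.6, because IoU ignores the correctly labeled background pixels. This is the usual reason both metrics are reported for building extraction.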