Building extraction using high-resolution satellite imagery based on an attention enhanced full convolution neural network
GUO Wen¹, ZHANG Qiao²
1. The Third Institute of Photogrammetry and Remote Sensing, Ministry of Natural Resources, Chengdu 610100, China
2. School of Geoscience and Technology, Southwest Petroleum University, Chengdu 610500, China
Automatic extraction of buildings from satellite remote sensing images has a wide range of applications in economic and social development. Because of mutual occlusion, illumination, the background environment and other factors in satellite remote sensing images, it is difficult for traditional methods to extract buildings with high precision. This paper proposes an attention enhanced feature pyramid network (FPN-SENet) and builds a large-scale pixel-wise building dataset (the SCRS dataset) from multi-source high-resolution satellite images and vector data, in order to extract buildings automatically from multi-source satellite images. FPN-SENet is compared with other fully convolutional neural networks. The results show that the building extraction accuracy obtained on the SCRS dataset is close to that obtained on the world's leading open-source satellite image dataset, and that accuracy on pseudo-color data is higher than on true-color data. FPN-SENet is more accurate than the other fully convolutional neural networks tested. Building extraction can be further improved by using the sum of the cross-entropy loss and a Dice-coefficient-based loss as the loss function. The best classification model achieves an overall accuracy of 95.2%, a Kappa coefficient of 79.0%, an F1-score of 81.7% and an IoU of 69.1%. This study can serve as a reference for the automatic extraction of buildings from high-resolution satellite images.
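The abstract names the combined loss and the four accuracy measures without defining them. The Python sketch below shows one common way to compute them for a binary building/background mask. It assumes the combined loss takes the usual form binary cross-entropy + (1 − soft Dice), since adding the raw Dice coefficient would reward poor overlap; the paper's exact formulation may differ. Kappa is taken to be Cohen's kappa from the 2×2 confusion matrix, and F1 and IoU are computed for the building class only. The function names `confusion_counts`, `building_metrics` and `ce_dice_loss` are hypothetical and not taken from the paper.

```python
import math


def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two binary masks given as flat 0/1 sequences
    (1 = building, 0 = background)."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _safe_div(num, den):
    # Return 0.0 instead of raising when a denominator is zero
    # (e.g. no building pixels in either mask).
    return num / den if den else 0.0


def building_metrics(pred, truth):
    """Overall accuracy, Cohen's kappa, and building-class F1-score and IoU."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    n = tp + fp + fn + tn
    oa = _safe_div(tp + tn, n)
    # Chance agreement: for each class, (predicted share) x (reference share).
    pe = _safe_div((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn), n * n)
    kappa = _safe_div(oa - pe, 1 - pe)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    iou = _safe_div(tp, tp + fp + fn)
    return {"OA": oa, "Kappa": kappa, "F1": f1, "IoU": iou}


def ce_dice_loss(probs, truth, eps=1e-7):
    """Assumed combined loss: binary cross-entropy + (1 - soft Dice coefficient).

    probs: predicted building probabilities in [0, 1]; truth: 0/1 labels.
    eps clips the logarithm arguments and smooths the Dice ratio.
    """
    n = len(probs)
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, truth)
    ) / n
    intersection = sum(p * t for p, t in zip(probs, truth))
    dice = (2 * intersection + eps) / (sum(probs) + sum(truth) + eps)
    return bce + (1 - dice)


if __name__ == "__main__":
    truth = [1, 0, 0, 0, 1, 1]
    print(building_metrics([1, 1, 0, 0, 1, 0], truth))
    print(ce_dice_loss([0.9, 0.2, 0.1, 0.1, 0.8, 0.4], truth))
```

The metrics are returned as fractions; multiplying by 100 gives percentages in the form reported in the abstract (for example, a Kappa of 0.790 corresponds to 79.0%). In a real training pipeline the same formulas would be applied to whole image tiles with a deep learning framework's tensor operations rather than Python lists.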
GUO Wen, ZHANG Qiao. Building extraction using high-resolution satellite imagery based on an attention enhanced full convolution neural network. Remote Sensing for Land & Resources, 2021, 33(2): 100-107.