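Building extraction from high-resolution images using a hybrid attention mechanism combined with multi-scale feature enhancement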

doi:10.6046/zrzyyg.2023146





Abstract:

Accurately extracting buildings from high-resolution remote sensing images remains challenging because backgrounds are complex and variable and building shapes are diverse. This study proposes the building mining net (BMNet), a semantic segmentation network for high-resolution imagery that combines a hybrid attention mechanism with multi-scale feature enhancement. First, the encoder uses VGG-16 as the backbone network to extract features, yielding four levels of feature representations. A decoder is then designed to address the loss of detail in high-level features within the multi-scale information. Specifically, a series attention module (SAM), which applies channel attention and spatial attention in series, is introduced to strengthen the representation capability of the high-level features. In addition, a building mining module (BMM) with progressive feature enhancement is designed to further improve segmentation accuracy. Taking the upsampled feature map, the SAM-processed feature map, and the initial prediction as inputs, the BMM captures background noise information and filters it out using a context information exploration module designed in this study. The best prediction is obtained after several successive BMM stages. Comparative experiments show that BMNet outperforms U-Net in accuracy and intersection over union (IoU) by 4.6% and 4.8%, respectively, on the WHU Building dataset, by 7.9% and 8.9% on the Massachusetts Buildings dataset, and by 6.7% and 11.0% on the Inria Aerial Image Labeling dataset. These results support the effectiveness and practicality of the proposed model.
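
The abstract describes SAM only as channel attention followed by spatial attention; it does not give the internals. The sketch below is therefore an illustration under stated assumptions, not the authors' implementation: it follows a CBAM-style layout (average and max pooling feeding a shared two-layer MLP for the channel gate), and it replaces the spatial convolution with a scalar weighted sum of the two pooled maps to stay dependency-free. All function names, weights, and the toy feature map are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feat, w1, w2):
    """Gate each channel of feat (C x H x W nested lists) by a weight in (0, 1).

    w1 (hidden x C) and w2 (C x hidden) form a shared two-layer MLP (assumed).
    """
    flat = [[v for row in ch for v in row] for ch in feat]
    avg = [sum(f) / len(f) for f in flat]   # global average pooling per channel
    mx = [max(f) for f in flat]             # global max pooling per channel

    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    gate = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gate, feat)]

def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """Gate each pixel by a weight in (0, 1) computed across channels.

    Simplification (assumption): a scalar weighted sum stands in for the
    convolution that a CBAM-style spatial branch would normally use.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            col = [feat[c][i][j] for c in range(C)]
            gate = sigmoid(w_avg * sum(col) / C + w_max * max(col))
            for c in range(C):
                out[c][i][j] = gate * feat[c][i][j]
    return out

def series_attention(feat, w1, w2):
    """Channel attention first, then spatial attention (the 'series' order)."""
    return spatial_attention(channel_attention(feat, w1, w2))

# Toy input: 2 channels of 2 x 2 pixels, with arbitrary illustrative weights.
feat = [[[1.0, -2.0], [0.5, 3.0]], [[0.0, 1.5], [-1.0, 2.0]]]
w1 = [[0.5, -0.5]]       # C=2 -> hidden=1
w2 = [[1.0], [-1.0]]     # hidden=1 -> C=2
out = series_attention(feat, w1, w2)
```

Because both gates are sigmoid outputs, the module can only rescale features toward zero; it never changes the shape of the feature map, so in this sketch it could be dropped into a decoder stage without altering the tensor dimensions.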

Keywords: semantic segmentation, high spatial resolution remote sensing image, building information extraction, U-net, attention mechanism, dilated convolution

Received: 2023-05-23 Published: 2024-12-23

CLC number: TP751

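Funding: General Program of the National Natural Science Foundation of China, "Research on efficient compression methods for hyperspectral imagery that preserve data characteristics" (42271409); Basic Scientific Research Project of Higher Education Institutions of Liaoning Province, "High-efficiency object detection based on spiking hybrid neural networks" (LJKMZ20220699)

Corresponding author: LIANG Xu (born 1998), male, master's student; research interests: digital image processing and pattern recognition. Email: 15047728650@163.com.

About the author: QU Haicheng (born 1981), male, PhD, associate professor; research interests: high-performance computing for remote sensing images and intelligent big data processing. Email: quhaicheng@lntu.edu.cn.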

Cite this article:

QU Haicheng, LIANG Xu. Building extraction from high-resolution images using a hybrid attention mechanism combined with multi-scale feature enhancement[J]. Remote Sensing for Natural Resources, 2024, 36(4): 107-116.

Link to this article:

https://www.gtzyyg.com/CN/10.6046/zrzyyg.2023146 or https://www.gtzyyg.com/CN/Y2024/V36/I4/107

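Fig.1 BMNet network architecture

Fig.2 Hybrid attention module

Fig.3 Channel attention module

Fig.4 Spatial attention module

Fig.5 Structure of the building mining module

Tab.1 Structural configurations of the BMNet model

Tab.2 Ablation experiment results

Fig.6 Intermediate results (feature maps) from the ablation experiments

Tab.3 Qualitative comparison results on the WHU Building dataset

Tab.4 Quantitative comparison results on the WHU Building dataset

Tab.5 Qualitative comparison results on the Massachusetts Buildings dataset

Tab.6 Quantitative comparison results on the Massachusetts Buildings dataset

Tab.7 Qualitative comparison results on the Inria Aerial Image Labeling dataset

Tab.8 Quantitative comparison results on the Inria Aerial Image Labeling dataset

Fig.7 Comparison of model parameter counts and IoU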
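
The quantitative comparisons (Tab.4, Tab.6, Tab.8) and Fig.7 are reported in terms of accuracy and IoU. As a minimal sketch of how these two metrics can be computed for a binary building mask, the code below counts confusion-matrix entries pixel by pixel. It assumes that "accuracy" means overall pixel accuracy, which this page does not define, and the function names and toy masks are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two binary masks given as nested lists."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def iou(pred, truth):
    """Intersection over union of the building class: TP / (TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    union = tp + fp + fn
    # Convention chosen here: two empty masks count as a perfect match.
    return tp / union if union else 1.0

def pixel_accuracy(pred, truth):
    """Fraction of pixels labeled correctly (assumed meaning of 'accuracy')."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return (tp + tn) / (tp + fp + fn + tn)

# Toy 2 x 3 masks: 1 = building, 0 = background.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]

print(iou(pred, truth))             # 2 TP / (2 TP + 1 FP + 1 FN) = 0.5
print(pixel_accuracy(pred, truth))  # 4 correct pixels out of 6
```

IoU ignores true negatives, so it is less flattered than pixel accuracy by images that are mostly background, which is one reason the two metrics can differ noticeably on the same prediction.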

