Deep learning-based methods for extracting road information from high-resolution remote sensing images struggle to capture both global context and edge details. This study proposed a cascaded neural network for road segmentation in remote sensing images that allows both types of information to be learned simultaneously. First, the input images were fed into two encoder branches, a CNN and a Transformer. Then, the features learned by the two branches were combined by the shuffle attention dual-branch fusion (SA-DBF) module, achieving the fusion of global and local information. Within the SA-DBF module, the features from the two branches interact at a fine-grained level, and multiple attention mechanisms efficiently extract channel and spatial information from the feature maps while suppressing invalid noise. The proposed network was evaluated on the Massachusetts Roads dataset, yielding an overall accuracy (OA) of 98.04%, an intersection over union (IoU) of 88.03%, and an F1 score of 65.13%. Compared with the mainstream methods U-Net and TransRoadNet, the IoU of the proposed network increased by 2.01 and 1.42 percentage points, respectively. Experimental results indicate that the proposed method outperforms all compared methods and can effectively improve the accuracy of road segmentation.
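As an illustration of the dual-branch fusion described above, the following PyTorch sketch shows how CNN and Transformer feature maps could be combined using channel attention, spatial attention, and a channel shuffle. It is a minimal, hypothetical reconstruction under stated assumptions, not the authors' released code; the class and parameter names (ShuffleAttentionFusion, channel_gate, spatial_gate, groups) and all design details are illustrative assumptions.

import torch
import torch.nn as nn

class ShuffleAttentionFusion(nn.Module):
    """Hypothetical sketch of a dual-branch fusion block: concatenate CNN and
    Transformer features, apply channel and spatial attention, then shuffle
    channels to mix information across groups. Not the authors' implementation."""

    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        self.groups = groups
        fused = channels * 2  # CNN branch + Transformer branch, concatenated
        # channel attention: global pooling followed by a 1x1 bottleneck
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // 4, fused, kernel_size=1),
            nn.Sigmoid(),
        )
        # spatial attention: a 7x7 conv over per-pixel channel statistics
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(fused, channels, kernel_size=1)

    def channel_shuffle(self, x: torch.Tensor) -> torch.Tensor:
        # interleave channels across groups so grouped features exchange information
        b, c, h, w = x.shape
        x = x.view(b, self.groups, c // self.groups, h, w)
        x = x.transpose(1, 2).contiguous()
        return x.view(b, c, h, w)

    def forward(self, cnn_feat: torch.Tensor, trans_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([cnn_feat, trans_feat], dim=1)           # fuse local + global features
        x = x * self.channel_gate(x)                           # re-weight channels
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        x = x * self.spatial_gate(pooled)                      # re-weight spatial positions
        x = self.channel_shuffle(x)                            # mix information across groups
        return self.project(x)                                 # back to the decoder channel width

if __name__ == "__main__":
    fuse = ShuffleAttentionFusion(channels=64)
    cnn_feat = torch.randn(1, 64, 32, 32)      # local features from the CNN branch
    trans_feat = torch.randn(1, 64, 32, 32)    # global features from the Transformer branch
    print(fuse(cnn_feat, trans_feat).shape)    # torch.Size([1, 64, 32, 32])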
QU Haicheng, WANG Ying, LIU Lamei, HAO Ming. Information extraction of roads from remote sensing images using CNN combined with Transformer. Remote Sensing for Natural Resources, 2025, 37(1): 38-45.
[1] He D, Zhong Y, Wang X, et al. Deep convolutional neural network framework for subpixel mapping[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(11):9518-9539.
[2] Huang B, Zhao B, Song Y. Urban land-use mapping using a deep convolutional neural network with high spatial resolution multispectral remote sensing imagery[J]. Remote Sensing of Environment, 2018, 214:73-86.
[3] Xu Y, Chen H, Du C, et al. MSACon: Mining spatial attention-based contextual information for road extraction[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60:5604317.
[4] Yuan Q, Shen H, Li T, et al. Deep learning in environmental remote sensing: Achievements and challenges[J]. Remote Sensing of Environment, 2020, 241:111716.
[5] Zhu Q, Zhang Y, Wang L, et al. A global context-aware and batch-independent network for road extraction from VHR satellite imagery[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 175:353-365.
[6] Yang K, Yi J, Chen A, et al. ConDinet++: Full-scale fusion network based on conditional dilated convolution to extract roads from remote sensing images[J]. IEEE Geoscience and Remote Sensing Letters, 2021, 19:8015105.
[7] He D, Shi Q, Liu X, et al. Generating 2m fine-scale urban tree cover product over 34 metropolises in China based on deep context-aware sub-pixel mapping network[J]. International Journal of Applied Earth Observation and Geoinformation, 2022, 106:102667.
[8] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4):640-651.
[9] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation[C]// Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015). Cham: Springer International Publishing, 2015:234-241.
[10] Badrinarayanan V, Kendall A, Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12):2481-2495.
[11] Chen L C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[M]// Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018:833-851.
[12] Gao L, Song W, Dai J, et al. Road extraction from high-resolution remote sensing imagery using refined deep residual convolutional neural network[J]. Remote Sensing, 2019, 11(5):552.
[13] Wang Y, Zeng X Q. Road extraction model derived from integrated attention mechanism and dilated convolution[J]. Journal of Image and Graphics, 2022, 27(10):3102-3115.
[14] Wu Q Q, Wang S, Wang B, et al. Road extraction method of high-resolution remote sensing image on the basis of the spatial information perception semantic segmentation model[J]. National Remote Sensing Bulletin, 2022, 26(9):1872-1885.
[15] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach: ACM, 2017:6000-6010.
[16] Sanchis-Agudo M, Wang Y, Duraisamy K, et al. Easy attention: A simple self-attention mechanism for Transformers[J/OL]. 2023: arXiv:2308.12874. http://arxiv.org/abs/2308.12874.
[17] Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[J/OL]. 2020: arXiv:2010.11929. http://arxiv.org/abs/2010.11929.
[18] Yang Z, Zhou D, Yang Y, et al. TransRoadNet: A novel road extraction method for remote sensing images via combining high-level semantic feature and context[J]. IEEE Geoscience and Remote Sensing Letters, 2022, 19:6509505.
[19] Dai Z, Liu H, Le Q V, et al. CoAtNet: Marrying convolution and attention for all data sizes[J/OL]. 2021: arXiv:2106.04803. http://arxiv.org/abs/2106.04803.
[20] Cao Y, Xu J, Lin S, et al. GCNet: Non-local networks meet squeeze-excitation networks and beyond[C]// 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, Korea (South): IEEE, 2019:1971-1980.
[21] Woo S, Park J, Lee J Y, et al. CBAM: Convolutional block attention module[M]// Computer Vision - ECCV 2018. Cham: Springer International Publishing, 2018:3-19.
[22] Su R, Huang W, Ma H, et al. SGE NET: Video object detection with squeezed GRU and information entropy map[C]// 2021 IEEE International Conference on Image Processing (ICIP). Anchorage, AK, USA: IEEE, 2021:689-693.
[23] Wang Q, Wu B, Zhu P, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020:11531-11539.
[24] Zhou L, Zhang C, Wu M. D-LinkNet: LinkNet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction[C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Salt Lake City: IEEE, 2018:192-1924.