Information extraction of roads from remote sensing images using CNN combined with Transformer

doi:10.6046/zrzyyg.2023237


Abstract

Deep learning-based methods for extracting roads from high-resolution remote sensing images struggle to capture global context and edge details at the same time. This study proposed a cascaded neural network for road segmentation in remote sensing images that learns both types of information simultaneously. First, the input feature maps were fed to two encoder branches, a CNN and a Transformer. The features learned by the two encoders were then combined by the shuffle attention dual branch fusion (SA-DBF) module, which fuses global and local information. Within the SA-DBF module, the features from both branches were modeled through fine-grained interaction, and multiple attention mechanisms efficiently extracted channel and spatial information from the feature maps while suppressing invalid noise. The proposed network was evaluated on the Massachusetts Road dataset, yielding an overall accuracy (OA) of 98.04%, an intersection over union (IoU) of 88.03%, and an F1 score of 65.13%. Compared with the mainstream methods U-Net and TransRoadNet, the proposed network increased the IoU by 2.01 and 1.42 percentage points, respectively. The experimental results indicate that the proposed method outperforms all the compared methods and effectively improves the accuracy of road segmentation.
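
The fusion idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' SA-DBF module, whose internals are not given on this page. It assumes, purely for illustration, that the CNN branch is reweighted by a channel gate, the Transformer branch by a spatial gate, and that the concatenated result is mixed by a channel shuffle. The function names (channel_gate, spatial_gate, channel_shuffle, fuse) and the tensor shapes are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(x: np.ndarray) -> np.ndarray:
    """Scale each channel by a gate computed from its global average. x: (C, H, W)."""
    pooled = x.mean(axis=(1, 2), keepdims=True)  # shape (C, 1, 1)
    return x * sigmoid(pooled)


def spatial_gate(x: np.ndarray) -> np.ndarray:
    """Scale each pixel by a gate computed from the mean over channels. x: (C, H, W)."""
    pooled = x.mean(axis=0, keepdims=True)  # shape (1, H, W)
    return x * sigmoid(pooled)


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across groups so the two branches are mixed."""
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)


def fuse(cnn_feat: np.ndarray, trans_feat: np.ndarray) -> np.ndarray:
    """Hypothetical dual-branch fusion: gate each branch, concatenate, shuffle."""
    local_part = channel_gate(cnn_feat)     # assumed: CNN branch carries local detail
    global_part = spatial_gate(trans_feat)  # assumed: Transformer branch carries context
    stacked = np.concatenate([local_part, global_part], axis=0)
    return channel_shuffle(stacked, groups=2)


# Toy feature maps standing in for the two encoder outputs (assumed shapes).
rng = np.random.default_rng(0)
cnn_feat = rng.standard_normal((4, 8, 8))
trans_feat = rng.standard_normal((4, 8, 8))
fused = fuse(cnn_feat, trans_feat)
print(fused.shape)  # (8, 8, 8)
```

After the shuffle, channels from the two branches alternate in the output, so any later convolution over neighboring channels sees both local and global features. Which branch receives which gate is an arbitrary choice in this sketch.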

Keywords: cascaded neural network; Transformer; feature fusion; attention mechanism

ZTFLH: TP79

Issue Date: 17 February 2025


Authors: Haicheng QU, Ying WANG, Lamei LIU, Ming HAO

Cite this article:

Haicheng QU, Ying WANG, Lamei LIU, et al. Information extraction of roads from remote sensing images using CNN combined with Transformer[J]. Remote Sensing for Natural Resources, 2025, 37(1): 38-45.

URL: https://www.gtzyyg.com/EN/10.6046/zrzyyg.2023237 or https://www.gtzyyg.com/EN/Y2025/V37/I1/38

Fig.1 Overall structure of the model in this paper

Fig.2 SA-DBF module structure diagram

Fig.3 SA module and CA module structure diagram

Fig.4 Change trend of experimental loss

Tab.1 Comparison results of different modules (%)

Tab.2 Performance comparison of different attention modules

Tab.3 Influence of Transformer scale on the model (%)

Tab.4 Experimental comparison results of different models

Tab.5 Experimental comparison results of different networks
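
The metrics reported in the abstract (OA, IoU, and F1) can be computed from pixel-wise confusion counts. The sketch below uses the standard binary-segmentation definitions with road as the positive class. This is an assumption, since this page does not state whether the paper averages the metrics over classes. The function names and the toy masks are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN over flattened binary masks (1 = road, 0 = background)."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def road_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Overall accuracy, IoU, and F1 for the road class (assumed binary definitions)."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    total = tp + fp + fn + tn
    oa = (tp + tn) / total if total else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"OA": oa, "IoU": iou, "F1": f1}


# Toy 8-pixel example: 2 true positives, 1 false positive, 1 false negative, 4 true negatives.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = road_metrics(pred, truth)
print(metrics)
```

Because road pixels are usually a small fraction of an image, OA is dominated by the background class, which is why IoU and F1 are reported alongside it.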


