Abstract Multispectral (MS) and panchromatic (PAN) images are the primary data sources for visible to near-infrared optical remote sensing. In a typical land cover classification workflow, the spatial resolution of the MS image is first enhanced by pixel-level fusion, and the fused image is then classified. However, pixel-level fusion is time-consuming, and its optimization objective is inconsistent with that of land cover classification, so it fails to meet the demand for end-to-end remote sensing image classification. To address these challenges, this paper proposes a dual-stream fully convolutional neural network, DSEUNet, which obviates the need for pixel-level fusion. Specifically, two branches built on the EfficientNet-B3 network extract features from the PAN and MS images, respectively; feature-level fusion and decoding then produce the final classification result. Because PAN and MS images capture different characteristics of land cover elements, a spatial attention mechanism is incorporated in the PAN branch to enhance the perception of spatial information such as details and edges, and a channel attention mechanism is incorporated in the MS branch to improve the perception of reflectance differences across bands. Experiments on the 10-meter land cover dataset and ablation studies of the network structure show that the proposed network achieves higher classification accuracy and faster inference speed. With the same backbone network, DSEUNet outperforms the traditional pixel-level fusion-based classification method, with increases of 1.62 percentage points in mIoU, 1.36 percentage points in mFscore, and 1.49 percentage points in the Kappa coefficient, as well as a 17.69% improvement in inference speed.
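For readers unfamiliar with the three accuracy measures quoted above, the following minimal sketch shows how mIoU, mFscore, and the Kappa coefficient are commonly derived from a confusion matrix. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `segmentation_metrics` is hypothetical, mFscore is assumed to be the unweighted mean of per-class F1 scores, and a class absent from both the labels and the predictions is scored as 0.

```python
def segmentation_metrics(cm):
    """Compute mIoU, mean F-score, and Cohen's Kappa from a confusion matrix.

    cm[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. Assumption: mFscore is the unweighted mean of
    per-class F1; the paper may define it differently.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(cm[i]) for i in range(n)]                        # true pixels per class
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # predicted pixels per class

    ious, fscores = [], []
    for k in range(n):
        tp = cm[k][k]
        fn = row_sums[k] - tp
        fp = col_sums[k] - tp
        union = tp + fp + fn
        # A class with no true and no predicted pixels is scored 0 (assumption).
        ious.append(tp / union if union else 0.0)
        denom = 2 * tp + fp + fn
        fscores.append(2 * tp / denom if denom else 0.0)

    # Kappa compares observed agreement with agreement expected by chance.
    p_observed = sum(cm[k][k] for k in range(n)) / total
    p_expected = sum(row_sums[k] * col_sums[k] for k in range(n)) / (total * total)
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    return {
        "mIoU": sum(ious) / n,
        "mFscore": sum(fscores) / n,
        "Kappa": kappa,
    }


# Hypothetical two-class confusion matrix, for illustration only.
example = segmentation_metrics([[50, 10], [5, 35]])
```

All three values lie on a 0 to 1 scale (Kappa can be negative when agreement is worse than chance), so a gain reported in percentage points corresponds to a difference of that many hundredths between two such values.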
Keywords
land cover classification
deep learning
dual-stream network
panchromatic (PAN) image
multispectral (MS) image
Issue Date: 28 October 2025