
    End-to-end land cover classification based on a panchromatic-multispectral dual-stream convolutional network

    Abstract: Multispectral (MS) and panchromatic (PAN) images are the primary data sources of visible-near-infrared optical remote sensing. In a typical land cover classification workflow, the spatial resolution of MS images is first enhanced by pixel-level fusion, and classification is then performed on the fused imagery. However, pixel-level fusion is time-consuming and its optimization objective is not aligned with that of land cover classification, so it cannot meet the demand for end-to-end remote sensing image classification. To address these challenges, this paper proposes DSEUNet, a dual-stream fully convolutional neural network that obviates the need for pixel-level fusion. Two branches built on the EfficientNet-B3 backbone extract features from the PAN and MS images respectively; the features are then fused at the feature level and decoded to produce the final classification result. Because PAN and MS images emphasize different characteristics of land cover elements, a spatial attention mechanism is added to the PAN branch to strengthen the perception of spatial information such as details and edges, and a channel attention mechanism is added to the MS branch to strengthen the perception of reflectance differences across bands. Experiments on a 10 m land cover dataset and ablation studies of the network structure show that the proposed network achieves higher classification accuracy and faster inference. With the same backbone, DSEUNet outperforms the traditional approach of classifying pixel-level fused imagery, improving mIoU by 1.62 percentage points, mFscore by 1.36 percentage points, and the Kappa coefficient by 1.49 percentage points, with a 17.69% gain in inference speed.
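The per-branch attention and feature-level fusion described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes squeeze-and-excitation-style channel attention and a CBAM-style spatial gate (the abstract does not specify the exact attention variants), uses toy feature maps in place of EfficientNet-B3 outputs, and fuses the two streams by channel concatenation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """SE-style channel attention on a (C, H, W) feature map:
    global average pooling yields one gating weight per channel."""
    pooled = feat.mean(axis=(1, 2))           # (C,) channel descriptor
    weights = sigmoid(pooled)                 # gate in (0, 1); real networks use a small MLP here
    return feat * weights[:, None, None]      # reweight each band's feature channel

def spatial_attention(feat):
    """CBAM-style spatial attention: pool across channels, gate each pixel."""
    avg = feat.mean(axis=0)                   # (H, W) average over channels
    mx = feat.max(axis=0)                     # (H, W) max over channels
    gate = sigmoid(avg + mx)                  # real networks convolve [avg; max] instead
    return feat * gate[None, :, :]            # emphasize detail/edge locations

def dual_stream_fuse(pan_feat, ms_feat):
    """Apply attention per branch, then fuse at the feature level."""
    pan = spatial_attention(pan_feat)         # PAN branch: spatial detail
    ms = channel_attention(ms_feat)           # MS branch: inter-band differences
    return np.concatenate([pan, ms], axis=0)  # (C_pan + C_ms, H, W), passed to the decoder

rng = np.random.default_rng(0)
pan = rng.standard_normal((8, 16, 16))        # toy PAN branch features
ms = rng.standard_normal((16, 16, 16))        # toy MS branch features
fused = dual_stream_fuse(pan, ms)
print(fused.shape)                            # (24, 16, 16)
```

In the full network this fused tensor would be decoded (UNet-style upsampling) into the per-pixel class map; the sketch only shows why no pixel-level fused image is ever materialized.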
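The reported gains are in standard segmentation metrics. As a reference, a minimal sketch of how mIoU and the Kappa coefficient are computed from a confusion matrix (illustrative toy labels only, not the paper's data):

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Count matrix cm[g, p] = pixels with ground truth g predicted as p."""
    idx = gt * num_classes + pred
    return np.bincount(idx, minlength=num_classes**2).reshape(num_classes, num_classes)

def miou(cm):
    """Mean intersection-over-union: per-class TP / (TP + FP + FN), averaged."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    return (tp / np.maximum(union, 1)).mean()

def kappa(cm):
    """Cohen's Kappa: agreement beyond chance."""
    n = cm.sum()
    po = np.diag(cm).sum() / n                            # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    return (po - pe) / (1 - pe)

gt = np.array([0, 0, 1, 1, 2, 2])     # toy ground-truth labels
pred = np.array([0, 0, 1, 2, 2, 2])   # toy predictions (one class-1 pixel missed)
cm = confusion_matrix(pred, gt, 3)
print(round(miou(cm), 3), round(kappa(cm), 3))  # → 0.722 0.75
```

A "1.62 percentage point" mIoU gain thus means the class-averaged IoU fraction rose by 0.0162 in absolute terms.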

       
