Semantic segmentation of high-resolution remote sensing images based on context- and class-aware feature fusion

HE Xiaojun; LUO Jie

doi:10.6046/zrzyyg.2023312

Abstract

To address the loss of segmentation accuracy in remote sensing image semantic segmentation caused by insufficient extraction of contextual dependencies and the loss of spatial detail, this study proposes a semantic segmentation method that combines context- and class-aware feature fusion. The method first uses ResNet-50 as the backbone network for feature extraction and applies an attention module during downsampling to strengthen feature representation and the extraction of contextual dependencies. It then builds large receptive field blocks on the skip connections to extract rich multiscale contextual information, which reduces the impact of scale variation between targets. Next, a scene feature association and fusion module is connected in parallel so that global features guide the fusion of local features. Finally, a class prediction module and a class-aware feature fusion module are built in the decoder to accurately fuse the high-level semantic information of the bottom layers with the detail information of the top layers. The feasibility of the proposed method was validated on the Potsdam and Vaihingen datasets, and it was compared with six commonly used methods, including DeepLabv3+ and BuildFormer, to verify its superiority. The experimental results show that the proposed method outperforms the other methods in Recall, F1-score, and Accuracy. In particular, its intersection over union (IoU) for building segmentation reaches 90.44% and 86.74% on the two datasets, respectively, which is 1.55% and 2.41% higher than the second-best networks, DeepLabv3+ and A²FPN.
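The abstract reports per-class IoU, Recall, and F1-score but does not spell out how they are computed. As a minimal sketch, assuming the standard pixel-wise definitions based on true positives, false positives, and false negatives (the authors' exact evaluation protocol is not stated here), the metrics for a single class could be computed as follows. The function names and the toy labels are hypothetical and serve only as illustration.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN for one class over flattened pixel labels."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted the class and it is correct
        elif p == positive:
            fp += 1          # predicted the class but truth differs
        elif t == positive:
            fn += 1          # missed a pixel of the class
    return tp, fp, fn


def class_metrics(y_true, y_pred, positive):
    """Return IoU, recall, precision and F1 for one class (0.0 if undefined)."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"iou": iou, "recall": recall, "precision": precision, "f1": f1}


# Toy example: label 1 stands in for a hypothetical "building" class.
y_true = [1, 1, 1, 0, 0, 2]
y_pred = [1, 1, 0, 1, 0, 2]
metrics = class_metrics(y_true, y_pred, positive=1)
```

In this toy case the class has two true positives, one false positive, and one false negative, so the IoU is 2 / 4 and both recall and precision are 2 / 3. For real images the label maps would be flattened to one-dimensional sequences before being passed in.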
