A remote sensing image segmentation method based on multi-scale gated feature fusion and dynamic sliding upsampling

何晓军; 米芮林

doi:10.6046/zrzyyg.2025198


Abstract

Remote sensing image segmentation currently suffers from insufficient multi-scale feature fusion, limited interaction between global and local contextual information, and loss of spatial detail caused by conventional upsampling. To address these issues, this study proposes a remote sensing image segmentation method based on multi-scale gated feature fusion and dynamic sliding upsampling (MSGFF-DSU-Deeplab), with DeeplabV3+ as the baseline model. First, a multi-scale global-local feature aggregation module (MGLFAM) is constructed. It combines global and local mixing operations with a gated attention mechanism to fuse multi-layer features with dynamic weights, significantly improving the feature representation of key regions. Second, a gated convolutional attention fusion module (GCAFM) is designed and implemented. By combining convolutional attention with gated linear units, it lets the network adaptively focus on semantically important regions, effectively strengthening the representation of edge features and small-scale objects. Third, a dynamic sliding upsampling module (sliding effective upsampling module, SEUM) is proposed. Through the joint optimization of depthwise separable convolution and a directional channel shifting strategy, it significantly improves the quality of resolution reconstruction and overcomes the bottleneck of conventional linear interpolation in recovering spatial detail. The method was validated experimentally on two high-resolution remote sensing image datasets, WHU-Building and Vaihingen. The results indicate that the proposed method significantly outperforms mainstream segmentation methods across the evaluation metrics. It achieves a mean intersection over union (mIoU) of 90.59% on the WHU-Building dataset and 81.42% on the Vaihingen dataset, an improvement of approximately 1 percentage point over the baseline DeeplabV3+ on each dataset. More importantly, it significantly improves the recognition accuracy of edge details while maintaining computational efficiency, with particularly strong results on small-scale buildings and complex ground-object boundaries. Overall, the proposed method improves segmentation performance on remote sensing images and strengthens the model's adaptability to different scenes, offering a new technical approach to high-precision remote sensing image analysis.
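The mIoU figures quoted in the abstract follow the standard definition: for each class, the intersection over union between predicted and reference pixels, averaged over classes. The sketch below illustrates that definition in plain Python on flattened label lists. The function name `mean_iou`, its parameters, and the toy labels are hypothetical, and the paper's exact evaluation protocol (for example, which classes are ignored or how scores are aggregated over images) is not stated in the abstract, so this is only an illustration of the metric, not the authors' evaluation code.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union for flattened label sequences.

    Assumption: classes absent from both prediction and reference are
    skipped rather than counted as 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where both agree on class c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious: List[float] = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Toy example with two classes (0 = background, 1 = building).
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.4f}")
```

In the toy example each class has 2 agreeing pixels out of a union of 4, so both per-class IoU values are 0.5 and the mean is 0.5. A gain from 0.80 to 0.81 on this scale is a gain of 1 percentage point, which is how the improvement over DeeplabV3+ is stated in the abstract.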
