Semantic segmentation of remote sensing images based on multi-dimensional collaborative optimization and dynamic feature compensation

Cao Chenyu (曹辰钰); Chen Linlin (陈琳琳); Wang Hengyou (王恒友); Huo Lianzhi (霍连志)

doi:10.6046/zrzyyg.2025173


Abstract: Semantic segmentation of high-resolution remote sensing images suffers from semantic information attenuation and multi-scale feature conflicts. To address these issues, this study proposed a semantic segmentation network that combines multi-dimensional collaborative optimization with cross-level dynamic feature compensation. First, a multi-dimensional collaborative optimization head (MCOH) module was introduced. Through three collaborating branches (channel attention, spatial attention, and multi-scale deformable convolution), MCOH suppresses noise in shallow features and strengthens semantic consistency in deep features, balancing detail preservation against context modeling. Second, a cross-level dynamic semantic compensation (CDSC) module was incorporated. CDSC constructs a cross-correlation matrix between low-level detail features and high-level semantic features to explicitly quantify the semantic similarity between objects of the same class, and then uses a dynamic modulation coefficient to enhance the detail features in a targeted manner through a residual connection, mitigating the loss of semantic information in the deeper layers of the network. By integrating these modules and exploiting the complementary strengths of convolutional neural networks (CNNs) in local detail extraction and of Transformers in global semantic modeling, the proposed network performed strongly on widely used high-resolution remote sensing benchmarks, achieving mean intersection over union (mIoU) values of 82.49% and 85.10% on the Vaihingen and Potsdam datasets released by the International Society for Photogrammetry and Remote Sensing (ISPRS), respectively, and 52.15% on the LoveDA dataset. These results represent a significant improvement in segmentation accuracy over many mainstream models.
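The abstract describes CDSC only at a high level, so the NumPy sketch below is one plausible reading of it, not the authors' implementation. Several pieces are assumptions: the hypothetical function `cdsc_sketch`, cosine similarity as the cross-correlation, softmax aggregation with a `temperature` parameter, and a sigmoid gate on each position's strongest match standing in for the "dynamic modulation coefficient". The sketch also assumes the high-level map has already been projected to the same channel count and upsampled to the low-level resolution.

```python
import numpy as np


def _l2_normalize(x, eps=1e-6):
    # Scale each row to (almost) unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def _softmax(x, axis=1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cdsc_sketch(low, high, temperature=0.1):
    """Hypothetical cross-level dynamic semantic compensation.

    low:  (C, H, W) low-level detail features.
    high: (C, H, W) high-level semantic features, assumed already projected
          to C channels and upsampled to H x W.
    Returns the compensated features (C, H, W) and the (H*W, H*W)
    cross-correlation matrix.
    """
    c, h, w = low.shape
    lo = low.reshape(c, h * w).T   # (N, C): one feature vector per pixel
    hi = high.reshape(c, h * w).T  # (N, C)

    # Cross-correlation matrix: cosine similarity between every low-level
    # position and every high-level position.
    corr = _l2_normalize(lo) @ _l2_normalize(hi).T  # (N, N)

    # For each low-level position, gather high-level semantics from the
    # positions it resembles most.
    attn = _softmax(corr / temperature, axis=1)
    semantic = attn @ hi  # (N, C)

    # Assumed dynamic modulation coefficient: a per-position sigmoid of the
    # strongest match, so weakly matched pixels receive less compensation.
    alpha = 1.0 / (1.0 + np.exp(-corr.max(axis=1, keepdims=True)))  # (N, 1)

    # Residual compensation: keep the original details and add gated semantics.
    out = lo + alpha * semantic
    return out.T.reshape(c, h, w), corr


rng = np.random.default_rng(0)
low = rng.standard_normal((8, 4, 4))
high = rng.standard_normal((8, 4, 4))
out, corr = cdsc_sketch(low, high)
print(out.shape, corr.shape)  # (8, 4, 4) (16, 16)
```

The residual form means that when the high-level features carry no signal (all zeros), the output reduces to the low-level input. Because the correlation matrix has (H·W)² entries, a practical network would likely need to compute it on reduced-resolution feature maps.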
In terms of efficiency, the model required less memory than most of the compared methods while keeping its parameter count and computational complexity within an acceptable range, demonstrating a good balance between accuracy and efficiency. The proposed network therefore offers a high-performance, practical solution for accurately interpreting high-resolution remote sensing images in real-world applications such as road extraction, disaster monitoring, and land cover classification.
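The accuracy figures in this abstract are mean intersection over union (mIoU) scores. The pure-Python sketch below shows the standard way that metric is computed. It accumulates one confusion matrix over all pixels (rows are ground truth, columns are predictions), computes per-class IoU = TP / (TP + FP + FN), and takes the unweighted mean. The function names and the `ignore_index` convention are illustrative assumptions. Which classes enter the average (for example, whether a clutter or background class is included) depends on the evaluation protocol, which the abstract does not state.

```python
def confusion_matrix(gt, pred, num_classes, ignore_index=None):
    """Accumulate a num_classes x num_classes matrix: rows = ground truth, cols = prediction."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # skip unlabeled pixels
        cm[g][p] += 1
    return cm


def mean_iou(cm):
    """Per-class IoU = TP / (TP + FP + FN); mIoU is their unweighted mean."""
    k = len(cm)
    ious = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(k)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious), ious


# Flattened per-pixel labels for a toy 3-class example.
gt = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
miou, ious = mean_iou(confusion_matrix(gt, pred, num_classes=3))
print(round(miou, 4))  # 0.5
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mIoU is 0.5. Accumulating a single dataset-level confusion matrix, rather than averaging per-image scores, keeps small images from being over-weighted.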
