Semantic segmentation of agricultural plastic films based on an improved Swin-UperNet model

马仪; 王国芳; 文刚; 王一帆; 严逸骏

doi:10.6046/zrzyyg.2025152

Abstract

Accurately mapping the spatial distribution of agricultural plastic films (plastic greenhouses and plastic mulches) is crucial for optimizing the agricultural structure, improving the ecological environment, and ensuring the safety of power transmission lines. Existing semantic segmentation models for agricultural plastic films make insufficient use of the global contextual information in remote sensing images. The problem is most acute in rural areas with complex backgrounds and diverse scenes, where plastic greenhouses and plastic mulches occupy only a small fraction of image pixels; the resulting severe class imbalance degrades detection performance. To address these problems, this study proposes an improved Swin-UperNet semantic segmentation model for accurate segmentation of agricultural plastic films. The model makes two changes to the baseline. First, a convolutional block attention module (CBAM) is inserted between the Swin Transformer encoder and the UperNet decoder to strengthen the model's capture of global contextual information along both the spatial and channel dimensions. Second, the loss function is replaced with the focal cross-entropy loss (FCELoss), which adaptively increases the weight of minority classes to mitigate the class imbalance. Experimental results show that the improved Swin-UperNet model performs substantially better on the semantic segmentation of agricultural plastic films, reaching a mean accuracy (mAccuracy) of 95.81% and a mean intersection over union (mIoU) of 89.81%. Compared with the original Swin-UperNet model, mAccuracy and mIoU increase by 7.92% and 5.92%, respectively. Compared with the classic U-Net and DeepLabv3 models, mAccuracy increases by 6.41% and 3.93%, and mIoU by 10.49% and 3.18%, respectively.
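The abstract states only that a CBAM is inserted between the Swin Transformer encoder and the UperNet decoder. As a rough illustration of what such a module computes, the sketch below follows the original CBAM design (Woo et al., 2018). A channel-attention map is built from a shared two-layer MLP applied to average- and max-pooled channel descriptors. A spatial-attention map is then built from a 7×7 convolution over channel-wise average and max maps. Each map is applied multiplicatively. This is a NumPy forward pass on a single (C, H, W) feature map with random stand-in weights. The function names, the reduction ratio r = 4 and the weight scaling are assumptions for illustration, not details reported by the authors. The abstract also does not say which encoder stages receive the module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel-attention weights of shape (C, 1, 1) for a (C, H, W) feature map.

    w1 (C//r, C) and w2 (C, C//r) form the MLP shared by both pooled descriptors.
    """
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)   # FC -> ReLU -> FC
    avg = feat.mean(axis=(1, 2))              # global average pooling, shape (C,)
    mx = feat.max(axis=(1, 2))                # global max pooling, shape (C,)
    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]

def spatial_attention(feat, kernel):
    """Spatial-attention weights of shape (1, H, W); kernel is (2, k, k), k odd."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H x W
    h, w = feat.shape[1:]
    out = np.zeros((h, w))
    # "Same" 2-D cross-correlation, i.e. the convolution used in deep-learning layers
    for c in range(2):
        for i in range(k):
            for j in range(k):
                out += kernel[c, i, j] * padded[c, i:i + h, j:j + w]
    return sigmoid(out)[None, :, :]

def cbam(feat, w1, w2, kernel):
    """Sequential CBAM: channel attention first, then spatial attention."""
    refined = feat * channel_attention(feat, w1, w2)
    return refined * spatial_attention(refined, kernel)

# Toy forward pass with random stand-in weights (these are learned in a real network)
rng = np.random.default_rng(0)
C, H, W, r = 32, 16, 16, 4
feat = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
kernel = 0.1 * rng.standard_normal((2, 7, 7))
out = cbam(feat, w1, w2, kernel)
print(out.shape)
```

Both attention maps lie in (0, 1), so the module only rescales the encoder features and leaves their shape unchanged. That property is what lets it sit between an existing encoder and decoder without altering either interface.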
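The abstract does not give the exact form of FCELoss. The sketch below assumes it is the standard multi-class focal loss of Lin et al. (2017), FL(p_t) = −α_t (1 − p_t)^γ log p_t, evaluated per pixel. The modulating factor (1 − p_t)^γ shrinks the loss of pixels the model already classifies confidently. In these images such pixels are mostly abundant background, so the rarer greenhouse and mulch pixels carry relatively more weight. The function name, the default γ = 2 and the optional per-class weights `alpha` are assumptions, not values from the paper.

```python
import numpy as np

def focal_cross_entropy(logits, labels, gamma=2.0, alpha=None):
    """Mean multi-class focal loss over N pixels.

    logits: (N, K) raw class scores; labels: (N,) integer classes in [0, K).
    gamma:  focusing parameter; gamma=0 with alpha=None gives plain cross-entropy.
    alpha:  optional length-K sequence of per-class weights (hypothetical here).
    """
    shifted = logits - logits.max(axis=1, keepdims=True)        # stabilise exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_pt = log_probs[np.arange(len(labels)), labels]          # log-prob of true class
    pt = np.exp(log_pt)
    loss = -((1.0 - pt) ** gamma) * log_pt                      # down-weight easy pixels
    if alpha is not None:
        loss = loss * np.asarray(alpha, dtype=float)[labels]    # per-class weighting
    return loss.mean()

# Two pixels whose true class is 0: one easy (confident), one hard
logits = np.array([[4.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
labels = np.array([0, 0])
ce = focal_cross_entropy(logits, labels, gamma=0.0)
fl = focal_cross_entropy(logits, labels, gamma=2.0)
print(f"cross-entropy={ce:.4f}  focal={fl:.4f}")
```

With these toy logits, (1 − p_t)^2 is about 0.001 for the confident first pixel and about 0.62 for the second. The small contribution of the confident pixel is therefore suppressed almost entirely, while the harder pixel keeps most of its cross-entropy loss.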
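The abstract does not define its two headline metrics. The sketch below assumes the usual per-class definitions, as used for example by MMSegmentation's mAcc and mIoU. From a pixel confusion matrix, per-class accuracy is TP / (TP + FN) and IoU is TP / (TP + FP + FN), and each is averaged over classes. The three-class labelling (0 = background, 1 = plastic greenhouse, 2 = plastic mulch), the function names and the toy masks are hypothetical.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes):
    """Pixel counts with rows = ground-truth class, columns = predicted class."""
    idx = num_classes * gt.ravel() + pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_accuracy_and_iou(cm):
    """Return (mAccuracy, mIoU, per-class accuracy, per-class IoU).

    A class absent from the ground truth gets NaN accuracy, a class absent from
    both ground truth and prediction gets NaN IoU; NaNs are skipped in the means.
    """
    tp = np.diag(cm).astype(float)
    gt_total = cm.sum(axis=1)      # TP + FN for each class
    pred_total = cm.sum(axis=0)    # TP + FP for each class
    union = gt_total + pred_total - tp
    acc = np.divide(tp, gt_total, out=np.full_like(tp, np.nan), where=gt_total > 0)
    iou = np.divide(tp, union, out=np.full_like(tp, np.nan), where=union > 0)
    return np.nanmean(acc), np.nanmean(iou), acc, iou

# Hypothetical 4x4 masks: 0 = background, 1 = plastic greenhouse, 2 = plastic mulch
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1],
               [0, 2, 2, 0],
               [0, 2, 2, 0]])
pred = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1],
                 [0, 2, 0, 0],
                 [0, 2, 2, 0]])
cm = confusion_matrix(gt, pred, num_classes=3)
m_acc, m_iou, acc, iou = mean_accuracy_and_iou(cm)
print(cm)
print(f"mAccuracy={m_acc:.4f}  mIoU={m_iou:.4f}")
```

On the toy masks, one background pixel is predicted as greenhouse and one mulch pixel as background. That gives per-class accuracies of 0.875, 1.0 and 0.75 (mAccuracy 0.875) and IoUs of 7/9, 0.8 and 0.75 (mIoU ≈ 0.776). Both metrics average over classes rather than pixels, so errors on the small greenhouse and mulch classes count as much as errors on the large background class.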
