Abstract:
Accurate knowledge of the spatial distribution of agricultural plastic films (including plastic greenhouses and plastic mulches) is crucial for optimizing the agricultural structure, improving ecosystems, and ensuring the safety of power transmission lines. Most existing models for semantic segmentation of agricultural plastic films are limited in their ability to leverage global contextual information in remote sensing images. In particular, because of complex backgrounds and diverse scenes in rural areas, plastic greenhouses and plastic mulches occupy only a small proportion of the image pixels, which causes severe class imbalance and undermines detection performance. To address these challenges, this study proposes an improved Swin-UperNet model for accurate semantic segmentation of agricultural plastic films. The model makes two specific improvements. First, a convolutional block attention module (CBAM) is added between the Swin Transformer encoder and the UperNet decoder to enhance the model's ability to capture global contextual information in both the spatial and channel dimensions. Second, the loss function is replaced with the focal cross-entropy loss (FCELoss) function, which adaptively increases the weight of minority classes and thereby mitigates the class imbalance. Experimental results show that the improved Swin-UperNet model substantially improved the semantic segmentation of agricultural plastic films, yielding a mean accuracy (mAccuracy) of 95.81% and a mean intersection over union (mIoU) of 89.81%. Compared to the original Swin-UperNet model, the mAccuracy and mIoU increased by 7.92% and 5.92%, respectively. Compared to the classic U-Net and DeepLabv3 models, the mAccuracy increased by 6.41% and 3.93%, and the mIoU increased by 10.49% and 3.18%, respectively.
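To illustrate how a focal-style loss shifts weight toward hard, typically minority-class pixels, the following is a minimal sketch in plain Python. It assumes the standard focal formulation, in which the cross-entropy term is scaled by (1 - p_t)^gamma; the exact FCELoss definition used in this study may differ. The function names, the `gamma` default, and the optional `class_weights` argument are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_cross_entropy(logits, target, gamma=2.0, class_weights=None):
    """Focal-weighted cross-entropy for a single pixel (assumed formulation).

    logits:        raw class scores for the pixel
    target:        index of the ground-truth class
    gamma:         focusing parameter; gamma = 0 gives plain cross-entropy
    class_weights: optional per-class weights (hypothetical extra knob)
    """
    p_t = softmax(logits)[target]
    p_t = max(p_t, 1e-12)  # guard against log(0)
    weight = 1.0 if class_weights is None else class_weights[target]
    # Well-classified pixels (p_t near 1) are down-weighted by (1 - p_t)^gamma.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

def mean_focal_loss(pixel_logits, pixel_targets, gamma=2.0, class_weights=None):
    """Average the per-pixel focal loss over a collection of pixels."""
    losses = [
        focal_cross_entropy(l, t, gamma, class_weights)
        for l, t in zip(pixel_logits, pixel_targets)
    ]
    return sum(losses) / len(losses)
```

With `gamma=0.0` the sketch reduces to ordinary cross-entropy. With larger `gamma`, confidently classified pixels (usually the abundant background) contribute proportionally less to the average, so misclassified film pixels dominate the gradient.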
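For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute them from a confusion matrix. It assumes that mAccuracy denotes the mean of per-class pixel accuracies (per-class recall) and that mIoU denotes the mean of per-class intersection over union; the abstract does not spell out these definitions, so this is an assumption. The helper names and the toy labels are illustrative only.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def mean_accuracy_and_miou(cm):
    """Return (mAccuracy, mIoU) under the assumed per-class definitions."""
    n = len(cm)
    accs, ious = [], []
    for c in range(n):
        tp = cm[c][c]                             # correctly labeled pixels
        actual = sum(cm[c])                       # ground-truth pixels of class c
        predicted = sum(cm[r][c] for r in range(n))  # pixels predicted as c
        union = actual + predicted - tp
        if actual > 0:   # skip classes absent from the ground truth
            accs.append(tp / actual)
        if union > 0:    # skip classes absent from both truth and prediction
            ious.append(tp / union)
    return sum(accs) / len(accs), sum(ious) / len(ious)

# Toy example with three hypothetical classes, e.g. 0 = background,
# 1 = plastic greenhouse, 2 = plastic mulch.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
m_acc, m_iou = mean_accuracy_and_miou(confusion_matrix(y_true, y_pred, 3))
```

Because both metrics average over classes rather than over pixels, a rare class such as plastic mulch counts as much as the dominant background, which is why they are informative under class imbalance.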