Abstract
Small targets in remote sensing images are detected with low accuracy because of large scale variations, complex scenes, and limited feature information, and current object detection models run inefficiently because of their large parameter counts and high complexity. To address both problems, this study proposes a lightweight YOLOv7-tiny model for remote sensing image detection. First, the network neck is improved with group shuffle convolution (GSConv) and VoV-GSCSP modules, which maintain sufficient detection accuracy while reducing computational cost and network complexity. Second, a dynamic head (DyHead) with an attention mechanism is adopted at the prediction stage; it strengthens the detection head by applying multi-head self-attention across scale-aware feature layers, spatially aware positions, and task-aware output channels. Finally, the loss function of the original model is optimized by integrating the normalized Wasserstein distance (NWD) metric for small-target assessment with a bounding box regression loss based on the minimum point distance IoU (MPDIoU), which improves robustness in small-target detection. Experimental results show that the proposed algorithm achieves mAP@50 scores of 87.7% and 94.7% on the DIOR and RSOD datasets, respectively, which are 2.7 and 5.1 percentage points higher than those of the original YOLOv7-tiny model, while the frames per second (FPS) increase by 12.2% and 11.9%, respectively. The proposed algorithm therefore improves both the accuracy and the real-time performance of small-target detection in remote sensing images.
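The two box-similarity terms named in the abstract can be illustrated with a minimal Python sketch. It follows the commonly published definitions of NWD (boxes modeled as 2D Gaussians) and MPDIoU (IoU penalized by normalized corner distances). The abstract does not state how the two terms are combined, so the weighting factor `alpha`, the default constant `c`, and all function names below are assumptions made for illustration only, not the authors' implementation.

```python
import math


def iou_xyxy(a, b):
    """Plain IoU for two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mpdiou(a, b, img_w, img_h):
    """MPDIoU: IoU minus the squared top-left and bottom-right corner
    distances, each normalized by the squared image diagonal."""
    norm = img_w ** 2 + img_h ** 2
    d1 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2  # top-left corners
    d2 = (a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2  # bottom-right corners
    return iou_xyxy(a, b) - d1 / norm - d2 / norm


def nwd(a, b, c=12.8):
    """Normalized Wasserstein distance between two boxes, each modeled as a
    Gaussian with mean (cx, cy) and covariance diag(w^2/4, h^2/4).
    `c` is a dataset-dependent constant; the default is only a placeholder."""
    cxa, cya = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cxb, cyb = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    w2 = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
          + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2) / c)


def combined_box_loss(pred, target, img_w, img_h, alpha=0.5, c=12.8):
    """Hypothetical blend of the two losses; `alpha` is an assumed weight."""
    loss_nwd = 1.0 - nwd(pred, target, c)
    loss_mpd = 1.0 - mpdiou(pred, target, img_w, img_h)
    return alpha * loss_nwd + (1.0 - alpha) * loss_mpd


if __name__ == "__main__":
    pred, target = (0, 0, 2, 2), (10, 10, 12, 12)
    print(mpdiou(pred, target, 100, 100))
    print(nwd(pred, target))
    print(combined_box_loss(pred, target, 100, 100))
```

For two non-overlapping small boxes, IoU is zero and gives no gradient signal about how far apart they are, whereas both NWD and MPDIoU still vary smoothly with the distance between them. This is the usual motivation for using such terms on small targets.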
Keywords
remote sensing images
object detection
YOLOv7-tiny
GSConv
MPDIoU
DyHead
Issue Date: 03 September 2025