A YOLOv5-based target detection method using high-resolution remote sensing images
SONG Shuangshuang1, XIAO Kaifei1, LIU Zhaohua1, ZENG Zhaoliang2
1. School of Civil and Surveying & Mapping Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
2. State Key Laboratory of Severe Weather, Chinese Academy of Meteorological Sciences, Beijing 100081, China
High-resolution remote sensing images contain rich detail, which narrows the contrast between targets and their background, lowering detection accuracy and overall target detection performance. Building on the deep learning detector You Only Look Once (YOLO), this study designs a lightweight network model, GC-YOLOv5, that combines the end-to-end coordinate attention (CA) module with the lightweight GhostConv module. CA encodes channel information separately along the horizontal and vertical directions, so the attention module captures long-range spatial dependencies while retaining precise positional information, which helps the network locate targets of interest more accurately. The original standard convolution module, Conv-BatchNorm-SiLU (CBS), is replaced with GhostConv, reducing the number of parameters in feature-channel fusion and the size of the best-model weight file (from 12.34 × 10^6 to 11.37 × 10^6 parameters and from 24.6 MB to 20.5 MB). GC-YOLOv5 was evaluated on the publicly available NWPU-VHR-10 dataset, and its robustness was verified on the RSOD dataset. On NWPU-VHR-10, GC-YOLOv5 achieves a precision of 96.5%, a recall of 96.4%, and an mAP@0.5 of 97.7%; on RSOD it achieves a precision of 93.4%, a recall of 90.5%, and an mAP@0.5 of 92.3%.
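To make the directional encoding concrete, the following is a minimal pure-Python sketch of a coordinate-attention-style block acting on one C×H×W feature map stored as nested lists. Average pooling along the width yields one descriptor per row, pooling along the height yields one per column, a shared 1×1 transform mixes channels, and two direction-specific 1×1 transforms followed by a sigmoid produce row and column gates that rescale the input. The weight matrices `w1`, `wh`, `ww` and all function names are hypothetical; batch normalization is omitted and ReLU stands in for the module's non-linearity, so this illustrates the mechanism and is not the GC-YOLOv5 implementation.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as x[c][row][col], shape C x H x W
Matrix = List[List[float]]            # indexed as m[out][in]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m: Matrix, v: List[float]) -> List[float]:
    # A 1x1 convolution applied at a single position is a matrix-vector product over channels.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def coordinate_attention(x: FeatureMap, w1: Matrix, wh: Matrix, ww: Matrix) -> FeatureMap:
    """w1: (C_mid x C) shared transform; wh, ww: (C x C_mid) direction-specific transforms."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Directional pooling: average over width -> one vector per row; over height -> one per column.
    z_h = [[sum(x[c][i]) / W for c in range(C)] for i in range(H)]                       # H x C
    z_w = [[sum(x[c][i][j] for i in range(H)) / H for c in range(C)] for j in range(W)]  # W x C
    # Concatenate the H + W descriptors and apply the shared transform with ReLU (BN omitted).
    f = [[max(0.0, v) for v in matvec(w1, z)] for z in z_h + z_w]
    f_h, f_w = f[:H], f[H:]
    # Split back into the two directions and turn each into per-(position, channel) gates.
    g_h = [[sigmoid(v) for v in matvec(wh, z)] for z in f_h]  # H x C
    g_w = [[sigmoid(v) for v in matvec(ww, z)] for z in f_w]  # W x C
    # Rescale every input value by the gate of its row and the gate of its column.
    return [[[x[c][i][j] * g_h[i][c] * g_w[j][c] for j in range(W)]
             for i in range(H)] for c in range(C)]
```

With all-zero weights every gate equals sigmoid(0) = 0.5, so each value is simply scaled by 0.25; with non-zero weights, rows and columns whose pooled responses are stronger receive larger gates, which is how position information enters the attention weights.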
Fig.4 Operations of standard convolution and of the GhostConv module
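The parameter saving that motivates replacing CBS with GhostConv can be checked with simple arithmetic. The sketch below counts convolution weights only (bias and BatchNorm parameters are ignored). A standard k×k convolution from c_in to c_out channels has k·k·c_in·c_out weights. A Ghost module, as we understand the GhostNet design, first produces m = c_out/s intrinsic channels with an ordinary convolution and then generates the remaining (s−1)·m "ghost" channels with cheap d×d depthwise convolutions. The defaults s = 2 and d = 5 are an assumption based on our reading of the public YOLOv5 GhostConv (half of the output channels from the primary convolution, the other half from a 5×5 depthwise convolution), and the layer sizes in the example are hypothetical.

```python
def conv_weights(k: int, c_in: int, c_out: int, groups: int = 1) -> int:
    """Weights of a k x k convolution (bias and BatchNorm parameters ignored)."""
    return k * k * (c_in // groups) * c_out


def ghost_conv_weights(k: int, c_in: int, c_out: int, s: int = 2, d: int = 5) -> int:
    """Weights of a Ghost-style module: a primary conv to c_out // s channels, then
    depthwise d x d 'cheap' convolutions that generate the remaining channels."""
    m = c_out // s                                      # intrinsic channels
    primary = conv_weights(k, c_in, m)
    cheap = (s - 1) * conv_weights(d, m, m, groups=m)   # one d x d filter per ghost channel
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 3x3 convolution from 128 to 256 channels.
    std = conv_weights(3, 128, 256)
    ghost = ghost_conv_weights(3, 128, 256)
    print(std, ghost, round(std / ghost, 2))
```

For this hypothetical 3×3, 128→256 layer the count drops from 294,912 to 150,656 weights, close to the factor-of-s reduction expected for s = 2, because the depthwise term (25 × 128 = 3,200 weights) is small next to the primary convolution.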
Fig.5 Sample images from the NWPU-VHR-10 and RSOD datasets
Platform | Configuration
--- | ---
Scripting language | Python 3.7.15
Deep learning framework | Torch 1.12.1+cu113
GPU type | Tesla T4
NVIDIA driver | NVIDIA-SMI 460.32.03
CUDA version | 11.2
Tab.1 Experimental environment configuration
Parameter | Setting
--- | ---
Optimizer | SGD
Batch size | 16
Learning rate | 0.01
Momentum | 0.937
Weight decay | 0.0005
Training epochs | 200
Tab.2 Experimental training parameters
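The optimizer settings in Tab.2 can be read as a single update rule. The sketch below applies one SGD step to a list of scalar parameters, using the PyTorch convention for momentum (dampening 0) with L2 weight decay added to the gradient, and takes the Tab.2 values as defaults. It assumes plain (non-Nesterov) momentum and a constant learning rate; the actual YOLOv5 training script may add warm-up, a learning-rate schedule, Nesterov momentum, or parameter-group-specific weight decay, none of which is shown. The function name `sgd_step` is hypothetical.

```python
from typing import List, Tuple


def sgd_step(params: List[float], grads: List[float], bufs: List[float],
             lr: float = 0.01, momentum: float = 0.937,
             weight_decay: float = 0.0005) -> Tuple[List[float], List[float]]:
    """One SGD-with-momentum step; returns (updated params, updated momentum buffers)."""
    new_params, new_bufs = [], []
    for p, g, b in zip(params, grads, bufs):
        d = g + weight_decay * p        # L2 weight decay folded into the gradient
        b = momentum * b + d            # momentum buffer, dampening = 0
        new_params.append(p - lr * b)   # plain (non-Nesterov) update
        new_bufs.append(b)
    return new_params, new_bufs
```

Starting from p = 1.0 with gradient 0.5 and a zero buffer, the first step gives d = 0.5 + 0.0005 × 1.0 = 0.5005 and p = 1 − 0.01 × 0.5005 = 0.994995; on later steps the momentum of 0.937 carries most of the previous update forward.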
Fig.6 Results comparison chart of precision and mAP@0.5
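Model | Precision/% | Recall/% | mAP@0.5/% | Parameters/10^6 | Weight file size/MB | FPS/(images·s⁻¹)
--- | --- | --- | --- | --- | --- | ---
YOLOv5 | 93.1 | 92.5 | 95 | 12.34 | 24.6 | 40.65
YOLOv5-CA | 92.5 | 95.6 | 95.6 | 12.36 | 25.9 | 50.76
YOLOv5-Ghost | 93.4 | 91.7 | 93.7 | 9.67 | 19.5 | 47.62
GC-YOLOv5 | 96.5 | 96.4 | 97.7 | 11.37 | 20.5 | 46.51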
Tab.3 Contribution of each module design to detection performance
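The precision, recall, and mAP@0.5 columns all rest on deciding whether a predicted box matches a ground-truth box at an intersection-over-union (IoU) threshold of 0.5. The minimal sketch below shows that step for one class in one image: detections are visited in descending confidence and greedily matched to the unmatched ground-truth box with the highest IoU; a match with IoU ≥ 0.5 counts as a true positive and anything else as a false positive. Boxes are assumed to be (x1, y1, x2, y2) corner tuples, the helper names are hypothetical, and YOLOv5's own validation code may differ in detail.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(dets: Sequence[Tuple[float, Box]], gts: Sequence[Box],
                     thr: float = 0.5) -> List[bool]:
    """Return TP (True) / FP (False) flags for detections in descending-score order."""
    matched = set()
    flags = []
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j not in matched:            # each ground-truth box can be matched once
                o = iou(box, gt)
                if o > best_iou:
                    best_iou, best_j = o, j
        is_tp = best_j >= 0 and best_iou >= thr
        if is_tp:
            matched.add(best_j)
        flags.append(is_tp)
    return flags


def precision_recall(flags: List[bool], n_gt: int) -> Tuple[float, float]:
    tp = sum(flags)
    precision = tp / len(flags) if flags else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return precision, recall
```

For example, with ground truths (0, 0, 10, 10) and (20, 20, 30, 30), a detection at (21, 21, 31, 31) has IoU 81/119 ≈ 0.68 and counts as a true positive, while a detection far from both boxes is a false positive; precision then counts wrong boxes and recall counts missed objects.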
Model | Precision/% | Recall/% | mAP@0.5/% | FPS/(images·s⁻¹)
--- | --- | --- | --- | ---
Faster-RCNN | 91.8 | 93.8 | — | —
YOLOv5 | 93.1 | 92.5 | 95 | 40.65
GC-YOLOv5 | 96.5 | 96.4 | 97.7 | 46.51
Tab.4 Performance of different methods on NWPU-VHR-10 dataset
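Given TP/FP flags sorted by descending confidence, the AP of one class is the area under its precision–recall curve, and mAP@0.5 is the mean of the per-class APs when TP/FP is decided at IoU 0.5. The sketch below uses all-point interpolation (each precision value replaced by the maximum precision at any higher recall, PASCAL VOC 2010 style). This interpolation is an assumption: YOLOv5's validation code may use a different scheme, so its reported values can differ slightly. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_precision(flags: Sequence[bool], n_gt: int) -> float:
    """flags: TP (True) / FP (False) for detections sorted by descending confidence."""
    if n_gt == 0:
        return 0.0
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for f in flags:
        if f:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def mean_ap(per_class: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """Mean of per-class APs; each entry is (sorted TP/FP flags, number of ground truths)."""
    aps = [average_precision(flags, n_gt) for flags, n_gt in per_class]
    return sum(aps) / len(aps) if aps else 0.0
```

For a class with two ground-truth objects and detections flagged [True, False, True], recall reaches 0.5 at precision 1.0 and then 1.0 at precision 2/3, giving AP = 0.5 × 1.0 + 0.5 × 2/3 = 5/6.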
Fig.7 Example of detection results of NWPU-VHR-10 dataset
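Class | Ground truth | Faster-RCNN | YOLOv5 | GC-YOLOv5
--- | --- | --- | --- | ---
Airplane | 13 | 13 | 13 | 13
Tennis court | 4 | 4 | 4 | 4
Ground track field | 1 | 1 | 1 | 1
Vehicle | 0 | 0 | 0 | 1
Baseball diamond | 0 | 1 | 0 | 0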
Tab.5 Comparison of the detection results of different algorithms with the ground-truth labels on the NWPU-VHR-10 dataset
Model | Precision/% | Recall/% | mAP@0.5/% | FPS/(images·s⁻¹)
--- | --- | --- | --- | ---
Faster-RCNN | 91.8 | 89.8 | -- | --
YOLOv5 | 94.0 | 86.0 | 88.7 | 64.93
GC-YOLOv5 | 93.4 | 90.5 | 92.3 | 64.93
Tab.6 Performance of different methods on RSOD dataset
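On RSOD, GC-YOLOv5 trades a slightly lower precision than YOLOv5 (93.4% vs 94.0%) for a clearly higher recall (90.5% vs 86.0%). One way to summarize that trade-off in a single number is the F1 score, the harmonic mean F1 = 2PR/(P + R). F1 is not reported in the paper; the snippet below only recomputes it from the Tab.6 values, and the names `f1_score` and `rsod` are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output share units, e.g. %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/% and recall/% taken from Tab.6 (RSOD dataset).
rsod = {
    "Faster-RCNN": (91.8, 89.8),
    "YOLOv5": (94.0, 86.0),
    "GC-YOLOv5": (93.4, 90.5),
}

for model, (p, r) in rsod.items():
    print(f"{model}: F1 = {f1_score(p, r):.2f}%")
```

This gives roughly 90.8% for Faster-RCNN, 89.8% for YOLOv5, and 91.9% for GC-YOLOv5, so under this metric the recall gain of GC-YOLOv5 outweighs its small loss in precision.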
Fig.8-1 Example of detection results of RSOD dataset
Fig.8-2 Example of detection results of RSOD dataset
Class | Ground truth | Faster-RCNN | YOLOv5 | GC-YOLOv5
--- | --- | --- | --- | ---
Airplane | 10 | 10 | 10 | 10
Overpass | 1 | 2 | 1 | 1
Tab.7 Comparison of the detection results of different algorithms with the ground-truth labels on the RSOD dataset