Re-YOLOX: A YOLOX model for identifying nearshore monitoring targets improved based on the Resizer model
WANG Zhenhua1,2, TAN Zhilian1,2, LI Jing1,2, CHANG Yingli1
1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
2. Key Laboratory of Marine Environmental Survey Technology and Application, Ministry of Natural Resources, Guangzhou 510000, China
Nearshore monitoring covers both natural environments and human activities. Accurate identification of nearshore monitoring targets is important for the healthy development of the marine economy, the ecological protection of marine environments, and the prevention and mitigation of marine disasters. Nearshore monitoring targets are characterized by multiple types, diverse sizes, and uncertainty, and existing identification models suffer from low accuracy, low efficiency, and frequent omission of small targets. This study proposes Re-YOLOX, an identification model for nearshore monitoring targets that improves YOLOX with a learnable image resizer model (the Resizer model). First, the Resizer model is used to strengthen model training, improving the feature learning and representation abilities and the recall of Re-YOLOX. Then, the feature pyramid fusion structure of YOLOX is improved to reduce the omission of small targets. Using nearshore video data from UAV monitoring as the dataset, with cars, ships, and piles as monitoring targets, this study compared Re-YOLOX with CenterNet, Faster R-CNN, YOLOv3, and YOLOX. The results show that Re-YOLOX achieved a mean average precision of 94.23%, a mean recall of 91.99%, and a mean F1 score of 89.67%, all higher than those of the other models. In summary, Re-YOLOX improves target identification accuracy while maintaining identification efficiency, thus providing technical support for the management of nearshore waters.
WANG Zhenhua, TAN Zhilian, LI Jing, CHANG Yingli. Re-YOLOX: A YOLOX model for identifying nearshore monitoring targets improved based on the Resizer model[J]. Remote Sensing for Natural Resources, 2023, 35(3): 10-16.