To address the low accuracy of boat object detection in real-world scenarios, this paper proposes Boat R-CNN, a boat object detection method built on the Cascade R-CNN algorithm. Boat R-CNN extracts image features with the Swin-Transformer Tiny network, which uses a self-attention mechanism; applies Soft-NMS as its non-maximum suppression method to filter candidate boxes more precisely; uses the Smooth L1 loss function to speed up model convergence and reduce gradient explosion; and uses the CIoU bounding box regression loss to improve the quality of candidate box regression. In addition, the aspect ratios of the anchor boxes are tuned to the shape characteristics of boats, which improves the quality of the generated anchors. Experimental results show that Boat R-CNN improves accuracy by 21.8% over the original Cascade R-CNN algorithm and by 30.3% over the mainstream Faster R-CNN algorithm, effectively improving boat object detection accuracy in real-world scenarios.
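The three components named in the abstract (Smooth L1, CIoU, Soft-NMS) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It follows the commonly published definitions and assumes boxes are given as (x1, y1, x2, y2); the Gaussian form of the Soft-NMS decay and the values `beta=1.0`, `sigma=0.5`, and `score_thr=0.001` are illustrative assumptions, since the abstract does not state the settings used.

```python
import math

def smooth_l1(x, beta=1.0):
    """Smooth L1: quadratic for |x| < beta, linear otherwise (beta is assumed)."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss = 1 - IoU + rho^2 / c^2 + alpha * v."""
    i = iou(pred, gt)
    # Squared distance between the two box centres (rho^2).
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both (c^2).
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw * cw + ch * ch + eps
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - i + v + eps)
    return 1 - i + rho2 / c2 + alpha * v

def soft_nms(boxes, scores, sigma=0.5, score_thr=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of discarding them."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []  # (box index, final score), in selection order
    while remaining:
        best = max(remaining, key=lambda k: scores[k])
        remaining.remove(best)
        if scores[best] < score_thr:
            break  # every remaining box scores even lower
        keep.append((best, scores[best]))
        for k in remaining:
            scores[k] *= math.exp(-iou(boxes[best], boxes[k]) ** 2 / sigma)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
kept = soft_nms(boxes, [0.9, 0.8, 0.7])
print([k for k, _ in kept])
print(ciou_loss((0, 0, 10, 10), (20, 20, 30, 30)))
```

In this toy input the second box overlaps the first heavily, so its score is reduced and it is ranked last, but it is not removed as it would be under hard NMS with a typical IoU threshold. For two boxes that do not overlap, the CIoU loss still varies with the centre distance, whereas a plain IoU loss would be constant.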
2024, 46(6): 144-149 Received: 2023-03-24
DOI:10.3404/j.issn.1672-7649.2024.06.025
CLC number: TP391.41
About the author: YANG Zhenyu (1997-), male, master's student; research interests: marine big data systems and algorithms