To address the slow convergence and long training times that arise when maximum entropy reinforcement learning is used as the controller for autonomous-ship trajectory tracking, a trajectory-tracking algorithm based on improved maximum entropy reinforcement learning is proposed. Prioritized experience replay (PER) is introduced and combined with line-of-sight (LOS) guidance to construct a PER-SAC deep reinforcement learning controller, and the corresponding state space, action space, and reward function are designed. Simulation results show that the PER-SAC controller converges quickly; after convergence, its control performance is more stable and its tracking accuracy higher than that of the original SAC controller, providing a useful reference for the trajectory-tracking control of autonomous ships.
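The core modification the abstract describes is replacing SAC's uniform replay buffer with prioritized experience replay, so that transitions with large TD errors are revisited more often and training converges faster. As a minimal sketch of the proportional-prioritization scheme (the class name, hyperparameters alpha and beta, and the flat-array storage are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay sketch.

    Transitions are sampled with probability p_i^alpha / sum_j p_j^alpha,
    and importance-sampling weights w_i = (N * P(i))^-beta correct the
    bias this non-uniform sampling introduces into the TD updates.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities skew sampling
        self.beta = beta          # strength of the bias correction
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so each is
        # sampled at least once before its TD error is known.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights, normalised by the largest weight.
        weights = (len(self.buffer) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.buffer[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority is |TD error| plus a small constant so that no
        # transition ever has zero sampling probability.
        self.priorities[idx] = np.abs(td_errors) + eps
```

In a PER-SAC loop, `sample` feeds the critic update (with `weights` scaling the per-sample loss) and the resulting TD errors are written back via `update_priorities`. A tree-based (sum-tree) store is the usual choice for large buffers; the flat array above is kept for readability.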
2023, 45(23): 78–84    Received: 2022-12-08
DOI:10.3404/j.issn.1672-7649.2023.23.014
CLC number: U664.82
Funding: National Natural Science Foundation of China (52101368); National Defense Basic Scientific Research Program of SASTIND (JCKY2020206B037)
About the author: ZHAI Hongrui (b. 1997), male, master's student; research interest: green intelligent ships